
The Dual-Task Costs of Audiovisual Benefit: Effects of Noise and ‘Native’ Speaker Status

Listeners typically understand speech more accurately when they can see and hear the talker relative to hearing alone. However, seeing the talker’s face does not necessarily reduce the cognitive costs associated with processing speech as measured by dual-task costs. In difficult listening conditions, dual-task response times may be faster for audiovisual than audio-only speech, but when listening conditions are easy, the presence of a talking face may have no effect on dual-task responses or even slow responses relative to listening alone. The current study expanded upon this work by including samples of both native and nonnative English speakers and assessing speech intelligibility, subjective listening effort (Experiment 1), and dual-task costs (Experiment 2) for audio-only and audiovisual speech across multiple noise levels. We found that seeing the talker reduces dual-task costs only in difficult listening conditions in which the visual information is necessary to accurately identify the speech. The effects of background noise and speech modality were robust within groups of native as well as nonnative listeners, suggesting that if researchers are interested in studying general phenomena related to speech processing (i.e., rather than specifically studying how language background affects results), these effects would have emerged regardless of whether the sample was limited to native speakers of English. However, the magnitude of some effects differed for native and nonnative listeners.

Impaired Performance in Noise: Disentangling Listening Effort From the Irrelevant Speech Effect

Noise can reduce the intelligibility of spoken language and increase the effort necessary to understand speech. Listening effort, “the deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a [listening] task” (Pichora-Fuller et al., 2016), is commonly assessed by measuring response times to secondary tasks while listening to speech or by testing memory for the content of the speech. Increasing the level of background noise tends to slow responses and impair memory, and these effects are attributed to the resource-intensive process of reevaluating speech that was initially obscured or misheard. However, given that noise can impair performance on cognitive tasks that do not require processing auditory information, it is possible that noise-induced impairments typically ascribed to processing degraded speech may instead reflect increased cognitive load from the presence of noise itself. The current study assessed whether noise, in the absence of a speech task, can affect performance on tasks intended to measure listening effort. In Experiment 1 (positive control), target speech consisting of single words was presented aurally in background noise and we measured listening effort with three commonly-used paradigms. Experiment 2 was identical except that the target words were presented orthographically rather than aurally. Results showed that noise impaired performance on all three tasks when the target stimuli were presented aurally, consistent with a large body of work in the listening effort literature. Experiment 2 revealed that performance on some tasks was impaired by the presence of masking noise (particularly two-talker babble), indicating some domain-general interference. 
However, the magnitude of the noise-induced interference effects was markedly smaller in Experiment 2 than Experiment 1, suggesting that measures of listening effort capture variability attributable to the challenges associated with listening to speech in noise, and do not simply measure distraction or noise-induced cognitive interference.

Measuring the Dual-Task Costs of Audiovisual Speech Processing Across Levels of Background Noise

Successful communication requires that listeners not only identify speech, but do so while maintaining performance on other tasks, like remembering what a conversational partner said or paying attention while driving. This set of four experiments systematically evaluated how audiovisual speech—which reliably improves speech intelligibility—affects dual-task costs during speech perception (i.e., one facet of listening effort). Results indicated that audiovisual speech reduces dual-task costs in difficult listening conditions (those in which visual cues substantially benefit intelligibility), but may actually increase costs in easy conditions—a pattern of results that was internally replicated multiple times. This study also presents a novel dual-task paradigm specifically designed to facilitate remote dual-task research. Given the novelty of the task, this study includes psychometric experiments that establish positive and negative controls, assess convergent validity, measure task sensitivity relative to a commonly used dual-task paradigm, and generate performance curves across a range of listening conditions. Thus, in addition to evaluating the effects of audiovisual speech across a wide range of background noise levels, this study enables other researchers to address theoretical questions related to the cognitive mechanisms supporting speech processing beyond the specific issues addressed here and without being limited to in-person research.

Rapid Adaptation to Fully Intelligible Nonnative-Accented Speech Reduces Listening Effort

In noisy settings or when listening to an unfamiliar talker or accent, it can be difficult to understand spoken language. This difficulty typically results in reductions in speech intelligibility, but may also increase the effort necessary to process the speech even when intelligibility is unaffected. In this study, we used a dual-task paradigm and pupillometry to assess the cognitive costs associated with processing fully intelligible accented speech, predicting that rapid perceptual adaptation to an accent would result in decreased listening effort over time. The behavioural and physiological paradigms provided converging evidence that listeners expend greater effort when processing nonnative- relative to native-accented speech, and both experiments also revealed an overall reduction in listening effort over the course of the experiment. Only the pupillometry experiment, however, revealed greater adaptation to nonnative- relative to native-accented speech. An exploratory analysis of the dual-task data that attempted to minimise practice effects revealed weak evidence for greater adaptation to the nonnative accent. These results suggest that even when speech is fully intelligible, resolving deviations between the acoustic input and stored lexical representations incurs a processing cost, and adaptation may attenuate this cost.

Talking Points: A Modulating Circle Increases Listening Effort Without Improving Speech Recognition in Young Adults

Speech recognition is improved when the acoustic input is accompanied by visual cues provided by a talking face (Erber in Journal of Speech and Hearing Research, 12(2), 423–425, 1969; Sumby & Pollack in The Journal of the Acoustical Society of America, 26(2), 212–215, 1954). One way that the visual signal facilitates speech recognition is by providing the listener with information about fine phonetic detail that complements information from the auditory signal. However, given that degraded face stimuli can still improve speech recognition accuracy (Munhall, Kroos, Jozan, & Vatikiotis-Bateson in Perception & Psychophysics, 66(4), 574–583, 2004), and static or moving shapes can improve speech detection accuracy (Bernstein, Auer, & Takayanagi in Speech Communication, 44(1–4), 5–18, 2004), aspects of the visual signal other than fine phonetic detail may also contribute to the perception of speech. In two experiments, we show that a modulating circle providing information about the onset, offset, and acoustic amplitude envelope of the speech does not improve recognition of spoken sentences (Experiment 1) or words (Experiment 2). Further, contrary to our hypothesis, the modulating circle increased listening effort despite subjective reports that it made the word recognition task seem easier to complete (Experiment 2). These results suggest that audiovisual speech processing, even when the visual stimulus only conveys temporal information about the acoustic signal, may be a cognitively demanding process.

About Face: Seeing the Talker Improves Spoken Word Recognition but Increases Listening Effort

It is widely accepted that seeing a talker improves a listener’s ability to understand what a talker is saying in background noise (e.g., Erber, 1969; Sumby & Pollack, 1954). The literature is mixed, however, regarding the influence of the visual modality on the listening effort required to recognize speech (e.g., Fraser, Gagné, Alepins, & Dubois, 2010; Sommers & Phelps, 2016). Here, we present data showing that even when the visual modality robustly benefits recognition, processing audiovisual speech can still result in greater cognitive load than processing speech in the auditory modality alone. We show using a dual-task paradigm that the costs associated with audiovisual speech processing are more pronounced in easy listening conditions, in which speech can be recognized at high rates in the auditory modality alone—indeed, effort did not differ between audiovisual and audio-only conditions when the background noise was presented at a more difficult level. Further, we show that though these effects replicate with different stimuli and participants, they do not emerge when effort is assessed with a recall paradigm rather than a dual-task paradigm. Together, these results suggest that the widely cited audiovisual recognition benefit may come at a cost under more favorable listening conditions, and add to the growing body of research suggesting that various measures of effort may not be tapping into the same underlying construct (Strand et al., 2018).

“Paying” Attention to Audiovisual Speech: Do Incongruent Stimuli Incur Greater Costs?

The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing – susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.

Noise Increases Listening Effort in Normal-Hearing Young Adults, Regardless of Working Memory Capacity

As listening conditions worsen (e.g., as background noise increases), additional cognitive effort is required to process speech. The existing literature is mixed on whether and how cognitive traits like working memory capacity moderate the amount of effort that listeners must expend to successfully understand speech. Here, we validate a dual-task measure of listening effort (Experiment 1) and demonstrate that for normal-hearing young adults, effort increases as steady-state masking noise increases, but working memory capacity is unrelated to the amount of effort expended (Experiment 2). We propose that previous research may have overestimated the relationship between listening effort and working memory capacity by measuring listening effort using recall-based tasks. The present results suggest caution in making the general assumption that working memory capacity is related to the amount of effort expended during a listening task.