speech perception

The Effects of Temporal Cues, Point-Light Displays, and Faces on Speech Identification and Listening Effort

Among the most robust findings in speech research is that the presence of a talking face improves the intelligibility of spoken language. Talking faces supplement the auditory signal by providing fine phonetic cues based on the placement of the articulators, as well as temporal cues to when speech is occurring. In this study, we varied the amount of information contained in the visual signal, ranging from temporal information alone to a natural talking face. Participants were presented with spoken sentences in energetic or informational masking in four different visual conditions: audio-only, a modulating circle providing temporal cues to salient features of the speech, a digitally rendered point-light display showing lip movement, and a natural talking face. We assessed both sentence identification accuracy and self-reported listening effort. Audiovisual benefit for intelligibility was observed for the natural face in both informational and energetic masking, but the digitally rendered point-light display only provided benefit in energetic masking. Intelligibility for speech accompanied by the modulating circle did not differ from the audio-only conditions in either masker type. Thus, the temporal cues used here were insufficient to improve speech intelligibility in noise, but some types of digital point-light displays may contain enough phonetic detail to produce modest improvements in speech identification in noise.

Spread the Word: Enhancing Replicability of Speech Research Through Stimulus Sharing

Purpose: The ongoing replication crisis within and beyond psychology has revealed the numerous ways in which flexibility in the research process can affect study outcomes. In speech research, examples of these “researcher degrees of freedom” include the particular syllables, words, or sentences presented; the talkers who produce the stimuli and the instructions given to them; the population tested; whether and how stimuli are matched on amplitude; the type of masking noise used and its presentation level; and many others. In this research note, we argue that even seemingly minor methodological choices have the potential to affect study outcomes. To that end, we present a reanalysis of six existing data sets on spoken word identification in noise to assess how differences in talkers, stimulus processing, masking type, and listeners affect identification accuracy. Conclusions: Our reanalysis revealed relatively low correlations among word identification rates across studies. The data suggest that some of the seemingly innocuous methodological details that differ across studies—details that cannot possibly be reported in text given the idiosyncrasies inherent to speech—introduce unknown variability that may affect replicability of our findings. We therefore argue that publicly sharing stimuli is a crucial step toward improved replicability in speech research.
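
As a rough illustration of the kind of cross-study comparison described above, the sketch below computes a Pearson correlation between per-word identification accuracy in two studies. The word lists, accuracy values, and the `pearson_r` helper are hypothetical and invented for illustration; they are not taken from the six data sets, and the reanalysis itself may have used a different procedure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical proportion-correct scores for the same words in two studies
# (made-up values, for illustration only).
study_a = {"cat": 0.90, "dog": 0.40, "fish": 0.70, "bird": 0.55}
study_b = {"cat": 0.60, "dog": 0.65, "fish": 0.50, "bird": 0.80}

# Compare only the words that appear in both studies, in a fixed order.
shared_words = sorted(set(study_a) & set(study_b))
r = pearson_r([study_a[w] for w in shared_words],
              [study_b[w] for w in shared_words])
print(f"r = {r:.2f} across {len(shared_words)} shared words")
```

A low value of `r` for the same words across studies would be consistent with the point made above: details of talkers, processing, and masking that are hard to describe in text can still shift item-level accuracy.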

Preregistration: Practical Considerations for Speech, Language, and Hearing Research

In the last decade, psychology and other sciences have implemented numerous reforms to improve the robustness of our research, many of which are based on increasing transparency throughout the research process. Among these reforms is the practice of preregistration, in which researchers create a time-stamped and uneditable document before data collection that describes the methods of the study, how the data will be analyzed, the sample size, and many other decisions. The current article highlights the benefits of preregistration with a focus on the specific issues that speech, language, and hearing researchers are likely to encounter, and additionally provides a tutorial for writing preregistrations. Conclusions: Although rates of preregistration have increased dramatically in recent years, the practice is still relatively uncommon in research on speech, language, and hearing. Low rates of adoption may be driven by a lack of understanding of the benefits of preregistration (either generally or for our discipline in particular) or uncertainty about how to proceed if it becomes necessary to deviate from the preregistered plan. Alternatively, researchers may see the benefits of preregistration but not know where to start, and gathering this information from a wide variety of sources is arduous and time consuming. This tutorial addresses each of these potential roadblocks to preregistration and equips readers with tools to facilitate writing preregistrations for research on speech, language, and hearing.

Speech and Non-Speech Measures of Audiovisual Integration are not Correlated

Many natural events generate both visual and auditory signals, and humans are remarkably adept at integrating information from those sources. However, individuals appear to differ markedly in their ability or propensity to combine what they hear with what they see. Individual differences in audiovisual integration have been established using a range of materials, including speech stimuli (seeing and hearing a talker) and simpler audiovisual stimuli (seeing flashes of light combined with tones). Although there are multiple tasks in the literature that are referred to as “measures of audiovisual integration,” the tasks themselves differ widely with respect to both the type of stimuli used (speech versus non-speech) and the nature of the tasks themselves (e.g., some tasks use conflicting auditory and visual stimuli whereas others use congruent stimuli). It is not clear whether these varied tasks are actually measuring the same underlying construct: audiovisual integration. This study tested the relationships among four commonly-used measures of audiovisual integration, two of which use speech stimuli (susceptibility to the McGurk effect and a measure of audiovisual benefit), and two of which use non-speech stimuli (the sound-induced flash illusion and audiovisual integration capacity). We replicated previous work showing large individual differences in each measure but found no significant correlations among any of the measures. These results suggest that tasks that are commonly referred to as measures of audiovisual integration may be tapping into different parts of the same process or different constructs entirely.

Revisiting the Target-Masker Linguistic Similarity Hypothesis

The linguistic similarity hypothesis states that it is more difficult to segregate target and masker speech when they are linguistically similar. For example, recognition of English target speech should be more impaired by the presence of Dutch masking speech than Mandarin masking speech because Dutch and English are more linguistically similar than Mandarin and English. Across four experiments, English target speech was consistently recognized more poorly when presented in English masking speech than in silence, speech-shaped noise, or an unintelligible masker (i.e., Dutch or Mandarin). However, we found no evidence for graded masking effects—Dutch did not impair performance more than Mandarin in any experiment, despite 650 participants being tested. This general pattern was consistent when using both a cross-modal paradigm (in which target speech was lipread and maskers were presented aurally; Experiments 1a and 1b) and an auditory-only paradigm (in which both the targets and maskers were presented aurally; Experiments 2a and 2b). These findings suggest that the linguistic similarity hypothesis should be refined to reflect the existing evidence: There is greater release from masking when the masker language differs from the target speech than when it is the same as the target speech. However, evidence that unintelligible maskers impair speech identification to a greater extent when they are more linguistically similar to the target language remains elusive.

Revisiting the Relationship Between Implicit Racial Bias and Audiovisual Benefit for Nonnative-Accented Speech

Speech intelligibility is improved when the listener can see the talker in addition to hearing their voice. Notably, though, previous work has suggested that this “audiovisual benefit” for nonnative (i.e., foreign-accented) speech is smaller than the benefit for native speech, an effect that may be partially accounted for by listeners’ implicit racial biases (Yi et al., 2013, The Journal of the Acoustical Society of America, 134[5], EL387–EL393). In the present study, we sought to replicate these findings in a significantly larger sample of online participants. In a direct replication of Yi et al. (Experiment 1), we found that audiovisual benefit was indeed smaller for nonnative-accented relative to native-accented speech. However, our results did not support the conclusion that implicit racial biases, as measured with two types of implicit association tasks, were related to these differences in audiovisual benefit for native and nonnative speech. In a second experiment, we addressed a potential confound in the experimental design; to ensure that the difference in audiovisual benefit was caused by a difference in accent rather than a difference in overall intelligibility, we reversed the overall difficulty of each accent condition by presenting them at different signal-to-noise ratios. Even when native speech was presented at a much more difficult intelligibility level than nonnative speech, audiovisual benefit for nonnative speech remained poorer. In light of these findings, we discuss alternative explanations of reduced audiovisual benefit for nonnative speech, as well as methodological considerations for future work examining the intersection of social, cognitive, and linguistic processes.

Face Mask Type Affects Audiovisual Speech Intelligibility and Subjective Listening Effort in Young and Older Adults

Identifying speech requires that listeners make rapid use of fine-grained acoustic cues—a process that is facilitated by being able to see the talker’s face. Face masks present a challenge to this process because they can both alter acoustic information and conceal the talker’s mouth. Here, we investigated the degree to which different types of face masks and noise levels affect speech intelligibility and subjective listening effort for young (N = 180) and older (N = 180) adult listeners. We found that in quiet, mask type had little influence on speech intelligibility relative to speech produced without a mask for both young and older adults. However, with the addition of moderate (−5 dB SNR) and high (−9 dB SNR) levels of background noise, intelligibility dropped substantially for all types of face masks in both age groups. Across noise levels, transparent face masks and cloth face masks with filters impaired performance the most, and surgical face masks had the smallest influence on intelligibility. Participants also rated speech produced with a face mask as more effortful than unmasked speech, particularly in background noise. Although young and older adults were similarly affected by face masks and noise in terms of intelligibility and subjective listening effort, older adults showed poorer intelligibility overall and rated the speech as more effortful to process relative to young adults. This research will help individuals make more informed decisions about which types of masks to wear in various communicative settings.
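
To make the noise levels above concrete, the following sketch shows one common way to scale a noise signal so that a speech-plus-noise mixture has a chosen signal-to-noise ratio. It assumes SNR is defined from root-mean-square (RMS) amplitude over the whole signal, SNR_dB = 20 · log10(RMS_speech / RMS_noise); the abstract does not describe the study's actual mixing procedure, and the function names and toy signals here are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measure_snr_db(speech, noise):
    """SNR in dB, assuming an RMS-based definition over the whole signal."""
    return 20 * math.log10(rms(speech) / rms(noise))

def scale_noise_to_snr(speech, noise, target_snr_db):
    """Return a scaled copy of `noise` that yields `target_snr_db` against `speech`."""
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # Solve target_snr_db = 20 * log10(speech_rms / target_noise_rms)
    target_noise_rms = rms(speech) / (10 ** (target_snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [n * gain for n in noise]

# Toy signals standing in for real recordings (illustration only).
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [0.1 if t % 2 == 0 else -0.1 for t in range(100)]

for level in (-5, -9):
    scaled = scale_noise_to_snr(speech, noise, level)
    mixture = [s + n for s, n in zip(speech, scaled)]
    print(level, round(measure_snr_db(speech, scaled), 3), len(mixture))
```

Under this definition a negative SNR means the noise is more intense than the speech, so −9 dB is a harder listening condition than −5 dB.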

“Where Are the . . . Fixations?”: Grammatical Number Cues Guide Anticipatory Fixations to Upcoming Referents and Reduce Lexical Competition

Listeners make use of contextual cues during continuous speech processing that help overcome the limitations of the acoustic input. These semantic, grammatical, and pragmatic cues facilitate prediction of upcoming words and/or reduce the lexical search space by inhibiting activation of contextually inappropriate words that share phonological information with the target. The current study used the visual world paradigm to assess whether and how listeners use contextual cues about grammatical number during sentence processing by presenting target words in carrier phrases that were grammatically unconstraining (“Click on the . . .”) or grammatically constraining (“Where is/are the . . .”). Prior to the onset of the target word, listeners were already more likely to fixate on plural objects in the “Where are the . . .” context than the “Where is the . . .” context, indicating that they used the construction of the verb to anticipate the referent. Further, participants showed less interference from cohort competitors when the sentence frame made them contextually inappropriate, but still fixated on those words more than on phonologically unrelated distractor words. These results suggest that listeners rapidly and flexibly make use of contextual cues about grammatical number while maintaining sensitivity to the bottom-up input.

An Introduction to Linear Mixed-Effects Modeling in R

This Tutorial serves as both an approachable theoretical introduction to mixed-effects modeling and a practical introduction to how to implement mixed-effects models in R. The intended audience is researchers who have some basic statistical knowledge, but little or no experience implementing mixed-effects models in R using their own data. In an attempt to increase the accessibility of this Tutorial, I deliberately avoid using mathematical terminology beyond what a student would learn in a standard graduate-level statistics course, but I reference articles and textbooks that provide more detail for interested readers. This Tutorial includes snippets of R code throughout; the data and R script used to build the models described in the text are available via OSF at https://osf.io/v6qag/, so readers can follow along if they wish. The goal of this practical introduction is to provide researchers with the tools they need to begin implementing mixed-effects models in their own research.

Understanding Speech Amid the Jingle and Jangle: Recommendations for Improving Measurement Practices in Listening Effort Research

The latent constructs psychologists study are typically not directly accessible, so researchers must design measurement instruments that are intended to provide insights about those constructs. Construct validation—assessing whether instruments measure what they intend to—is therefore critical for ensuring that the conclusions we draw actually reflect the intended phenomena. Insufficient construct validation can lead to the jingle fallacy—falsely assuming two instruments measure the same construct because the instruments share a name—and the jangle fallacy—falsely assuming two instruments measure different constructs because the instruments have different names. In this paper, we examine construct validation practices in research on listening effort and identify patterns that strongly suggest the presence of jingle and jangle in the literature. We argue that the lack of construct validation for listening effort measures has led to inconsistent findings and hindered our understanding of the construct. We also provide specific recommendations for improving construct validation of listening effort instruments, drawing on the framework laid out in a recent paper on improving measurement practices. Although this paper addresses listening effort, the issues raised and recommendations presented are widely applicable to tasks used in research on auditory perception and cognitive psychology.

Recall of Speech is Impaired by Subsequent Masking Noise: A Replication of Rabbitt (1968) Experiment 2

The presence of masking noise can impair speech intelligibility and increase the cognitive resources necessary to understand speech. The first study to demonstrate the negative cognitive consequences of noisy speech—published by Rabbitt in 1968—found that participants had poorer recall for aurally presented digits early in a list when later digits were presented in noise relative to quiet. However, despite being cited nearly 500 times and providing the foundation for a wealth of subsequent research on the topic, the original study has never been directly replicated. Here we report a replication attempt of that study with a large online sample and tested the robustness of the results to a variety of scoring and analytical techniques. We replicated the key finding that listening to speech in noise impairs recall for items that came earlier in the list. The results were consistent when we used the original analytical technique (an ANOVA) and a more powerful analytical technique (generalized linear mixed effects models) that was not available when the original paper was published. These findings support the claim that effortful listening can interfere with encoding or rehearsal of previously presented information.

Rapid Adaptation to Fully Intelligible Nonnative-Accented Speech Reduces Listening Effort

In noisy settings or when listening to an unfamiliar talker or accent, it can be difficult to understand spoken language. This difficulty typically results in reductions in speech intelligibility, but may also increase the effort necessary to process the speech even when intelligibility is unaffected. In this study, we used a dual-task paradigm and pupillometry to assess the cognitive costs associated with processing fully intelligible accented speech, predicting that rapid perceptual adaptation to an accent would result in decreased listening effort over time. The behavioural and physiological paradigms provided converging evidence that listeners expend greater effort when processing nonnative- relative to native-accented speech, and both experiments also revealed an overall reduction in listening effort over the course of the experiment. Only the pupillometry experiment, however, revealed greater adaptation to nonnative- relative to native-accented speech. An exploratory analysis of the dual-task data that attempted to minimise practice effects revealed weak evidence for greater adaptation to the nonnative accent. These results suggest that even when speech is fully intelligible, resolving deviations between the acoustic input and stored lexical representations incurs a processing cost, and adaptation may attenuate this cost.

Talking Points: A Modulating Circle Increases Listening Effort Without Improving Speech Recognition in Young Adults

Speech recognition is improved when the acoustic input is accompanied by visual cues provided by a talking face (Erber in Journal of Speech and Hearing Research, 12(2), 423–425, 1969; Sumby & Pollack in The Journal of the Acoustical Society of America, 26(2), 212–215, 1954). One way that the visual signal facilitates speech recognition is by providing the listener with information about fine phonetic detail that complements information from the auditory signal. However, given that degraded face stimuli can still improve speech recognition accuracy (Munhall, Kroos, Jozan, & Vatikiotis-Bateson in Perception & Psychophysics, 66(4), 574–583, 2004), and static or moving shapes can improve speech detection accuracy (Bernstein, Auer, & Takayanagi in Speech Communication, 44(1–4), 5–18, 2004), aspects of the visual signal other than fine phonetic detail may also contribute to the perception of speech. In two experiments, we show that a modulating circle providing information about the onset, offset, and acoustic amplitude envelope of the speech does not improve recognition of spoken sentences (Experiment 1) or words (Experiment 2). Further, contrary to our hypothesis, the modulating circle increased listening effort despite subjective reports that it made the word recognition task seem easier to complete (Experiment 2). These results suggest that audiovisual speech processing, even when the visual stimulus only conveys temporal information about the acoustic signal, may be a cognitively demanding process.

About Face: Seeing the Talker Improves Spoken Word Recognition but Increases Listening Effort

It is widely accepted that seeing a talker improves a listener’s ability to understand what a talker is saying in background noise (e.g., Erber, 1969; Sumby & Pollack, 1954). The literature is mixed, however, regarding the influence of the visual modality on the listening effort required to recognize speech (e.g., Fraser, Gagné, Alepins, & Dubois, 2010; Sommers & Phelps, 2016). Here, we present data showing that even when the visual modality robustly benefits recognition, processing audiovisual speech can still result in greater cognitive load than processing speech in the auditory modality alone. We show using a dual-task paradigm that the costs associated with audiovisual speech processing are more pronounced in easy listening conditions, in which speech can be recognized at high rates in the auditory modality alone—indeed, effort did not differ between audiovisual and audio-only conditions when the background noise was presented at a more difficult level. Further, we show that though these effects replicate with different stimuli and participants, they do not emerge when effort is assessed with a recall paradigm rather than a dual-task paradigm. Together, these results suggest that the widely cited audiovisual recognition benefit may come at a cost under more favorable listening conditions, and add to the growing body of research suggesting that various measures of effort may not be tapping into the same underlying construct (Strand et al., 2018).

“Paying” Attention to Audiovisual Speech: Do Incongruent Stimuli Incur Greater Costs?

The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing – susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.

Noise Increases Listening Effort in Normal-Hearing Young Adults, Regardless of Working Memory Capacity

As listening conditions worsen (e.g. background noise increases), additional cognitive effort is required to process speech. The existing literature is mixed on whether and how cognitive traits like working memory capacity moderate the amount of effort that listeners must expend to successfully understand speech. Here, we validate a dual-task measure of listening effort (Experiment 1) and demonstrate that for normal-hearing young adults, effort increases as steady-state masking noise increases, but working memory capacity is unrelated to the amount of effort expended (Experiment 2). We propose that previous research may have overestimated the relationship between listening effort and working memory capacity by measuring listening effort using recall-based tasks. The present results suggest caution in making the general assumption that working memory capacity is related to the amount of effort expended during a listening task.

What Accounts for Individual Differences in Susceptibility to the McGurk Effect?

The McGurk effect is a classic audiovisual speech illusion in which discrepant auditory and visual syllables can lead to a fused percept (e.g., an auditory /bɑ/ paired with a visual /gɑ/ often leads to the perception of /dɑ/). The McGurk effect is robust and easily replicated in pooled group data, but there is tremendous variability in the extent to which individual participants are susceptible to it. In some studies, the rate at which individuals report fusion responses ranges from 0% to 100%. Despite its widespread use in the audiovisual speech perception literature, the roots of the wide variability in McGurk susceptibility are largely unknown. This study evaluated whether several perceptual and cognitive traits are related to McGurk susceptibility through correlational analyses and mixed effects modeling. We found that an individual’s susceptibility to the McGurk effect was related to their ability to extract place of articulation information from the visual signal (i.e., a more fine-grained analysis of lipreading ability), but not to scores on tasks measuring attentional control, processing speed, working memory capacity, or auditory perceptual gradiency. These results provide support for the claim that a small amount of the variability in susceptibility to the McGurk effect is attributable to lipreading skill. In contrast, cognitive and perceptual abilities that are commonly used predictors in individual differences studies do not appear to underlie susceptibility to the McGurk effect.

Measuring Listening Effort: Convergent Validity, Sensitivity, and Links With Cognitive and Personality Measures

Purpose: Listening effort (LE) describes the attentional or cognitive requirements for successful listening. Despite substantial theoretical and clinical interest in LE, inconsistent operationalization makes it difficult to make generalizations across studies. The aims of this large-scale validation study were to evaluate the convergent validity and sensitivity of commonly used measures of LE and assess how scores on those tasks relate to cognitive and personality variables. Method: Young adults with normal hearing (N = 111) completed 7 tasks designed to measure LE, 5 tests of cognitive ability, and 2 personality measures. Results: Scores on some behavioral LE tasks were moderately intercorrelated but were generally not correlated with subjective and physiological measures of LE, suggesting that these tasks may not be tapping into the same underlying construct. LE measures differed in their sensitivity to changes in signal-to-noise ratio and the extent to which they correlated with cognitive and personality variables. Conclusions: Given that LE measures do not show consistent, strong intercorrelations and differ in their relationships with cognitive and personality predictors, these findings suggest caution in generalizing across studies that use different measures of LE. The results also indicate that people with greater cognitive ability appear to use their resources more efficiently, thereby diminishing the detrimental effects associated with increased background noise during language processing.

Keep Listening: Grammatical Context Reduces But Does Not Eliminate Activation of Unexpected Words

To understand spoken language, listeners combine acoustic-phonetic input with expectations derived from context (Dahan & Magnuson, 2006). Eye-tracking studies on semantic context have demonstrated that the activation levels of competing lexical candidates depend on the relative strengths of the bottom-up input and top-down expectations (cf. Dahan & Tanenhaus, 2004). In the grammatical realm, however, graded effects of context on lexical competition have been predicted (Magnuson, Tanenhaus, & Aslin, 2008), but not demonstrated. In the current eye-tracking study, participants were presented with target words in grammatically unconstraining (e.g., “The word is . . . ”) or constraining (e.g., “They thought about the . . .”) contexts. In the grammatically constrained, identity-spliced trials, in which phonetic information from one token of the target was spliced into another token of the target, fixations to the competitor did not differ from those to distractors. However, in the grammatically constrained, cross-spliced trials, in which phonetic information from the competitor was cross-spliced into the target to increase bottom-up support for that competitor, participants fixated more on contextually inappropriate competitors than phonologically unrelated distractors, demonstrating that sufficiently strong acoustic-phonetic input can overcome contextual constraints. Thus, although grammatical context constrains lexical activation, listeners remain sensitive to the bottom-up input. Taken together, these results suggest that lexical activation is dependent upon the interplay of acoustic-phonetic input and top-down expectations derived from grammatical context.