
Speech perception is a complex process that allows us to understand spoken language. It involves detecting, discriminating, and recognizing speech sounds, words, and sentences in various contexts. This crucial aspect of human communication requires integrating acoustic cues, phonetic knowledge, and contextual information.

The process occurs in stages, from initial auditory analysis to higher-level linguistic processing. Listeners must handle variability in speech signals due to factors like speaker characteristics and acoustic environments. Several theories explain speech perception, emphasizing different aspects of the process and how listeners extract meaning from acoustic input.

Speech perception basics

  • Speech perception involves the process of interpreting and understanding spoken language, which is a crucial aspect of human communication and cognition
  • It encompasses the ability to detect, discriminate, and recognize speech sounds, words, and sentences in various contexts and environments

Defining speech perception

  • Speech perception refers to the process by which listeners extract linguistic information from the acoustic signal of speech
  • Involves the transformation of continuous acoustic waveforms into discrete linguistic units such as phonemes, syllables, and words
  • Requires the integration of multiple sources of information, including acoustic cues, phonetic knowledge, and contextual information

Stages of speech processing

  • Speech processing occurs in several stages, from the initial auditory analysis to higher-level linguistic processing
  • Auditory stage: involves the transduction of acoustic signals into neural representations in the auditory system
  • Phonetic stage: involves the mapping of acoustic cues onto phonetic categories and the identification of speech sounds
  • Lexical stage: involves the recognition of words and the activation of their meanings in the mental lexicon
  • Syntactic and semantic stages: involve the integration of words into larger linguistic structures and the interpretation of sentence meaning

Variability in speech signals

  • Speech signals exhibit considerable variability due to factors such as speaker characteristics, speaking rate, and acoustic environment
  • Variability in speech production arises from differences in vocal tract anatomy, dialect, and speaking style across individuals
  • Listeners must be able to handle this variability and extract invariant linguistic information from the variable acoustic signal
  • Perceptual constancy: the ability to perceive speech sounds as the same despite variations in the acoustic signal (e.g., recognizing the same phoneme produced by different speakers)

Theories of speech perception

  • Several theories have been proposed to explain how listeners perceive and process speech signals, each emphasizing different aspects of the speech perception process
  • These theories aim to account for the complex interactions between acoustic, phonetic, and linguistic information in speech perception

Motor theory

  • Proposed by Alvin Liberman and colleagues at Haskins Laboratories in the 1950s
  • Suggests that speech perception is mediated by the listener's knowledge of speech production
  • Assumes that listeners perceive speech by simulating the articulatory gestures used to produce speech sounds
  • Emphasizes the role of the motor system in speech perception and the close link between speech production and perception

Acoustic theory

  • Focuses on the acoustic properties of speech signals as the primary source of information for speech perception
  • Assumes that listeners extract acoustic cues (e.g., formant frequencies, duration, and amplitude) from the speech signal to identify phonemes and words
  • Does not rely on knowledge of speech production or articulatory gestures
  • Emphasizes the importance of the auditory system in processing and analyzing acoustic information

Analysis-by-synthesis model

  • Combines elements of both motor and acoustic theories
  • Proposes that listeners use their knowledge of speech production to generate internal hypotheses about the intended message
  • These hypotheses are then compared with the incoming acoustic signal to determine the best match
  • Involves a feedback loop between perception and production, where the listener actively tests and refines their hypotheses based on the acoustic input
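
The hypothesize-and-compare idea can be made concrete with a small sketch. The stored "production knowledge" (formant-like feature templates), the candidate words, and the distance measure below are all illustrative assumptions, not a published model:

```python
# Toy analysis-by-synthesis sketch: for each hypothesized word, "synthesize"
# an expected acoustic pattern from stored production knowledge and compare
# it with the incoming signal; the best-matching hypothesis wins.
PRODUCTION_KNOWLEDGE = {          # hypothetical expected formant-like features
    "bad": [700, 1200, 250],
    "bed": [550, 1750, 250],
    "bid": [400, 2000, 250],
}

def distance(a, b):
    """Euclidean distance between a synthesized template and the input."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def analysis_by_synthesis(observed):
    """Synthesize each hypothesis and keep the one closest to the input."""
    return min(PRODUCTION_KNOWLEDGE,
               key=lambda w: distance(PRODUCTION_KNOWLEDGE[w], observed))

print(analysis_by_synthesis([560, 1700, 240]))  # closest synthesis: 'bed'
```

In a fuller version of this loop, a poor match would trigger the generation of new hypotheses, capturing the feedback between perception and production described above.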

Trace model

  • A connectionist model of speech perception developed by James McClelland and Jeffrey Elman
  • Assumes that speech perception involves the activation of a network of interconnected nodes representing phonetic features, phonemes, and words
  • Information flows bidirectionally through the network, allowing for top-down influences of lexical knowledge on phoneme perception
  • Accounts for various phenomena in speech perception, such as the influence of context on phoneme identification and the restoration of missing or ambiguous speech sounds
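
A very small interactive-activation sketch can illustrate the bidirectional flow the model assumes. The tiny lexicon, parameters, and update rule below are simplified illustrations in the spirit of TRACE, not the published equations:

```python
# Bottom-up evidence activates phoneme nodes, phonemes activate words that
# contain them, and word activation feeds back to those phonemes (top-down).
LEXICON = {"dog": ["d", "o", "g"], "dot": ["d", "o", "t"], "cat": ["c", "a", "t"]}
FEEDFORWARD, FEEDBACK, DECAY = 0.3, 0.2, 0.1

def run(bottom_up, cycles=5):
    phonemes = dict(bottom_up)              # initial acoustic evidence
    words = {w: 0.0 for w in LEXICON}
    for _ in range(cycles):
        # phoneme -> word: each word gathers support from its phonemes
        for w, segs in LEXICON.items():
            support = sum(phonemes.get(p, 0.0) for p in segs) / len(segs)
            words[w] = (1 - DECAY) * words[w] + FEEDFORWARD * support
        # word -> phoneme: active words boost their own phonemes (feedback)
        for w, segs in LEXICON.items():
            for p in segs:
                phonemes[p] = phonemes.get(p, 0.0) + FEEDBACK * words[w] / len(segs)
    return words, phonemes

# The /o/ in "dog" is masked by noise (no bottom-up evidence), yet feedback
# from partially activated words restores some activation for it:
words, phonemes = run({"d": 1.0, "o": 0.0, "g": 1.0})
print(sorted(words.items(), key=lambda kv: -kv[1]))   # 'dog' ends up strongest
print(round(phonemes["o"], 3))                        # > 0: restored top-down
```

Even this toy network shows how lexical knowledge can reinstate a missing phoneme, the kind of restoration effect discussed later in this guide.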

Speech segmentation

  • Speech is a continuous stream of sounds without clear boundaries between words or phonemes, posing a challenge for listeners to segment the speech signal into meaningful units
  • Speech segmentation is the process by which listeners divide the continuous speech stream into discrete linguistic units, such as words and phrases

Segmenting speech stream

  • Listeners use various cues to segment the speech stream, including acoustic, phonetic, and prosodic information
  • Acoustic cues: changes in amplitude, spectral composition, and duration can signal word boundaries (e.g., longer durations and pauses at word boundaries)
  • Phonetic cues: certain phoneme sequences are more likely to occur within words than across word boundaries (e.g., /st/ is more likely to occur word-internally than across word boundaries in English)
  • Lexical cues: the recognition of familiar words can help listeners identify word boundaries in connected speech

Role of prosodic cues

  • Prosodic cues, such as stress, rhythm, and intonation, play a crucial role in speech segmentation
  • Stress patterns: in stress-timed languages like English, stressed syllables are more likely to occur at the beginning of words, providing a cue for word boundaries
  • Rhythmic properties: the alternation of strong and weak syllables creates a rhythmic structure that can aid in segmenting the speech stream
  • Intonational phrases: the grouping of words into intonational phrases, marked by changes in pitch contour and pauses, can help listeners identify larger linguistic units

Statistical learning in segmentation

  • Listeners can use statistical regularities in the speech input to identify word boundaries and segment the speech stream
  • Transitional probabilities: the probability of one speech sound following another is higher within words than across word boundaries
  • Infants and adults are sensitive to these statistical regularities and can use them to segment speech, even in the absence of other cues
  • Statistical learning is an implicit process that occurs through exposure to the language and does not require explicit instruction or feedback
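
The logic of segmentation by transitional probabilities can be sketched in a few lines. The toy syllable stream and the dip-based boundary rule below are illustrative assumptions, not a model from any specific study:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current), estimated from bigram and unigram counts."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

def segment(syllables):
    """Place a word boundary wherever the TP dips relative to its neighbours."""
    tps = transitional_probabilities(syllables)
    seq = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, syl in enumerate(syllables[1:]):
        left = seq[i - 1] if i > 0 else float("inf")
        right = seq[i + 1] if i + 1 < len(seq) else float("inf")
        if seq[i] < left and seq[i] < right:   # local TP minimum -> boundary
            words.append("".join(current))
            current = []
        current.append(syl)
    words.append("".join(current))
    return words

# Toy "language" built from three nonsense words repeated in varying order:
stream = "pa bi ku go la tu pa bi ku da ro pi go la tu da ro pi pa bi ku".split()
print(segment(stream))  # TPs are high within words, lower across boundaries
```

Running the sketch recovers the three nonsense words from the unbroken stream, mirroring the kind of segmentation infants achieve from distributional information alone.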

Phoneme perception

  • Phonemes are the smallest units of sound that distinguish meaning in a language
  • Phoneme perception involves the ability to detect, discriminate, and categorize speech sounds based on their distinctive features

Categorical perception of phonemes

  • Listeners tend to perceive speech sounds categorically, meaning that they are more sensitive to differences between phoneme categories than within categories
  • Categorical perception is demonstrated by the abrupt change in identification and discrimination performance across phoneme boundaries
  • Suggests that listeners map the continuous acoustic signal onto discrete phoneme categories, rather than processing speech sounds in a purely continuous manner
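
A simple way to picture the identification pattern is a logistic function over a voice-onset-time (VOT) continuum. The 25 ms boundary and slope below are hypothetical values chosen only to show the abrupt category shift, not empirical estimates:

```python
import math

BOUNDARY_MS = 25.0   # assumed /b/-/p/ category boundary
SLOPE = 0.8          # steepness of the identification function

def p_voiceless(vot_ms):
    """Probability of labelling a stimulus as /p/ (voiceless)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

for vot in range(0, 65, 10):
    label = "/p/" if p_voiceless(vot) > 0.5 else "/b/"
    print(f"VOT {vot:2d} ms -> P(/p/) = {p_voiceless(vot):.2f}  heard as {label}")
# Identification stays near 0 or 1 except in a narrow region around the
# boundary, where small VOT changes flip the perceived category; equally
# sized steps within a category barely change the response.
```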

Phoneme discrimination

  • The ability to distinguish between different phonemes is essential for accurate speech perception
  • Listeners are highly sensitive to the acoustic differences that signal phoneme contrasts, such as voice onset time (VOT) for stop consonants and formant frequencies for vowels
  • Discrimination performance is typically better across phoneme boundaries than within categories, reflecting the categorical nature of phoneme perception

Phoneme restoration effect

  • The phoneme restoration effect demonstrates the role of top-down processes in phoneme perception
  • When a portion of a speech sound is replaced by noise or silence, listeners often report hearing the missing sound as if it were present
  • The restored phoneme is typically consistent with the linguistic context and the listener's expectations
  • Suggests that listeners actively use their linguistic knowledge to fill in missing or ambiguous information in the speech signal

Context effects on phoneme perception

  • The perception of phonemes is influenced by the surrounding linguistic context, including adjacent sounds, words, and sentences
  • Coarticulation: the articulation of one speech sound is influenced by the production of neighboring sounds, leading to context-dependent acoustic cues
  • Phonological context: the interpretation of a speech sound can be affected by the phonological rules and constraints of the language (e.g., the realization of /t/ as a flap in certain contexts in American English)
  • Lexical context: the identification of a phoneme can be biased by the listener's knowledge of words and their frequencies in the language (e.g., the "Ganong effect", where ambiguous sounds are more likely to be perceived as forming a word than a non-word)
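
A toy sketch of the Ganong effect: an ambiguous onset between /g/ and /k/ is biased toward whichever reading forms a real word. The mini-lexicon and the simple "bottom-up evidence plus lexical bonus" scoring rule are illustrative assumptions:

```python
LEXICON = {"gift", "kiss", "goal", "coat"}   # hypothetical mental lexicon
LEXICAL_BONUS = 0.2                          # top-down boost for real words

def interpret(evidence_for_g, rest_of_word):
    """Choose /g/ or /k/ for an ambiguous onset given the rest of the word.

    evidence_for_g: bottom-up support for /g/ (0..1); /k/ gets the remainder.
    """
    scores = {}
    for onset, evidence in (("g", evidence_for_g), ("k", 1.0 - evidence_for_g)):
        candidate = onset + rest_of_word
        scores[onset] = evidence + (LEXICAL_BONUS if candidate in LEXICON else 0.0)
    return max(scores, key=scores.get)

# The same ambiguous sound (50/50 acoustic evidence) is heard differently
# depending on the lexical context it appears in:
print(interpret(0.5, "ift"))  # 'g'  ("gift" is a word, "kift" is not)
print(interpret(0.5, "iss"))  # 'k'  ("kiss" is a word, "giss" is not)
```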

Word recognition

  • Word recognition is the process by which listeners map the acoustic-phonetic input onto lexical representations stored in their mental lexicon
  • It involves the activation and selection of word candidates based on the incoming speech signal and the listener's linguistic knowledge

Lexical access and selection

  • Lexical access refers to the process of activating word candidates in the mental lexicon based on the acoustic-phonetic input
  • Multiple word candidates that match the input are initially activated in parallel, creating a set of potential words
  • Lexical selection involves the process of narrowing down the activated candidates to identify the intended word
  • Selection is influenced by factors such as the degree of acoustic-phonetic match, word frequency, and contextual information

Cohort model of word recognition

  • Proposed by William Marslen-Wilson and colleagues
  • Assumes that word recognition occurs incrementally, with the activation of word candidates that match the initial portion of the speech input (the "cohort")
  • As more acoustic-phonetic information becomes available, the cohort is progressively narrowed down until a single word is selected
  • Emphasizes the importance of the initial portion of the word in constraining lexical access and the role of top-down contextual information in guiding selection
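
The incremental narrowing of the cohort can be illustrated with a small sketch. The lexicon is a hypothetical example, and letters stand in for phonemes:

```python
LEXICON = ["captain", "capital", "captive", "capture", "cat", "cap"]

def recognize_incrementally(phoneme_string):
    """Narrow the cohort one segment at a time until a single word remains."""
    cohort = list(LEXICON)
    for i in range(1, len(phoneme_string) + 1):
        prefix = phoneme_string[:i]
        cohort = [word for word in cohort if word.startswith(prefix)]
        print(f"after '{prefix}': cohort = {cohort}")
        if len(cohort) == 1:          # uniqueness point reached
            return cohort[0]
    return cohort

print(recognize_incrementally("captu"))  # cohort shrinks until 'capture' is left
```

The point at which only one candidate survives corresponds to the word's uniqueness point; in the full model, contextual information can eliminate candidates even earlier.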

Neighborhood activation model

  • Developed by Paul Luce and colleagues
  • Proposes that word recognition is influenced by the activation of phonologically similar words in the mental lexicon (the "neighborhood")
  • Words with many similar-sounding neighbors are more difficult to recognize than words with few neighbors, due to increased competition among activated candidates
  • Accounts for the effects of neighborhood density and frequency on word recognition performance
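
Neighborhood density is often operationalized as the number of words differing from the target by a single segment substitution, insertion, or deletion. The sketch below uses that common definition with an illustrative word list, with letters standing in for phonemes:

```python
def one_segment_apart(a, b):
    """True if b differs from a by one substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))
    return False

def neighbourhood(target, lexicon):
    return [w for w in lexicon if one_segment_apart(target, w)]

WORDS = ["cat", "bat", "hat", "cap", "cut", "coat", "at", "cast", "dog"]
print(neighbourhood("cat", WORDS))
# ['bat', 'hat', 'cap', 'cut', 'coat', 'at', 'cast'] -> a dense neighbourhood,
# so 'cat' faces more competition during recognition than a sparse word would.
```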

Frequency and familiarity effects

  • Word frequency: high-frequency words are recognized more quickly and accurately than low-frequency words, reflecting their stronger representations in the mental lexicon
  • Familiarity: words that are more familiar to the listener (e.g., through personal experience or cultural exposure) are easier to recognize than less familiar words
  • Age of acquisition: words learned earlier in life are typically recognized more efficiently than words learned later, even when controlling for frequency
  • These effects demonstrate the influence of the listener's linguistic experience and knowledge on word recognition processes

Prosody and intonation

  • Prosody refers to the suprasegmental features of speech, such as stress, rhythm, and intonation, that convey linguistic and paralinguistic information beyond the segmental content
  • Intonation specifically refers to the variation in pitch contour over the course of an utterance, which can convey linguistic, attitudinal, and emotional information

Functions of prosody

  • Prosody serves various functions in speech communication, including:
    • Linguistic functions: signaling lexical stress, phrase boundaries, and sentence type (e.g., declarative vs. interrogative)
    • Attitudinal functions: conveying the speaker's attitudes, emotions, and intentions (e.g., sarcasm, enthusiasm, or uncertainty)
    • Discourse functions: managing turn-taking, signaling topic shifts, and indicating the information structure of the utterance (e.g., distinguishing given vs. new information)
  • Listeners use prosodic cues to interpret the intended meaning and structure of the spoken message

Perception of stress and rhythm

  • Lexical stress: listeners are sensitive to the acoustic correlates of lexical stress, such as increased duration, intensity, and pitch prominence on stressed syllables
  • Rhythm: the perception of speech rhythm is influenced by the timing and prominence patterns of syllables in the utterance
  • Languages are often classified as stress-timed (e.g., English, German) or syllable-timed (e.g., French, Spanish) based on their rhythmic properties
  • Listeners use their knowledge of language-specific rhythmic patterns to segment speech and anticipate the location of stressed syllables and word boundaries

Intonation contours and meaning

  • Intonation contours, or the patterns of pitch variation over an utterance, convey linguistic and paralinguistic information
  • Declarative contours: typically characterized by a falling pitch at the end of the utterance, signaling a statement or assertion
  • Interrogative contours: often marked by a rising pitch at the end of the utterance, indicating a question or request for information
  • Emotional prosody: specific intonation patterns can convey emotions such as happiness, sadness, anger, or surprise
  • Listeners interpret intonation contours in conjunction with the segmental content and context to infer the intended meaning and emotional state of the speaker

Prosodic bootstrapping hypothesis

  • The prosodic bootstrapping hypothesis suggests that infants use prosodic cues to initially segment speech and identify linguistic units, such as words and phrases
  • Infants are sensitive to the prosodic properties of their native language from an early age, even before they have acquired a substantial vocabulary
  • Prosodic cues, such as stress patterns and intonational phrases, can help infants detect word boundaries and syntactic structures in the speech input
  • This initial prosodic segmentation is thought to facilitate the acquisition of other aspects of language, such as phonology, lexicon, and grammar
  • The prosodic bootstrapping hypothesis highlights the important role of prosody in early language development and its interaction with other levels of linguistic analysis

Speech perception development

  • Speech perception abilities develop gradually from infancy through childhood, shaped by the child's linguistic experience and maturation of the auditory and cognitive systems
  • Infants demonstrate remarkable speech perception skills early in life, which become more specialized and attuned to their native language over the course of development

Infants' speech perception abilities

  • Newborns show sensitivity to speech sounds and can discriminate between phonetic contrasts from various languages, not just their native language
  • Infants prefer speech over non-speech sounds and show a preference for infant-directed speech (motherese) over adult-directed speech
  • By 6-8 months, infants can segment words from fluent speech using statistical learning and prosodic cues
  • Around 9-10 months, infants show improved discrimination of native language phonetic contrasts and a decline in sensitivity to non-native contrasts (perceptual narrowing)

Perceptual narrowing and tuning

  • Perceptual narrowing refers to the process by which infants' initial broad sensitivity to speech sounds becomes more specialized and attuned to the phonetic contrasts of their native language
  • This narrowing occurs through exposure to the statistical regularities and phonetic distributions of the ambient language
  • Infants' discrimination of non-native phonetic contrasts declines, while their sensitivity to native contrasts improves
  • Perceptual narrowing is thought to reflect the optimization of speech perception skills for efficient processing of the native language

Role of infant-directed speech

  • Infant-directed speech (IDS), or motherese, is a special register of speech used by caregivers when interacting with infants
  • Characterized by higher pitch, slower tempo, exaggerated intonation contours, and simplified vocabulary and grammar
  • IDS is thought to facilitate language acquisition by providing clearer acoustic cues, capturing infants' attention, and conveying emotional information
  • Exposure to IDS has been associated with improved speech discrimination, word segmentation, and vocabulary development in infants

Bilingual speech perception development

  • Bilingual infants are exposed to two languages from an early age and must learn to discriminate and process the speech sounds of both languages
  • Bilingual infants show a different trajectory of perceptual narrowing compared to monolingual infants, maintaining sensitivity to the phonetic contrasts of both languages
  • The development of speech perception in bilinguals is influenced by factors such as the amount and quality of exposure to each language, the similarity between the languages, and the social context of language use
  • Bilingual experience may enhance certain cognitive and linguistic skills, such as executive function and phonological awareness, which can support speech perception and language learning

Neurobiology of speech perception

  • Speech perception is supported by a complex network of brain regions that process acoustic, phonetic, and linguistic information
  • Neuroimaging and electrophysiological studies have provided insights into the neural mechanisms underlying speech perception and its development

Brain regions involved

  • Primary auditory cortex: located in the superior temporal gyrus, it performs the initial analysis of acoustic features of speech sounds
  • Superior temporal sulcus (STS): involved in the integration of acoustic and phonetic information, as well as the processing of speech-specific temporal and spectral patterns
  • Inferior frontal gyrus (IFG): plays a role in phonological processing, articulatory mapping, and the integration of speech with higher-level linguistic information
  • Inferior parietal lobule (IPL): involved in the mapping between acoustic-phonetic representations and articulatory motor plans, supporting the interface between speech perception and production

Hemispheric lateralization

  • Speech perception is typically lateralized to the left hemisphere in most right-handed individuals
  • The left hemisphere shows specialization for the processing of rapidly changing temporal information, which is crucial for phonetic discrimination and segmentation
  • The right hemisphere is more involved in the processing of prosodic and emotional aspects of speech
  • Hemispheric lateralization for speech perception emerges early in development and is influenced by the acoustic properties and linguistic structure of the speech signal

ERP studies of speech processing

  • Event-related potentials (ERPs) are electrophysiological responses time-locked to specific sensory, cognitive, or motor events, providing high temporal resolution for studying speech perception
  • Mismatch negativity (MMN): an ERP component elicited by infrequent deviant stimuli in a sequence of standard stimuli, reflecting pre-attentive auditory discrimination and sensory memory
  • N400: an ERP component associated with semantic processing, reflecting the ease of semantic integration of a word into the preceding context
  • P600: an ERP component related to syntactic processing, reflecting the reanalysis or repair of syntactic violations or ambiguities
  • ERP studies have revealed the time course of different stages of speech processing and the influence of various linguistic factors on speech perception

fMRI and PET imaging findings

  • Functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) provide high spatial resolution for localizing brain activity during speech perception tasks
  • fMRI studies have shown activation in the superior temporal gyrus, inferior frontal gyrus, and inferior parietal lobule during phonetic discrimination, word recognition, and sentence comprehension tasks
  • PET studies have revealed changes in regional cerebral blood flow associated with different aspects of speech processing