Spectral enhancement of acoustic signals to provide improved...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S200100

active

06732073

ABSTRACT:

FIELD OF THE INVENTION
This invention pertains generally to the field of audio signal processing and particularly to hearing aids and speech recognition.
BACKGROUND OF THE INVENTION
Individuals with normal hearing are able to perceive speech in the face of extreme context-sensitivity resulting from coarticulation. The ability of listeners to recover speech information, despite dramatic articulatory and acoustic assimilation, is remarkable and central to understanding speech perception. The degree to which listeners perceptually accommodate articulatory constraints has often encouraged perceptual theories that assume relatively detailed reference to articulatory acts themselves, either with respect to general theoretical commitments, or with appeal to specialized speech perception processes unique to humans and to vocal tracts. In each case, correspondences between perception and production are typically taken as evidence of perception of articulatory acts per se. Some approaches appeal to tacit knowledge of coarticulatory acts or their acoustic consequences, and such knowledge-based processes can be viewed as more (e.g., Repp, B. H., “Phonetic Trading Relations and Context Effects: New Evidence for a Speech Mode of Perception,” Psychological Bulletin, Vol. 92, 1982, pp. 81-110) or less (e.g., Diehl, R. L. & Kluender, K. R., “On Categorization of Speech Sounds,” Stevan Harnad (Ed.), Categorical Perception, Oxford University Press, 1987, pp. 226-253) specific to speech.
Lack of invariance in the relation between fundamental linguistic units (phonemes) and attributes of the acoustic signal poses a central problem in understanding the nature of speech perception. The basic problem is that there seem to exist few or no unitary attributes in the acoustic signal that uniquely specify particular phonemes. The prime culprit for this state of affairs is coarticulation of speech sounds. Coarticulation refers to the spatial and temporal overlap of adjacent articulatory activities. This is reflected in the acoustic signal by severe context-dependence; acoustic information specifying one phoneme varies substantially depending on surrounding phonemes. One of the more widely described cases of such context dependence concerns the realization of the phonemes /d/ and /g/ as a function of a preceding liquid (Mann, V. A., “Influence of Preceding Liquid in Stop-Consonant Perception,” Perception & Psychophysics, Vol. 28, 1980, pp. 407-412) or fricative (Mann, V. A. & Repp, B. H., “Influence of Preceding Fricative on Stop Consonant Perception,” Journal of the Acoustical Society of America, Vol. 69, 1981, pp. 548-558). Perception of /d/, as contrasted with perception of /g/, is largely signaled by the onset frequency and frequency trajectory of the third formant (F3). In the context of a following /a/, a higher F3 onset encourages perception of /da/ while a lower onset results in perception of /ga/. The onset frequency of the F3 transition also can vary as a function of the preceding consonant. For example, F3-onset frequency for /da/ is higher following /al/ in /alda/ than following /ar/ in /arda/. The offset frequency of F3 is higher for /al/, owing to a more forward place of articulation, and lower for /ar/. Perception of /da/ and /ga/ has been shown to be affected by the composition of preceding acoustic information in a fashion that accommodates these patterns in production.
For a series of synthesized consonant-vowel syllables (CVs) varying in onset characteristics of the third formant (F3) and varying perceptually from /da/ to /ga/, subjects are more likely to perceive /da/ when preceded by the syllable /ar/, and to perceive /ga/ when preceded by /al/ (Mann, V. A., “Influence of Preceding Liquid in Stop-Consonant Perception,” Perception & Psychophysics, Vol. 28, 1980, pp. 407-412). In subsequent studies, the effect has been found for speakers of Japanese who cannot distinguish between /l/ and /r/ (Mann, V. A., “Distinguishing Universal and Language-Dependent Levels of Speech Perception: Evidence from Japanese Listeners' Perception of English ‘l’ and ‘r’,” Cognition, Vol. 24, 1986, pp. 169-196) and for prelinguistic infants (Fowler, C. A., Best, C. T. & McRoberts, G. W., “Young Infants' Perception of Liquid Coarticulatory Influences on Following Stop Consonants,” Perception & Psychophysics, Vol. 48, 1990, pp. 559-570). The important point is that, for the very same stimulus with F3 onset intermediate between /da/ and /ga/, the percept is altered as a function of preceding context. Listeners perceive speech in a manner that suggests sensitivity to the compromise between production of neighboring phonetic units.
Different theoretical perspectives provide alternative accounts for how acoustic effects of coarticulation are disambiguated in perception. One approach has been to search harder for invariant attributes in the signal that correspond to phonetic features, and hence phonemes (e.g., Stevens, K. N. & Blumstein, S. E., “The Search for Invariant Acoustic Correlates of Phonetic Features,” P. D. Eimas & J. L. Miller (Eds.), Perspectives in the Study of Speech, Hillsdale, N.J.: Erlbaum, 1981). To date, this approach has yielded mixed results, with more recent efforts being directed to relatively modest features of the acoustic signal that seem to have slim prospects for survival under the noisy conditions typical of speech communication. Further, it is unlikely that invariants exist to explain the aforementioned perceptual phenomenon when one considers the fact that the exact same acoustic information is perceived differently within different contexts. Another tack can be found in Motor Theory (e.g., Liberman, A. M. & Mattingly, I. G., “The Motor Theory of Speech Perception Revisited,” Cognition, Vol. 21, 1985, pp. 1-36), which holds that phonetic perception is the perception of speech gestures and that processes specific to humans recover gestural invariants not apparent in the acoustic signal. Because the lack of invariance in the acoustic signal is the consequence of variability in articulator movements, later versions of this theory suggest that it is intended gestures which are detected.
A third approach is that of Direct Realism (e.g., Fowler, C. A., “An Event Approach to the Study of Speech Perception from a Direct-Realist Perspective,” Journal of Phonetics, Vol. 14, 1986, pp. 3-28). Direct Realism is a general theory for all senses holding that perception is an act by which properties of the physical world that are significant to a perceiver, “distal events,” are directly recovered without intermediate construction. For speech perception, distal events are held to be linguistically relevant articulations of the vocal tract. In terms of what one desires in a broad theoretical framework, Direct Realism may be the most general, elegant, and internally consistent theory. Perhaps the most critical concern with regard to this approach, however, is that one must be able to solve the “inverse problem.” In order to recover a unique distal event in any modality, the perceiver has only the physical energy available to sensory receptors. Independent of classic concerns regarding the extent to which one should view this source of information as rich or impoverished, what must be true is that there is sufficient information to successfully make the inverse transformation to a unique distal event. This requires the existence of some sort of invariant in the signal, perhaps an invariant specified as a function of time. In the absence of an invariant, the best one can do is define some set of possible distal events. Physical acoustic invariants signaling phonemes have not been easy to come by, and Fowler (Fowler, C. A., “Invariants, Specifiers, Cues: An Investigation of Locus Equations as Information for Place of Articulation,” Perception & Psychophysics, Vol. 55, 1994, pp. 597-610) has provided evidence that one recent candidate, locus equations (e.g., Sussman, H., “Neural Coding of Relational Invariance in Speech: Human Language Analogs to the Barn Owl,” Psychological Review,
