Speech recognition aided by lateral profile image


Filing date: 1998-09-14
Issue date: 2001-02-06
Examiner: Šmits, Tālivaldis I. (Department: 2741)
Category: Data processing: speech signal processing, linguistics, language
Class: Speech signal processing
Subclass: Recognition

Classification codes: C704S231000, C704S270000
Patent type: Reexamination Certificate
Status: active
Patent number: 06185529
: ABSTRACT:

DESCRIPTION
1. Technical Field
The present invention relates in general to an automatic speech recognition system, and more particularly to a system and a method for recognizing speech by analyzing lateral mouth or face images.
2. Prior Art
Automatic speech recognition, in which computers interpret spoken input, has long been an important goal in the information processing industry. Such recognition simplifies person-to-computer interaction and greatly enhances machine usability. As the technology improves, a growing number of companies, including securities firms (e.g., on their trading floors) and airlines, are beginning to use speech recognition enterprise-wide for their critical processes. However, existing technology still has difficulty accurately distinguishing the nuances of the human voice, especially amid the clutter of noisy work environments. State-of-the-art systems achieve acceptable accuracy only for prepared speech uttered in a quiet environment. It would therefore be highly desirable to have a system and a method for recognizing natural speech more accurately, even in noisy environments.
Although existing state-of-the-art acoustic-only speech recognition systems perform well at distinguishing vowel sounds, they are less successful at differentiating among consonant sounds. Another weakness of acoustic-only recognition is its inability to determine the breaks between syllables, which are vital to accurate recognition. A speech recognition system would therefore benefit greatly from a means of acquiring the types of data that best complement acoustic data, leading to affordable and reliable speech recognition.
To improve speech recognition technology, researchers in the following publications have shown that simultaneous video imaging of the speaker's face or mouth can yield data that, used together with acoustic speech recognition algorithms, improves recognition accuracy. In J. T. Wu, S. Tamura, Mitsumoto, H. Kawai, K. Kurosu, and K. Okazaki, “Speaker-Independent Vowel Recognition Combining Voice Features and Mouth Shape Image with Neural Network,” Systems and Computers in Japan, vol. 22, pp. 100-107 (1991), voice features and mouth shape images are combined and used to train error back-propagation neural networks for speaker-independent vowel recognition.
P. L. Silsbee and A. C. Bovik, “Automatic Lipreading,” Biomedical Sciences Instrumentation, vol. 29, pp. 415-422 (1993), describes an automatic visual lipreading system intended to supplement a standard automatic speech recognizer. P. L. Silsbee and A. C. Bovik, “Audio Visual Speech Recognition for a Vowel Discrimination Task,” Proc. SPIE, vol. 2094, pp. 84-95 (1993), describes a speaker-dependent lipreading system using hidden Markov modeling, which may be used in conjunction with an audio automatic speech recognition system to improve recognition accuracy.
U.S. Pat. No. 4,769,845 issued Sep. 6, 1988 to H. Nakamura, entitled “Method of Recognizing Speech Using a Lip Image,” describes a speech recognition method in which an image pickup apparatus collects lip data during speech for use in recognizing that speech. U.S. Pat. No. 4,757,541 issued Jul. 12, 1988 to R. L. Beadles, entitled “Audio Visual Speech Recognition,” also describes automatic lipreading without audio input for a speech recognition system.
U.S. Pat. No. 4,975,960 issued Dec. 4, 1990 to E. D. Petajan, entitled “Electronic Facial Tracking and Detection System and Method and Apparatus for Automated Speech Recognition,” describes circuitry for obtaining a video image of an individual's face, and electronically locating and tracking frontal facial features such as the nostrils and mouth for use in combination with acoustics for speech recognition.
Using front views of mouth shapes and tongue positions to aid the recognition of acoustic speech signals marginally improves recognition performance in noisy environments, where the ability to recognize acoustic signals may be degraded by background noise. However, analyzing frontal images of the face or mouth is itself a complex problem because it requires significant computation. Moreover, the results attained may not be as reliable as many applications require.
SUMMARY OF THE INVENTION
The present invention is directed to an apparatus and a method for imaging the mouth area laterally to produce reliable measurements of mouth and lip shapes for use in assisting the speech recognition task. Acquiring and analyzing lateral profiles is much simpler than frontal analysis of mouth and lip shapes because only a minimal set of lateral profile features is required to distinguish syllables. Accordingly, it is an object of the present invention to provide a relatively simple apparatus and method for acquiring lateral profiles of the mouth and lip shapes for use with acoustic data during the speech recognition process.
To attain the above and other objectives of the present invention, a video camera mounted on a headphone assembly with one or more microphones is arranged to generate profile images of the speaker's mouth area. A light source is included to illuminate the area surrounding the mouth, and a diffuse screen near the light source diffuses the emitted light around the speaker's mouth area. A mirror, preferably flat, is situated near the video camera and generally reflects an image of the mouth area.
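The text does not say how a captured lateral image becomes a profile. As a minimal sketch, assuming the diffuse illumination makes the face appear dark against a bright background and that the speaker faces the left edge of the frame (both assumptions, not taken from the source), each image row can be scanned from the background side for the first dark pixel. The distance from that pixel to the opposite edge then measures how far the facial outline protrudes at that row. The function name `extract_profile` and the fixed `threshold` are hypothetical.

```python
from typing import List


def extract_profile(image: List[List[int]], threshold: int = 128) -> List[int]:
    """Convert a grayscale lateral image into a 1-D profile vector.

    Hypothetical sketch: assumes the face is darker than `threshold` against a
    bright diffuse background, and that the speaker faces the left edge of the
    frame. Entry r is how far (in pixels) the facial outline protrudes toward
    the left edge at row r; rows containing only background map to 0.
    """
    profile = []
    for row in image:
        width = len(row)
        protrusion = 0
        # Scan from the background side until the first dark (face) pixel.
        for col, value in enumerate(row):
            if value < threshold:
                protrusion = width - col
                break
        profile.append(protrusion)
    return profile


if __name__ == "__main__":
    frame = [
        [255, 255, 0, 0, 0, 0],
        [255, 255, 255, 255, 0, 0],
        [255, 0, 0, 0, 0, 0],
        [255, 255, 255, 255, 255, 255],
    ]
    print(extract_profile(frame))  # [4, 2, 5, 0]
```

A real system would also need to locate the mouth region and possibly undo the mirror's reflection; a single global threshold is the simplest choice and relies on the controlled lighting.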
From the captured profile images, a profile of the mouth is extracted and stored as a one-dimensional vector, from which features such as lip separation, lip shape, and intrusion depth may be calculated. The computed features may then be used in conjunction with acoustic signals for accurate speech recognition. In particular, the computed features may be provided as training sets to a hidden Markov model (HMM) for recognizing speech.
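The text names lip separation, lip shape, and intrusion depth but does not define them. The sketch below uses hypothetical definitions. It treats the most protruding point in each half of the profile vector as the upper and lower lip. Lip separation is the vertical distance between those peaks. Intrusion depth is how far the outline recedes between them. "Lip shape" is reduced to a single protrusion value, which is only a crude stand-in. Combining the features with acoustic data by per-frame concatenation (early fusion) is also an assumption. The names `lip_features` and `fuse` are illustrative, and the HMM stage itself is not shown.

```python
from typing import Dict, List


def lip_features(profile: List[int]) -> Dict[str, int]:
    """Compute illustrative lip features from a 1-D lateral profile.

    profile[r] is the forward protrusion (pixels) of the facial outline at row
    r, rows ordered top to bottom over the mouth region. Hypothetical
    assumption: the upper-lip peak lies in the top half of the vector and the
    lower-lip peak in the bottom half.
    """
    if len(profile) < 2:
        raise ValueError("profile must span at least two rows")
    mid = len(profile) // 2
    # Row index of the most protruding point in each half (first one on ties).
    upper = max(range(mid), key=lambda r: profile[r])
    lower = max(range(mid, len(profile)), key=lambda r: profile[r])
    # Least-protruding point between the two lip peaks (the mouth opening).
    deepest = min(profile[upper:lower + 1])
    return {
        "lip_separation": lower - upper,
        "intrusion_depth": min(profile[upper], profile[lower]) - deepest,
        "lip_protrusion": max(profile[upper], profile[lower]),
    }


def fuse(acoustic: List[float], visual: Dict[str, int]) -> List[float]:
    """Append the visual features to one frame's acoustic feature vector."""
    order = ("lip_separation", "intrusion_depth", "lip_protrusion")
    return list(acoustic) + [float(visual[name]) for name in order]


if __name__ == "__main__":
    feats = lip_features([1, 5, 3, 2, 2, 6, 1, 0])
    print(feats)  # {'lip_separation': 4, 'intrusion_depth': 3, 'lip_protrusion': 6}
    print(fuse([0.1, 0.2], feats))  # [0.1, 0.2, 4.0, 3.0, 6.0]
```

Each fused vector would correspond to one observation frame for HMM training or decoding. The passage above does not say whether the invention concatenates the streams this way or models them separately.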
Further features and advantages of the present invention as well as the structure and operation of various embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.

REFERENCES:
patent: 4757541 (1988-07-01), Beadles
patent: 4769845 (1988-09-01), Nakamura
patent: 4975960 (1990-12-01), Petajan
patent: 5286205 (1994-02-01), Inouye et al.
patent: 5586215 (1996-12-01), Stork et al.
patent: 5806036 (1998-09-01), Stork
patent: 0 254 409 (1988-04-01), None
Benoit, “Synthesis and Automatic Recognition of Audio-Visual Speech,” Integrated Audio-Visual Processing for Recognition, Synthesis & Communication colloquium, IEEE, Nov. 28, 1996.
Wu, et al., “Speaker-Independent Vowel Recognition Combining Voice Features and Mouth Shape Image with Neural Network,” Systems and Computers in Japan, vol. 22, No. 4, pp. 100-107 (1991).
Silsbee, et al., “Automatic Lipreading,” Biomedical Sciences Instrumentation, vol. 29, pp. 415-422 (1993).
Silsbee, et al., “Audio Visual Speech Recognition for a Vowel Discrimination Task,” Proc. SPIE-Int. Soc. Opt. Eng. (USA), vol. 2094, pp. 84-95 (1993).
Mase, et al., “Automatic Lipreading by Optical-Flow Analysis,” Systems and Computers in Japan, vol. 22, No. 6 (1991).
Bahl, et al., “Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task,” Computer Science RC 19635 (87076) (1994).

Inventors: Chen, Chengjun Julian; Wu, Frederick Yung-Fung; Yeh, James T.
Assignee: International Business Machines Corporation
Attorney: Kaufman, Esq., Stephen C.
Law firm: Scully Scott Murphy & Presser
Examiners: Nolan, Daniel A.; Šmits, Tālivaldis I.
