Automatic cueing of speech
Kind: Reexamination Certificate
Filed: 1998-09-18
Issued: 2001-11-13
Examiner: Tsang, Fan (Department: 2645)
Class: Data processing: speech signal processing, linguistics, language
Subclass: Speech signal processing; Application
US classification: C704S270000 (704/270)
Status: active
Patent number: 06317716
BACKGROUND OF THE INVENTION
This invention relates to automatic cueing of speech.
Speechreading (also sometimes referred to as lipreading) is an important means of communication for the deaf and hearing-impaired. By observing the facial actions of the speaker, the listener can derive significant information about the spoken utterance. However, speechreading alone typically provides a limited basis for communication because many distinctions among speech elements are not observable visually. Cueing of speech provides some of the missing information, allowing the receiver to resolve the ambiguities inherent in speechreading.
Manually cued speech is a phonemically based system of eight consonant and four vowel classes, each class containing between two and four phonemes. For each syllable the speaker articulates, he indicates the consonant class by a hand shape and the vowel class by a hand position near his mouth. Phonemes that appear similar on speakers' lips are placed in different classes (e.g., /p/, /b/, and /m/ belong to separate cue groups). Combinations of the cues, synchronized with natural speech movements, make spoken language clearly visible and understandable to the speechreader. The cue receiver can identify all spoken phonemes through the combination of visible facial actions and manual cues, and can understand every spoken syllable that a hearing person hears. This capability can contribute to the reception of spoken language, face-to-face communication, and the development of general language reading skills.
Although manually cued speech provides enormous benefit in the education of the deaf, the assistance it provides in day-to-day communication is limited to situations in which the speaker, or a transliterator, produces the cues. To overcome this limitation, others have developed systems that would derive and display cues similar to manual cued speech using electronic analysis of acoustic speech signals.
One implementation of artificially displayed cues is described in U.S. Pat. No. 4,972,486 by Cornett et al. (hereinafter, “Cornett”). (See also Cornett, O., Beadles, R., & Wilson, B. (1977), “Automatic cued speech”, in J. M. Pickett (ed.), Papers from the Research Conference on Speech Processing Aids for the Deaf, pp. 224-239.) In the Cornett system, a computer analyzes speech, identifies speech elements in that speech, and determines appropriate cues corresponding to those speech elements. The system displays virtual images of the identified cues using a pair of seven-segment LED elements projected into the viewing field of an eyeglass lens. By activating the segments selectively, nine distinct symbol shapes for cueing phonemes are created. The cue groups differ from those used in manual cued speech, but, as in manual cued speech, sounds that are difficult to distinguish through speechreading alone are placed in different cue groups.
SUMMARY OF THE INVENTION
In general, in one aspect, the invention features a computer-based method for use in speech cueing. In this aspect, speech elements initially delivered by a speaker are recognized. A sequence of video images is displayed showing the speaker delivering the speech elements. The displayed sequence of video images is delayed relative to the initial delivery of the speech elements by the speaker. In conjunction with displaying the sequence of video images, an image of one of the cues corresponding to the recognized speech elements is displayed with a timing that is synchronized to a visible facial action.
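The pipeline this aspect describes (recognize on the live signal, buffer the video, overlay cues against the delayed images) can be sketched in a few lines. The following Python is a minimal illustration only; the frame rate, the 2-second delay, the SpeechElement type, and the toy cue mapping are assumptions for this sketch, not details from the patent.

```python
from collections import deque
from dataclasses import dataclass

# Illustrative sketch: video frames are buffered for a fixed delay while
# recognition runs on the live audio, so each cue can be overlaid in step
# with the delayed facial action.  All names and numbers are assumed.

FPS = 30
DELAY_FRAMES = 2 * FPS          # assume a 2 s buffer covers recognizer latency

@dataclass
class SpeechElement:
    phoneme: str
    start_frame: int            # frame index at which the element was spoken

def cue_for(element: SpeechElement) -> str:
    # Toy phoneme-to-cue-group mapping; /p/, /b/, /m/ fall in different groups.
    return {"p": "group-1", "b": "group-4", "m": "group-5"}.get(element.phoneme, "group-0")

def play_back(total_frames: int, recognized: list[SpeechElement]) -> None:
    buffer: deque[int] = deque()
    for i in range(total_frames):
        buffer.append(i)                     # live frame enters the delay buffer
        if len(buffer) > DELAY_FRAMES:
            j = buffer.popleft()             # delayed frame is now displayed
            started = [e for e in recognized if e.start_frame <= j]
            print(f"frame {j}: overlay {cue_for(started[-1]) if started else None}")

play_back(total_frames=75, recognized=[SpeechElement("p", 3), SpeechElement("m", 9)])
```

The design point the sketch illustrates is that the fixed buffering delay gives the recognizer time to finish before the corresponding frames are shown, so cues need never appear late relative to the displayed facial action.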
Embodiments of the invention may include one or more of the following features.
An audio signal corresponding to the speech elements initially delivered by the speaker is played back, the audio signal being synchronized with the displayed video sequence.
The visible facial action corresponds to the recognized speech elements corresponding to that cue. At least one of the cues has a corresponding start time for displaying the image of that cue. The start time of that cue is synchronized to a beginning of the visible facial action corresponding to the recognized speech elements corresponding to that cue. The beginning of the visible facial action is determined to be a pre-selected period of time prior to a start time of the recognized speech elements. The period of time depends on a characteristic of the recognized speech elements.
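Taken together, these features reduce to a simple timing rule: cue onset equals element onset minus an element-dependent lead. A hedged sketch follows; the specific lead values and class names are invented for illustration, since the text states only that the period depends on a characteristic of the recognized speech elements.

```python
# The cue begins a pre-selected period before the acoustic start of the
# element, approximating the onset of the visible facial action.  The lead
# values and class names below are assumptions, not figures from the patent.

LEAD_S = {"bilabial": 0.10, "vowel": 0.05}   # hypothetical per-class lead times
DEFAULT_LEAD_S = 0.07

def cue_start(element_start_s: float, element_class: str) -> float:
    """Playback time at which to begin displaying the cue, in seconds."""
    return element_start_s - LEAD_S.get(element_class, DEFAULT_LEAD_S)

print(cue_start(1.50, "bilabial"))   # -> 1.4 (cue leads the acoustic onset)
```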
In general, in another aspect of the invention, each of the cues comprises a representation of a human hand in a discrete position. The discrete positions of two successive different cues are different. One of the different cues is displayed at positions along a path between the two discrete positions, for example, to suggest to a viewer smooth motion between the two successive cues.
Embodiments of the invention may include one or more of the following features.
Speech elements initially delivered by a speaker are recognized and the cues corresponding to the speech elements are determined based on those recognized speech elements. The motion is suggested by also displaying the other one of the different cues at positions along a path between the two discrete positions.
A period of time between the display of one of the cues in its discrete position and the display of a successive one of the cues in its discrete position is determined, where the smooth motion is suggested by displaying the cues during the determined period of time. The cues are displayed in positions other than the discrete positions during the determined period of time.
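One plausible realization of this smooth-motion feature is straight-line interpolation of the hand position over the determined period, as in the sketch below. The path shape, frame rate, and coordinates are assumptions; the text requires only that the cue be drawn at intermediate positions along a path during the period.

```python
def transition_positions(p0: tuple[float, float],
                         p1: tuple[float, float],
                         period_s: float,
                         fps: int = 30) -> list[tuple[float, float]]:
    """Per-frame positions for the moving hand cue during the transition."""
    n = max(1, round(period_s * fps))
    # Evenly spaced points from just after p0 up to and including p1.
    return [(p0[0] + (p1[0] - p0[0]) * k / n,
             p0[1] + (p1[1] - p0[1]) * k / n)
            for k in range(1, n + 1)]

# e.g., slide the hand from one discrete position to another over 0.2 s
for pos in transition_positions((120.0, 200.0), (120.0, 260.0), period_s=0.2):
    print(pos)
```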
In general, in another aspect of the invention, the speech elements are recognized based on an acoustic signal from the speaker as the speech elements are delivered. The start times of the speech elements are delayed relative to the initial delivery of the speech elements by the speaker. In conjunction with displaying a sequence of video images of the speaker delivering speech elements, images of cues corresponding to the recognized speech elements are displayed at selected periods of time prior to the delayed start times of the speech elements.
Embodiments of the invention may include one or more of the following features.
The selected periods of time can be the same for all cues. The selected period of time of one cue can also be different from the selected period of time of another cue. The selected periods of time can be selected based on a characteristic of the recognized speech elements.
The selected periods of time can also be selected based on start times of visible facial actions, where the visible facial actions correspond to the recognized speech elements which correspond to the cues.
The displayed video sequence is delayed relative to the initial delivery of the speech elements by the speaker, and the start times of the recognized speech elements are synchronized with the delayed video sequence.
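In code, this aspect amounts to shifting every recognized start time by the same video delay and scheduling each cue a selected period earlier, as in this illustrative sketch (the 2 s delay and 70 ms lead are assumed values, not from the patent):

```python
# Recognizer timestamps, taken against the live delivery, are shifted by the
# same fixed delay applied to the video; each cue is then scheduled a
# selected period ahead of the shifted start time.

VIDEO_DELAY_S = 2.0

def cue_display_time(recognized_start_s: float, lead_s: float = 0.07) -> float:
    delayed_start = recognized_start_s + VIDEO_DELAY_S   # sync with delayed video
    return delayed_start - lead_s                        # cue precedes the element

print(cue_display_time(0.90))   # element spoken at 0.90 s -> cue shown at 2.83 s
```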
In general, in another aspect, the invention features a computer-based method for use in speech cueing. A sequence of video images showing a speaker delivering speech elements is displayed and, in conjunction with displaying the sequence of video images, images of hand cues corresponding to the speech elements are displayed. Each of the hand cues includes a representation of a human hand in a discrete position. At least one of the cues is displayed with a supplementary visual feature.
Embodiments of the invention may include one or more of the following features.
Speech elements initially delivered by a speaker are recognized. The hand cues are then determined from the recognized speech elements.
The supplementary visual feature makes a cue more discriminable than when that cue is displayed without such a feature. The supplementary visual feature can be an outline of the representation of the human hand for the cue, where the outline is superimposed on the representation of the human hand. The supplementary visual feature can also be a color of the representation of the human hand for the cue, where the color is characterized by brightness, hue, and saturation values. The color can be different from the color of another cue.
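As an illustration of the color feature, each cue group could be assigned a distinct hue and rendered at a chosen saturation and brightness. The sketch below uses Python's standard colorsys module; the particular hues and group names are invented, since the text requires only that colors be characterized by brightness, hue, and saturation and differ between cues.

```python
import colorsys

# Assign each cue group a distinct hue; saturation and brightness (value)
# complete the color specification.  The hue choices here are assumptions.
CUE_HUES = {"group-1": 0.00, "group-2": 0.33, "group-3": 0.66}   # red, green, blue

def cue_rgb(cue: str, saturation: float = 0.8, brightness: float = 1.0):
    """RGB triple (0-255) for tinting the hand image of the given cue."""
    h = CUE_HUES.get(cue, 0.5)
    r, g, b = colorsys.hsv_to_rgb(h, saturation, brightness)
    return tuple(round(c * 255) for c in (r, g, b))

print(cue_rgb("group-1"))   # -> (255, 51, 51)
```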
Inventors: Braida, Louis D.; Bratakos, Maroula S.; Duchnowski, Paul; Krause, Jean S.; Lum, David S.
Law firm: Fish & Richardson P.C.
Assignee: Massachusetts Institute of Technology
Examiners: Opsasnick, Michael N.; Tsang, Fan