Speech detection apparatus in which standard pattern is...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S275000

active

06343269

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech input and detection technique that is not affected by noise in a noisy environment or in a situation where many people speak simultaneously. The invention also relates to a speech detection apparatus that detects speech information from movements of a human articulator and outputs it to information equipment such as a computer or a word processor.
The invention further relates to a technique that enables detection of speech information for both voiced speech and voiceless speech produced by mimicry. Therefore, the technique of the invention can be utilized not only in offices or the like where silence is required and the use of related speech input techniques is not suitable, but also for input of content that the user does not want other people to hear. As such, the invention greatly increases the range of use of speech detection apparatus. Further, the invention can be utilized for a speech detection apparatus that provides barrier-free equipment enabling deaf people, people having difficulty in hearing, and elderly people to communicate information smoothly.
2. Description of the Related Art
The goal of a speech detection apparatus (machine) is to enable the user's speech to be input correctly and quickly in any environment. An ordinary speech detection apparatus employs a speech recognition technique that recognizes and processes speech information by analyzing the frequencies of a voice as a sound. To this end, cepstrum analysis or a similar method is used, which enables separation and extraction of a spectrum envelope or a spectrum fine structure of a voice. However, this speech recognition technique has an inherent disadvantage: it cannot detect speech information unless it receives sound information generated by vocalization. That is, such a speech detection apparatus cannot be used in offices, libraries, etc. where silence is required, because the speaker's voice during speech input annoys nearby people. Nor is this type of speech detection apparatus suitable for input of content that the user does not want nearby people to hear. Further, users tend to feel psychologically reluctant to murmur alone to a machine, and this tendency is stronger when other people are around the user. These disadvantages limit the range of use of speech recognition apparatus and are major obstacles to the spread of speech input apparatus. Another obstacle is that speaking continuously is an unexpectedly heavy physical burden. It is considered that continuing voice input for hours, as one would operate a keyboard, would make the user's voice hoarse and hurt the vocal cords.
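The separation of a spectrum envelope from a spectrum fine structure mentioned above can be illustrated with a minimal sketch of real-cepstrum liftering. This is not taken from the patent; the function names (`real_cepstrum`, `lifter`, `log_spectrum`), the naive DFT, the magnitude floor, and the cutoff value are all assumptions chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    `floor` (an assumed value) keeps the logarithm finite for empty bins.
    """
    log_mag = [math.log(max(abs(s), floor)) for s in dft(frame)]
    return [c.real for c in idft(log_mag)]


def lifter(cepstrum, cutoff):
    """Split a cepstrum into low- and high-quefrency parts.

    The low-quefrency part corresponds to the spectrum envelope, the
    high-quefrency part to the spectrum fine structure. The split is kept
    symmetric so that each part transforms back to a real log spectrum.
    """
    n = len(cepstrum)
    low = [c if (q < cutoff or q > n - cutoff) else 0.0
           for q, c in enumerate(cepstrum)]
    high = [c - l for c, l in zip(cepstrum, low)]
    return low, high


def log_spectrum(cepstrum_part):
    """Transform a (symmetric) cepstrum part back to a log magnitude spectrum."""
    return [v.real for v in dft(cepstrum_part)]


if __name__ == "__main__":
    # Synthetic 64-sample frame (hypothetical data, not real speech).
    frame = [0.9 ** t * (1.0 + 0.5 * math.cos(2 * math.pi * t / 8))
             for t in range(64)]
    low, high = lifter(real_cepstrum(frame), cutoff=8)
    envelope = log_spectrum(low)         # smooth log spectrum envelope
    fine_structure = log_spectrum(high)  # remaining rapid variation
    print(len(envelope), len(fine_structure))
```

Because the DFT is linear, the envelope and the fine structure obtained this way add up to the original log magnitude spectrum. Both parts are derived from the sound signal itself, which is why the approach yields nothing when no voice is produced.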
On the other hand, studies of acquiring speech information from information other than sound information have been made conventionally. The vocal organs directly relating to human vocalization are the lungs 901 as an air flow mechanism, the larynx 902 as a vocalization mechanism, the oral cavity 903 and the nasal cavity 904 that assume the mouth/nasal cavity function, and the lips 905 that assume the articulation function, though the classification method varies from one technical book to another. FIG. 9 shows the arrangement of those organs (the lungs 901 are not shown). Studies of acquiring speech information from visual information of the lips 905 among these vocal organs have been made to provide techniques for people with hearing impairments. It was pointed out that the speech recognition accuracy can be improved by adding visual information of movements of the lips 905 of a speaker to a speech recognition technique (C. Bregler, H. Hild, S. Manke, and A. Waibel, “Improving Connected Letter Recognition by Lipreading”, Proc. IEEE ICASSP, pp. 557-560, 1993).
Among speech recognition techniques using visual information of the lips, a technique with image processing that uses an image input from a video camera is employed most frequently. For example, in Japanese Unexamined Patent Publication No. Hei. 6-43897, as shown in FIG. 10, it was attempted to observe movements of the lips by capturing images of 10 reflective markers M0 to M9 themselves that were attached to the lips 905 of a speaker and a portion around them, detecting two-dimensional movements of the markers M0 to M9, and determining five lip feature vector components 801-805. In Japanese Unexamined Patent Publication No. Sho. 52-112205, it was intended to improve the accuracy of speech recognition by reading the positions of black markers attached to the lips and a portion around them from scanning lines of a video camera. This publication does not specifically disclose a marker extraction method; a two-dimensional image preprocessing and feature extraction technique is needed for discriminating the markers from density differences that are caused by shadows formed by the nose and the lips, a mustache, skin color differences, a mole, a scratch or abrasion, etc.
To solve this problem, Japanese Unexamined Patent Publication No. Sho. 60-3793 proposed a lip information analyzing apparatus in which four high-luminance markers such as light-emitting diodes are attached to the lips to facilitate the marker position detection, movements of the markers themselves are imaged by a video camera, and pattern recognition is performed on a voltage waveform that is obtained by a position sensor called a high-speed multi-point X-Y tracker. However, even with this technique, when speech detection is attempted in a bright room, some means is needed to prevent noise caused by high-luminance reflection light components coming from the glasses, a gold tooth, etc. of a speaker. Although preprocessing and a feature extraction technique for a two-dimensional image that is input from a television camera are needed for this purpose, Publication No. Sho. 60-3793 has no disclosure as to such a technique.
Several methods have been proposed in which features of a vocal organ are extracted by capturing an image of the lips and a portion around them directly without using markers and performing image processing on the image. For example, in Japanese Unexamined Patent Publication No. Hei. 6-12483, an image of the lips and a portion around them is captured by a camera and vocalized words are estimated by a back propagation method from an outline image obtained by image processing. Japanese Unexamined Patent Publication No. Sho. 62-239231 proposed a technique of using a lip opening area and a lip aspect ratio to simplify lip image information. Japanese Unexamined Patent Publication No. Hei. 3-40177 discloses a speech recognition apparatus that retains, as a database, the correlation between vocalized sounds and lip movements to perform recognition for unspecified speakers. Japanese Unexamined Patent Publication No. Hei. 9-325793 proposed to lower the load on a speech recognition computer by decreasing the number of candidate words based on speech-period mouth shape information that is obtained from an image of the mouth of a speaker. However, since these related methods utilize positional information obtained from a two-dimensional image of the lips and a portion around them, the speaker is required to open and close his lips clearly for the image information to be input correctly. It is difficult to detect movements of the lips and a portion around them in speech with a small degree of lip opening/closure and no voice output (hereinafter referred to as “voiceless speech”) and in speech with a small voice, let alone speech with almost no lip movements as in the case of ventriloquism. Further, none of the above-cited references refers to a speech detection technique that improves the recognition rate by utilizing speech modes, such as a voiceless speech mode, with attention paid to the differences between the ordinary speech mode and the other modes.
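As an illustration of the kind of simplified lip features mentioned above for Publication No. Sho. 62-239231, the following minimal sketch computes a lip opening area and a lip aspect ratio from an outline given as two-dimensional points. The cited publication's actual procedure is not described here, so the outline representation, the shoelace area formula, the height-to-width definition of the aspect ratio, and the names `polygon_area` and `lip_features` are all assumptions.

```python
def polygon_area(points):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def lip_features(outline):
    """Return (opening_area, aspect_ratio) for a lip-opening outline.

    `outline` is a list of (x, y) points in image coordinates (assumed input).
    The aspect ratio is taken here as height / width of the bounding box,
    which is an assumed definition.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    aspect_ratio = height / width if width > 0 else 0.0
    return polygon_area(outline), aspect_ratio


if __name__ == "__main__":
    # Hypothetical diamond-shaped opening, 4 units wide and 2 units high.
    print(lip_features([(0, 0), (2, 1), (4, 0), (2, -1)]))
```

With features of this kind, nearly closed lips yield an area and an aspect ratio close to zero regardless of what is being articulated, which is consistent with the difficulty noted above for speech with little lip opening.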
The “speech mode” indicating a speech state will be described in detail in the “Summary of the Invention” secti
