Reexamination Certificate
1998-06-02
2002-03-05
Smits, Talivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
active
06353809
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a speech recognition apparatus and a recording medium having a speech recognition program recorded therein. More particularly, this invention is concerned with a speech recognition apparatus for recognizing voice data, and a recording medium in which a speech recognition program causing a computer to recognize voice data is recorded.
SUMMARY
In recent years, research and development of speech recognition technology has been undertaken in earnest, and technological means capable of recognizing voice in real time have been proposed. This kind of technology has been applied to various products and uses, for example, ticket reservation by telephone and voice commands in car navigation systems.
Along with recent breakthroughs in speech recognition technology and improvements in the performance of personal computers, a technology has been developed in which application software running on a personal computer recognizes voice input through a microphone connected to the computer, converts the voice into a document, and displays the document.
An example of a software package enabling speech recognition is the product “Voice Type 3.0 for Windows 95” released recently by IBM Ltd. This product converts voice input through a microphone into text data in real time and achieves a considerably high recognition ratio.
However, this application software accepts only real-time input through a microphone, which is just one means of inputting voice data. An already existing voice file cannot be recognized directly.
One object of developing the aforesaid speech recognition technology is to realize a so-called speech word processor or dictation system that automatically creates a document on the basis of voice data input by dictation and displays the document on a screen or the like.
In a conventionally adopted approach, the contents of a document to be created are dictated and temporarily recorded by a recording apparatus such as a tape recorder, and a secretary, typist, or the like then plays back the dictated contents and documents them using a documentation apparatus such as a typewriter or word processor. This style has been generally adopted as one form of effective utilization of a recording apparatus such as a tape recorder.
For such dictation recording, a technique of appending an index mark or end mark to voice data so as to give instructions to a secretary or typist has been known in the past. In this prior art, a mark designates a specified position in the voice data as a point; it does not designate a desired region of the voice data as an interval.
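The difference between designating a point and designating an interval can be sketched as follows. This is only an illustration: the class names `PointMark` and `IntervalMark`, the helper `select_region`, and the use of seconds and a sample rate are assumptions introduced here, not structures described in the prior art.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PointMark:
    """Hypothetical index/end mark: a single position in the recording."""
    position: float  # seconds from the start of the recording


@dataclass(frozen=True)
class IntervalMark:
    """Hypothetical interval designation: a region with a start and an end."""
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive


def select_region(samples: List[int], sample_rate: int,
                  mark: IntervalMark) -> List[int]:
    """Return the samples covered by an interval mark.

    A PointMark alone cannot be used here, because it carries no end
    position and therefore does not delimit a region.
    """
    if mark.end < mark.start:
        raise ValueError("interval end must not precede its start")
    first = int(mark.start * sample_rate)
    last = int(mark.end * sample_rate)
    return samples[first:last]
```

With a point mark, a listener knows only where something happens in the recording; with an interval, a program can cut out exactly the portion of voice data to process.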
For the foregoing form of utilization, in which a recording apparatus is used for dictation, there has long been strong demand for a technology that automatically converts the recorded contents into a document.
In actual dictation, words irrelevant to the content to be conveyed may be included. For example, when written sentences are recited, an incorrectly uttered word or a meaningless word such as “Ah” or “Well” (hereinafter an unnecessary word) may be included, in some cases frequently.
In this case, the performance of speech recognition deteriorates, with the drawback that the document displayed on a screen contains many mistakes. To construct a dictation system that takes such unnecessary words into account, a technology has been proposed in the past that creates language models for speech recognition covering all words, including the unnecessary words.
For example, Japanese Unexamined Patent Publication No. 7-5893 provides a speech recognition apparatus comprising: a standard pattern memory means for storing standard patterns; an unnecessary word pattern memory means for storing patterns of unnecessary words; a word spotting means for word-spotting, on the basis of input voice, a standard pattern stored in the standard pattern memory means or a pattern of an unnecessary word stored in the unnecessary word pattern memory means, and outputting a corresponding interval and score; a producing means for hypothesizing the contents of uttered voice and producing a representation of the meaning; and an analyzing means for analyzing the result of word-spotting, which is performed by the word spotting means, on the basis of the representation of the meaning of the hypothesis produced by the producing means. The analyzing means allocates the score obtained by word-spotting the pattern of an unnecessary word to the remaining intervals, that is, those intervals among all the intervals of the voice data for which neither a standard pattern nor a pattern of an unnecessary word has been word-spotted. The result of word-spotting performed by the word spotting means is then analyzed.
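One possible reading of the score allocation step is sketched below. The representation of a word-spotting result as `(start, end, label, score)` tuples over frame indices, the function name `fill_unspotted`, and the use of a single `filler_score` for every remaining interval are all assumptions made for illustration; the publication as summarized here does not specify these details.

```python
from typing import List, Tuple

# (start_frame, end_frame, label, score); end_frame is exclusive.
Spotted = Tuple[int, int, str, float]


def fill_unspotted(total_frames: int,
                   spotted: List[Spotted],
                   filler_score: float,
                   filler_label: str = "<unnecessary>") -> List[Spotted]:
    """Assign the unnecessary-word score to every interval left uncovered.

    `spotted` holds the intervals that word spotting matched to a standard
    pattern or to an unnecessary-word pattern. Any gap between them, and
    any gap at the start or end of the utterance, receives `filler_score`.
    """
    result: List[Spotted] = []
    cursor = 0  # first frame not yet accounted for
    for start, end, label, score in sorted(spotted):
        if start > cursor:
            # Gap before this spotted word: treat it as an unnecessary word.
            result.append((cursor, start, filler_label, filler_score))
        result.append((start, end, label, score))
        cursor = max(cursor, end)
    if cursor < total_frames:
        # Trailing gap after the last spotted word.
        result.append((cursor, total_frames, filler_label, filler_score))
    return result
```

For instance, with 10 frames and spotted intervals covering frames 2 to 4 and 6 to 8, the sketch fills frames 0 to 2, 4 to 6, and 8 to 10 with the filler label and score, so that every frame of the utterance carries some score before the analysis step.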
However, the speech recognition apparatus described in Japanese Unexamined Patent Publication No. 7-5893 is difficult to run practically on an existing computer (especially a personal computer) because the data size of the language models becomes enormous.
With currently commercialized products, a speaker must take care not to utter unnecessary words or the like, and therefore cannot help finding the product awkward to use.
To improve the performance of speech recognition, the sound level of the input voice must be appropriate. Currently, it is hard to guarantee a high recognition ratio over a wide range of sound levels from low to high. A system is therefore designed to provide a maximum recognition ratio at an average sound level of voice.
In a speech recognition apparatus in which voice is input through a microphone as mentioned above, a sound-level meter indicating the sound level of the voice is displayed on, for example, a screen or the like so that the speaker can keep his or her own voice at a proper sound level.
As an example of this technology, Japanese Unexamined Patent Publication No. 5-231922 describes a sound pressure level display for a speech recognition apparatus comprising: a first sound receiver for receiving a voice signal; a second sound receiver for receiving a noise whose level is close to that of the voice signal received by the first sound receiver; a sound pressure level ratio calculating means for calculating the ratio of the sound pressure level of the voice signal input to the first sound receiver to the sound pressure level of the noise input to the second sound receiver; and a display means for displaying the ratio of sound pressure levels calculated by the sound pressure level ratio calculating means.
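The ratio that such a display shows can be illustrated with a minimal sketch. It assumes that each sound receiver delivers a list of amplitude samples, that the level is measured as a root-mean-square value, and that the ratio is expressed in decibels. None of these choices is stated in the publication, and the function names `rms` and `level_ratio_db` are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_ratio_db(voice: Sequence[float], noise: Sequence[float]) -> float:
    """Ratio of the voice level to the noise level, in decibels.

    `voice` stands for the signal from the first sound receiver and
    `noise` for the signal from the second sound receiver.
    """
    voice_level = rms(voice)
    noise_level = rms(noise)
    if voice_level == 0.0 or noise_level == 0.0:
        raise ValueError("a silent input has no defined level ratio")
    return 20.0 * math.log10(voice_level / noise_level)
```

A voice block whose RMS amplitude is ten times that of the noise block yields a ratio of about 20 dB under these assumptions. Note that the computation needs a separate noise input captured alongside the voice, which an already recorded voice file does not provide.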
However, it is annoying for a speaker to manage his or her own voice so that the sound level remains proper. There is therefore an increasing demand for a user-friendly speech recognition apparatus. Moreover, since the sound level of input voice cannot be detected from already recorded voice data, the technology disclosed in Japanese Unexamined Patent Publication No. 5-231922 cannot be applied as it is, and it cannot be judged whether or not the sound level of the voice data is suitable for speech recognition. Besides, since the sound pressure level display is not provided with a facility for adjusting the sound level of voice autonomously, the recognition ratio may vary abruptly depending on the sound level of the recorded voice data.
A first object of the present invention is to provide a speech recognition apparatus for recognizing speech represented by voice data recorded in a given recording medium and a recording medium in which a speech recognition program is recorded.
A second object of the present invention is to provide a speech recognition apparatus capable of treating an unnecessary word or the like contained in voice without the need of especially fast processing, and a recording medium in which a speech recognition program is recorded.
A third object of the present invention is to provide a speec
Onishi Takafumi
Takahashi Hidetaka
Olympus Optical Ltd.
Smits Talivaldis Ivars
Volpe and Koenig P.C.