Speech recognition method, speech recognition device, and...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

C704S250000

active

06446039

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates to a speech recognition method, a speech recognition device, and a recording medium on which is recorded a speech recognition processing program, in which improved recognition capability is achieved by enabling speaker adaptation processing in which the speaker can be registered with respect to a specified word among recognizable words.
2. Description of Related Art
Recently, electronic devices which use speech recognition technology have come into use in various fields. One example is a clock called a sound clock. In this sound clock, the current time and an alarm time can be set by voice, and the sound clock can inform the user of the current time by sound.
This type of sound clock can be used as a toy for children in addition to being used as a daily necessity. It is desired that the cost of the device itself be as low as possible. Because of this, the CPU processing capability and memory capacity which can be used are severely limited. One of the problems to be solved is how to provide high-capability functions under these limitations.
Conventionally, many devices which use this type of speech recognition can perform speech recognition for a non-specific speaker. However, in order to do so, a large amount of standard speaker sound model data is needed, and therefore a large-capacity ROM and a CPU with high processing ability are needed. As a result, the cost eventually becomes high.
Furthermore, even for a non-specific speaker, the age group and gender of the speakers who are likely to use the device are limited to a certain degree, depending upon the type of the device. As a result, standard speaker model data which is limited to a certain range is acceptable. Because of this, even if a large amount of standard speaker sound model data is provided, much of it is wasted. Additionally, the recognition percentage was a problem, because such data corresponds to a wide variety of non-specific speakers only in an averaged manner.
In order to solve this problem, relatively inexpensive speech recognition LSIs exist which enable non-specific speaker recognition with respect to a plurality of recognizable words which are prepared in advance, and which, at the same time, have a function which performs registration-type speech recognition, in which the speech of a specific speaker is registered for that speaker.
In this type of speech recognition LSI, a word which is prepared in advance can of course be recognized even when it is spoken by a non-specific speaker. Furthermore, sound data of a specific speaker can be registered for that speaker. Therefore, it is thought that speech recognition with high capability can be realized for a wide variety of speakers.
However, although this type of conventional speech recognition LSI achieves a high recognition percentage for a registered specific speaker, its recognition percentage for non-specific speakers generally decreases significantly if the gender and/or age range of the speakers is wide.
Furthermore, in order to improve the recognition percentage for a non-specific speaker, there are devices in which a speaker speaks several dozen words and speaker adaptation is performed based upon this sound data.
However, in general, the speaker adaptation function is in many cases applied to a device with a CPU with high processing capability and a large-capacity memory. It is often difficult to apply this function to toys and daily necessities, for which low cost is strongly demanded and the processing capability of the CPU and the memory capacity are therefore severely limited.
SUMMARY OF THE INVENTION
Therefore, one aspect of this invention is to register sound data which is obtained when a speaker who uses the device speaks a specified word and, at the same time, to significantly improve the speech recognition percentage for the speaker who uses the device (the recognition target speaker) by performing speaker adaptation using this registration data and the standard speaker sound model data.
In order to achieve this aspect, the speech recognition method and apparatus of this invention may have standard speaker sound model data which has been created from sound data of a plurality of non-specific speakers, and can recognize a plurality of predetermined words. Among the plurality of words which can be recognized, several words are selected as registration words. The recognition target speaker speaks the respective registration words, and registration word data is created and saved for the respective registration words from the sound data. When the registration words are spoken by the recognition target speaker, speech recognition is performed by using the registration word data. When other recognizable words are spoken, speech recognition is performed using the standard speaker sound model data.
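The split between registration word data and standard speaker sound model data described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `HybridRecognizer` class, the use of a single feature vector per word, the Euclidean distance measure, and the distance threshold used to decide whether an utterance is a registration word are all hypothetical and are not taken from the patent text.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class HybridRecognizer:
    """Hypothetical recognizer: registered templates first, standard model otherwise."""

    def __init__(self, standard_model, threshold):
        # standard_model maps word -> reference vector; it stands in for the
        # standard speaker sound model data (an assumption, not the patent's format).
        self.standard_model = standard_model
        self.registered = {}        # registration word -> vector spoken by the target speaker
        self.threshold = threshold  # assumed acceptance threshold for registration words

    def register(self, word, features):
        """Save registration word data for one of the recognizable words."""
        if word not in self.standard_model:
            raise ValueError("registration words must be chosen from the recognizable words")
        self.registered[word] = features

    def recognize(self, features):
        """Return (word, source), where source names the data that produced the result."""
        if self.registered:
            word, best = min(
                ((w, distance(features, t)) for w, t in self.registered.items()),
                key=lambda pair: pair[1],
            )
            if best <= self.threshold:
                return word, "registration"
        # Otherwise fall back to the speaker-independent reference vectors.
        word = min(self.standard_model,
                   key=lambda w: distance(features, self.standard_model[w]))
        return word, "standard"


model = {"good morning": [0.0, 0.0], "one": [1.0, 0.0], "two": [0.0, 1.0]}
recognizer = HybridRecognizer(model, threshold=1.0)
recognizer.register("good morning", [5.0, 5.0])
```

With this setup, an utterance close to the registered template is resolved from the registration word data, while any other utterance is resolved from the standard reference vectors.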
In addition, the plurality of recognizable words may be divided according to the respective types of words and prepared as word sets which correspond to the respective divisions. The device is set to recognize the words which belong to a given word set in the operating scene at a given point in time. It determines which word set a word is to be input from at the current point in time and, based upon the determination result, performs recognition of the word which has been input in that scene.
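As a rough illustration of scene-dependent word sets, the sketch below restricts the recognition candidates to the word set that belongs to the current scene. The scene names, the words in each set, and the `recognize_in_scene` function are hypothetical, and the per-word scores are assumed to come from some recognizer that is not shown.

```python
# Hypothetical word sets, one per operating scene of a sound clock.
WORD_SETS = {
    "hour": ["one", "two", "three"],
    "minute": ["ten", "twenty", "thirty"],
    "confirm": ["yes", "no"],
}


def recognize_in_scene(scene, scores):
    """Pick the best-scoring word among those allowed in the current scene.

    scores maps word -> similarity score (higher is better). Words outside
    the scene's word set are ignored even if they score higher.
    """
    candidates = WORD_SETS[scene]
    return max(candidates, key=lambda w: scores.get(w, float("-inf")))


scores = {"one": 0.4, "yes": 0.9, "no": 0.2}
hour_result = recognize_in_scene("hour", scores)
confirm_result = recognize_in_scene("confirm", scores)
```

Limiting the candidates in this way keeps the comparison small in each scene, which fits the tight CPU and memory limits discussed in the background.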
Furthermore, it is also acceptable to restrict the recognition target speakers to a range which is set in advance based upon age and gender, create specific speaker group sound model data from the sound data of a plurality of non-specific speakers who belong to that range, and save this as the standard speaker group sound model data.
The recognition target speakers can include a plurality of speaker groups based upon the characteristics of the sound. The specific speaker group sound model data can also include specific speaker group sound model data corresponding to the plurality of speaker groups, created from the sound data of a plurality of non-specific speakers who belong to the respective speaker groups.
In addition, speaker learning processing may be performed using the registration word data, the standard speaker sound model data, and the specific speaker group sound model data, so that when a recognizable word other than one of the registration words is to be recognized, adaptation processing is performed using the post-speaker learning data and speech recognition is then performed.
Additionally, the speaker learning processing may create an inputting speaker code book by using a code book mapping method and any of the code books which have been created based upon the standard speaker sound model data or the specific speaker group sound model data. Furthermore, by using a universal code book, the inputting speaker code book may be vector-quantized, and a quantized inputting speaker code book may be created.
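The two steps above can be sketched under strong simplifying assumptions. The mapping below is a shift-based variant chosen for illustration: each standard code vector is moved by the mean difference between aligned speaker frames and standard frames that fall into its cell, and cells with no data receive the global mean shift. This is not necessarily the mapping procedure the patent uses. The function names, the frame pairing, and the toy code books are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the code vector in codebook that is closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def map_codebook(standard_cb, pairs):
    """Build an inputting speaker code book from a standard code book.

    pairs is a list of (standard_frame, speaker_frame) tuples, assumed to be
    already time-aligned frames taken from the registration words.
    """
    dim = len(standard_cb[0])
    sums = [[0.0] * dim for _ in standard_cb]
    counts = [0] * len(standard_cb)
    for std_frame, spk_frame in pairs:
        i = nearest(std_frame, standard_cb)
        for d in range(dim):
            sums[i][d] += spk_frame[d] - std_frame[d]
        counts[i] += 1
    total = sum(counts)
    # Assumed fallback: cells with no training frames get the global mean shift.
    if total:
        global_shift = [sum(s[d] for s in sums) / total for d in range(dim)]
    else:
        global_shift = [0.0] * dim
    mapped = []
    for i, code in enumerate(standard_cb):
        shift = [s / counts[i] for s in sums[i]] if counts[i] else global_shift
        mapped.append([c + s for c, s in zip(code, shift)])
    return mapped


def quantize_codebook(speaker_cb, universal_cb):
    """Replace each speaker code vector by the index of its nearest universal code vector."""
    return [nearest(v, universal_cb) for v in speaker_cb]


standard_cb = [[0.0, 0.0], [10.0, 10.0]]
pairs = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [2.0, 1.0])]
speaker_cb = map_codebook(standard_cb, pairs)

universal_cb = [[0.0, 0.0], [1.0, 1.0], [11.0, 11.0]]
quantized_cb = quantize_codebook(speaker_cb, universal_cb)
```

Storing only indices into a shared universal code book, rather than full vectors, is one way a quantized inputting speaker code book could reduce memory use on a low-cost device.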
Furthermore, the speech recognition device of this invention may have standard speaker sound model data which has been created from the sound data of a plurality of non-specific speakers, and can recognize a predetermined plurality of words. With several words among the plurality of recognizable words being selected as registration words, the speech recognition part has at least: a sound analysis unit that analyzes sound which has been obtained as a speaker speaks; registration word data which has been created for the respective registration words from sound data which has been obtained by having the recognition target speaker speak the respective registration words; and a controller which, when one of the registration words is spoken by the recognition target speaker, performs speech recognition using the registration word data, and which, when recognizable words other than the registration words are spoken, performs speech recognition using the standard speaker sound model data.
In this type of speech recognition device,
