Reexamination Certificate
1999-10-12
2004-04-20
Šmits, Talivaldis Ivars (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S252000, C704S254000
active
06725197
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to a method of automatic recognition of an at least partly spelled speech utterance, with a speech recognition unit based on statistical models including a linguistic speech model.
Automatic recognition of spelled speech utterances still has high error rates today. First, it is difficult to detect the boundaries between the individual letters, because users usually pronounce the letters without pauses (i.e., silences) between them when spelling. Second, letters are brief speech units without context, which makes them hard to model acoustically.
In the field of navigation systems for motor vehicles, it is known to offer the user an entry mode in which navigation data, for example place names, are entered by spelling them out (cf. the Carin navigation system).
Place-name entry in such a navigation system works briefly as follows. After the entry mode for place names has been activated, the letters of the respective alphabet that can be entered are shown to the user on a display screen. By turning a multifunction button, the user moves back and forth between the individual letters; pressing the button selects, and thus enters, a letter. Before the first letter of a place name is entered, the user is offered the entire alphabet. After the user has selected the first letter, the navigation system compares the input with a database stored on a compact disc (CD). The result indicates which letters can follow each other in the place names the system can process. Consequently, after the first letter has been entered, only part of the alphabet, rather than the whole alphabet, remains selectable for the next letter, and the multifunction button can only select a second letter from that part. With each letter entered, the selectable part of the alphabet usually shrinks; in exceptional cases it may remain unchanged. If an entered letter sequence can be followed by only one particular letter or letter sequence, the user no longer needs to enter those letters, because the navigation system fills them in automatically as if the user had entered them. This entry mode makes entering spelled place names faster and more comfortable for the user.
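The letter filtering and auto-completion of this entry mode can be sketched in Python. The place-name list stands in for the CD database and, like the function names `selectable_letters` and `auto_complete`, is hypothetical; the sketch does not reflect how the Carin system is actually implemented.

```python
# Hypothetical place-name list standing in for the CD database (assumption).
PLACE_NAMES = ["AACHEN", "AALEN", "BERLIN", "BERN", "BREMEN", "HAMBURG", "HAMELN"]


def selectable_letters(prefix: str, names: list[str]) -> set[str]:
    """Letters that may follow `prefix` in at least one known place name."""
    return {name[len(prefix)] for name in names
            if name.startswith(prefix) and len(name) > len(prefix)}


def auto_complete(prefix: str, names: list[str]) -> str:
    """Extend `prefix` while exactly one continuation letter is possible,
    mimicking the system filling in letters on the user's behalf."""
    while True:
        matches = [n for n in names if n.startswith(prefix)]
        nxt = selectable_letters(prefix, names)
        # Stop if the choice is ambiguous, exhausted, or a complete name is reached.
        if len(nxt) != 1 or prefix in matches:
            return prefix
        prefix += nxt.pop()
```

For example, after "H" only "A" can follow, and after "HA" only "M", so `auto_complete("H", PLACE_NAMES)` advances to "HAM" and then waits, because both "B" (HAMBURG) and "E" (HAMELN) remain selectable.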
SUMMARY OF THE INVENTION
It is an object of the invention to improve the method defined in the opening paragraph for automatic recognition of a spelled speech utterance so that it achieves not only more convenient entry but also a lower speech recognition error rate.
The object is achieved in that:
after the at least partly spelled speech utterance has been entered, the speech recognition unit (2) determines a first recognition result for the speech utterance;
individually recognized letters are presented to the user, who acknowledges or rejects them;
after a letter has been acknowledged, the linguistic speech model (6) is adapted; after its adaptation, the linguistic speech model determines which letters are allowed to follow the acknowledged letter and assumes that the letters already acknowledged are correct;
with the adapted linguistic speech model, the speech recognition unit determines a further recognition result for the speech utterance, from which the next letter to be presented to the user for acknowledgement is determined.
Because the user is asked to acknowledge or reject recognized letters, the system receives feedback on whether the recognition result obtained so far is correct for the speech utterance to be recognized. The utterance may be a single word or a word sequence; the input processed by the method according to the invention is the entire utterance, spelled out completely or in part. This successive feedback is used to improve the statistical modeling of the speech recognizer step by step by reducing the search space. With each improvement, the probability that a wrong letter is presented to the user for acknowledgement decreases, which in turn shortens the time until the spelled utterance is finally recognized. The method thus makes entry more convenient for the user. The acoustic models used in the speech recognition unit, which were estimated on the basis of the spelled part of the utterance, need not be adapted in the recognition procedure according to the invention. Only the linguistic model in use depends on the position in the utterance that is currently being processed.
Linguistic speech models are normally used to reduce the search space during speech recognition. This lowers the computational effort of the speech recognition unit and also improves the recognition results. However, an extensive linguistic speech model leads to excessively large acoustic search spaces. Processing such a model requires a great deal of memory and, at present, either cannot be realized or is inefficient on the customary signal processors used for speech recognition applications. The invention, by contrast, keeps the complexity of the linguistic speech model to a minimum. The model is adapted successively according to the user's acknowledgements of letters, and letter sequences that have already been acknowledged are treated as fixed. The linguistic speech model then only has to determine which letters may follow the most recently acknowledged letter. Such a model is very simple and can easily be implemented in speech recognition procedures running on customary signal processors with little computational effort and memory.
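One way to keep the linguistic model this small, assuming the vocabulary is a plain word list, is a letter trie: because acknowledged letters are fixed, the only state needed is the node they lead to, and the permitted successors are that node's children. `TrieNode`, `build_trie` and `PrefixModel` are hypothetical names for illustration, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # letter -> TrieNode
    is_word: bool = False                          # a vocabulary word ends here


def build_trie(words):
    """Insert every word letter by letter into a shared prefix tree."""
    root = TrieNode()
    for word in words:
        node = root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True
    return root


class PrefixModel:
    """Hypothetical minimal linguistic model: acknowledged letters are fixed,
    so the only state kept is the trie node they lead to."""

    def __init__(self, words):
        self.node = build_trie(words)

    def acknowledge(self, letter):
        # Descend one level; raises KeyError if `letter` is not a permitted successor.
        self.node = self.node.children[letter]

    def successors(self):
        """Letters the model allows after the last acknowledged letter."""
        return set(self.node.children)
```

After acknowledging "B", "E" and "R" for the vocabulary BERLIN, BERN, BREMEN and BONN, `successors()` returns only {"L", "N"}; nothing about the rest of the vocabulary has to be consulted at that point.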
If the user rejects a recognized letter, two alternatives for further processing are preferably considered. First, the linguistic speech model can be adapted to include the rejection, after which the speech recognition unit performs a new recognition pass over the whole speech utterance. This considerably increases the probability that the next letter proposed to the user is correct. Second, the speech recognition unit can determine an N-best list of recognition alternatives as its recognition result for the speech utterance; after the user has rejected a recognized letter, the user is then offered the corresponding letter of the second-best alternative. The advantage is that, after a rejection, the speech recognition unit does not have to repeat recognition on the (complete) spelled speech utterance, so the user is offered a further letter alternative with minimal delay.
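The N-best variant can be sketched as a plain list scan that needs no second recognition pass. The sketch assumes the N-best list holds whole-word hypotheses ordered best first (`next_alternative` is a hypothetical name). It slightly generalises "second-best": it skips hypotheses that contradict the acknowledged letters or would repeat an already rejected letter, since the second-best hypothesis may propose the same letter again.

```python
def next_alternative(nbest, acknowledged, rejected_letters):
    """Letter to offer at the next position after a rejection, or None.

    `nbest` is a best-first list of whole-word hypotheses (assumption);
    `rejected_letters` holds the letters already rejected at this position.
    """
    pos = len(acknowledged)
    for hyp in nbest:
        # Skip hypotheses that contradict acknowledged letters, end too early,
        # or would propose a letter the user has already rejected.
        if (hyp.startswith(acknowledged) and pos < len(hyp)
                and hyp[pos] not in rejected_letters):
            return hyp[pos]
    return None  # N-best list exhausted; a new recognition pass would be needed
```

With a hypothetical list `["BREMEN", "BERLIN", "BERN"]` and "B" acknowledged, the first proposal is "R"; once "R" is rejected, `next_alternative` immediately returns "E" from the second-best hypothesis.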
If individual letters are assigned position-specific probability values, in particular values that depend on all preceding letters, and these values are incorporated into the linguistic speech model, the probability increases that the very first letter proposed for a given position of the speech utterance is correct and is acknowledged by the user. This exploits the fact that certain letter combinations occur more often than others.
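Position-specific probabilities conditioned on all preceding letters can be estimated, for example, by counting the letter continuations of every prefix over a weighted word list. In this sketch the word frequencies, the `$` end-of-word marker and the name `prefix_model` are assumptions; a full recognizer would combine these linguistic probabilities with acoustic scores rather than use them on their own.

```python
from collections import Counter, defaultdict


def prefix_model(word_counts):
    """Estimate P(next letter | all previous letters) from weighted words.

    `word_counts` maps each word to a hypothetical frequency (assumption);
    '$' marks the end of a word so that finishing is also a scored option.
    """
    table = defaultdict(Counter)
    for word, count in word_counts.items():
        padded = word + "$"
        for i, letter in enumerate(padded):
            table[padded[:i]][letter] += count  # prefix -> next-letter counts
    model = {}
    for prefix, ctr in table.items():
        total = sum(ctr.values())
        model[prefix] = {letter: c / total for letter, c in ctr.items()}
    return model
```

With weights {"BERLIN": 6, "BERN": 3, "BREMEN": 1}, the model gives P(E | B) = 0.9 and ranks "L" (6/9) above "N" (3/9) after "BER", so "L" would be proposed first at the fourth position.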
In another embodiment of the invention, the degree to which a letter is interchangeable with other letters, as expressed by the probability value, is taken into account when an alternative to a letter rejected by
Stahl Volker
Wuppermann Friedhelm
Azad Abul K.
Šmits Talivaldis Ivars