Method of speech recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

Details

C704S251000, C709S229000

active

06757655

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to a method in which an information unit enabling a speech input is stored on a server and can be retrieved by a client and in which the client can be coupled to a speech recognizer through a communications network.
2. Description of the Related Art
Communicating with a computer by speech input instead of keyboard or mouse relieves the user in his work with computers and often increases the speed of input. Speech recognition can be used in many fields in which input is nowadays effected by means of a keyboard, and the subject matter of the input may vary widely. On the other hand, speech recognition places strict requirements on computational power, which is often not sufficiently available on local computers (clients). Particularly for speaker-independent speech recognition with a large vocabulary, the computational power of the clients is often insufficient. To make reliable and fast recognition of speech inputs possible, it is advisable to carry out the speech recognition on a specialized speech recognizer which runs on a powerful computer.
EP 0 872 827 describes a system and a method of speech recognition. A client on which compressed software for speech recognition is executed is connected to a speech recognition server through a network. The client sends a speech recognition grammar and the data of the speech input to the speech recognition server. The speech recognition server executes the speech recognition and returns the recognition result to the client.
A disadvantage of the client/server speech recognition systems described in the opening paragraph is that, when HTML (HyperText Markup Language) pages are accessed simultaneously by various users, the speech recognizers are fully loaded by the resulting speech inputs, so that the speech recognition requires an unacceptable processing time.
SUMMARY OF THE INVENTION
Therefore, it is an object of the invention to ensure an acceptable processing time with a high recognition quality for the recognition of a speech input.
This object is achieved in that the client can be coupled to a plurality of speech recognizers and additional information is assigned to the information unit, which additional information is used for determining a combination of a client with at least one of the speech recognizers for recognizing a speech signal that has been entered.
A client downloads an information unit from a server connected through the communications network, for example, the Internet. This information unit is stored on the server and offers a user the possibility of speech input. A server is a computer in a communications network, for example, the Internet, on which information from providers is stored and can be retrieved by clients. A client is a computer which is connected to a server for retrieving information from the Internet and which downloads the information unit stored on the server in order to present it by means of software. Since the client has limited computational power, the speech recognition is not effected on the client, but on a speech recognizer which is connected to the client through the communications network. For combining the client with a specialized speech recognizer, the server assigns additional information to the information unit stored on the server. This additional information is combined with the information unit and is transferred to the client together with it during the downloading. With the aid of the additional information, the information unit is assigned a speech recognizer specially attuned to this downloaded information unit, which speech recognizer then executes the speech recognition.
The additional information is issued by the server in accordance with a predefined criterion such as, for example, theme area, type of speech recognizer or utilization of the speech recognizers. As a result, a special speech recognizer is selected for each downloaded information unit, which performs the speech recognition of the speech input with a high quality and short processing time.
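The selection by a predefined criterion can be illustrated with a minimal sketch. It assumes that each speech recognizer is described by a theme area and a utilization value between 0.0 and 1.0; the names Recognizer, select_recognizer and attach_additional_information, the example addresses and the fallback rule are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recognizer:
    address: str   # hypothetical network address of the speech recognizer
    theme: str     # theme area the recognizer is attuned to
    load: float    # assumed utilization, 0.0 (idle) to 1.0 (fully loaded)


def select_recognizer(recognizers: List[Recognizer], theme: str) -> Recognizer:
    """Prefer recognizers attuned to the theme area; among those, take the least loaded.

    If no recognizer matches the theme, fall back to the least loaded one overall
    (an assumption made for this sketch).
    """
    if not recognizers:
        raise ValueError("at least one speech recognizer is required")
    matching = [r for r in recognizers if r.theme == theme]
    candidates = matching or recognizers
    return min(candidates, key=lambda r: r.load)


def attach_additional_information(information_unit: str,
                                  recognizers: List[Recognizer],
                                  theme: str) -> Dict[str, object]:
    """Combine the information unit with the address of the selected recognizer."""
    chosen = select_recognizer(recognizers, theme)
    return {
        "information_unit": information_unit,
        "additional_information": {"recognizer_address": chosen.address},
    }


pool = [
    Recognizer("rec-a.example:9000", "travel", 0.7),
    Recognizer("rec-b.example:9000", "travel", 0.2),
    Recognizer("rec-c.example:9000", "banking", 0.1),
]
download = attach_additional_information("<html>...</html>", pool, "travel")
print(download["additional_information"]["recognizer_address"])
```

In this sketch the theme area takes precedence over utilization; a server could equally weigh the criteria differently.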
This has the advantage that the provider of the information unit, who knows the vocabulary to be expected, selects a speech recognizer and combines this speech recognizer with this information unit. The quality of the recognition of the speech input can be considerably increased by means of a provider-controlled assignment of a speech recognizer, because similar speech inputs can always be expected with regard to the respective information unit stored on the server by the provider. Speech recognizers determined by the user, by contrast, must recognize speech inputs from a very wide area of application. With such a fixed coupling of a speech recognizer to, for example, the Web browser, the speech recognizer is not sufficiently specialized for the wide range of areas of application, so that the quality of the recognition result is negatively affected.
The additional information preferably contains the address of the special speech recognizer in the communications network. Furthermore, the additional information contains optional indications about the use of the recognition result. In the simplest case, the recognition result is returned to the client and output there as text or speech. In addition, the additional information contains optional indications that accurately specify the type of speech recognizer to be used. The additional information can furthermore contain, for example, the vocabulary or parameters for adapting the speech recognizer to the speech input. The optional transfer of further parameters improves the speed and/or quality of the speech recognition.
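The possible contents of the additional information can be pictured as a small record with one required field and several optional ones. The following is only a sketch under assumptions: the class name AdditionalInformation, the field names, the default value "return_to_client" and the "param-" attribute prefix are hypothetical and not prescribed by the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AdditionalInformation:
    recognizer_address: str                   # address of the speech recognizer in the network
    result_usage: str = "return_to_client"    # assumed label for the simplest case
    recognizer_type: Optional[str] = None     # optional: type of recognizer to be used
    vocabulary: List[str] = field(default_factory=list)        # optional vocabulary
    parameters: Dict[str, str] = field(default_factory=dict)   # optional adaptation parameters

    def to_attributes(self) -> Dict[str, str]:
        """Flatten the record into name/value pairs, omitting unset optional fields."""
        attrs = {"address": self.recognizer_address, "result": self.result_usage}
        if self.recognizer_type is not None:
            attrs["type"] = self.recognizer_type
        if self.vocabulary:
            attrs["vocabulary"] = " ".join(self.vocabulary)
        for name, value in self.parameters.items():
            attrs["param-" + name] = value
        return attrs


minimal = AdditionalInformation("recognizer.example:9000")
detailed = AdditionalInformation(
    "recognizer.example:9000",
    recognizer_type="digits",
    vocabulary=["yes", "no"],
    parameters={"language": "en"},
)
print(minimal.to_attributes())
print(detailed.to_attributes())
```

Only the address is required in this sketch, which mirrors the text: all other indications are optional.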
In an advantageous embodiment of the invention, the address of a distributor is indicated in the additional information. This distributor controls a plurality of speech recognizers. These include, for example, a plurality of speech recognizers of the same type, or groups of speech recognizers which are provided only for recognizing simple speech utterances, such as digits or “Yes/No”. The distributor indicated by means of the additional information assigns the speech signals coming from a plurality of clients to the speech recognizers available to it. As a result, not only is a faster processing of the speech inputs ensured, but also a uniform load on the speech recognizers.
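One way a distributor could spread incoming speech signals evenly is to hand each new signal to the recognizer with the fewest signals currently in progress. The sketch below assumes exactly that policy; the specification does not state how the distributor chooses, and the class name Distributor, its methods and the recognizer names are hypothetical.

```python
from typing import Dict, List


class Distributor:
    """Assigns incoming speech signals to the least busy of its speech recognizers."""

    def __init__(self, recognizer_addresses: List[str]) -> None:
        if not recognizer_addresses:
            raise ValueError("the distributor needs at least one speech recognizer")
        # Number of speech signals currently being processed per recognizer.
        self.active: Dict[str, int] = {addr: 0 for addr in recognizer_addresses}

    def assign(self) -> str:
        """Return the address of the recognizer that should take the next speech signal."""
        address = min(self.active, key=self.active.get)
        self.active[address] += 1
        return address

    def release(self, address: str) -> None:
        """Mark one speech signal on the given recognizer as finished."""
        if self.active[address] > 0:
            self.active[address] -= 1


distributor = Distributor(["rec-1", "rec-2"])
assigned = [distributor.assign() for _ in range(4)]
print(assigned)
print(distributor.active)
```

With two recognizers and four incoming signals, each recognizer ends up with two signals, which corresponds to the uniform load described above.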
As a further embodiment of the invention, it is proposed that the clients download the information units in the form of HTML pages from a server. These HTML pages are shown by means of a Web browser on the client or by means of another application suitable for displaying them. The information units could also be realized as Web pages. For downloading such an HTML page, the client sets up a connection to the server on which the HTML page is stored. During the downloading, the data are transmitted to the client in the form of HTML code. This HTML code contains the additional information, which is realized, for example, as an HTML tag. The downloaded HTML page is shown by the Web browser and the user can input speech. The HTML tag transmitted with the page defines the speech recognizer provided for recognizing the speech input. For the recognition of a speech input, the client sets up a connection to the speech recognizer through a communications network. The speech input is transmitted to the speech recognizer and recognized there, and the result of the recognition is returned, for example, to the client.
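How a client might read the recognizer address out of the downloaded HTML code can be sketched with Python's standard html.parser module. The tag name "speechrecognizer" and its "address" attribute are hypothetical; the specification says only that the additional information is realized, for example, as an HTML tag, without naming it.

```python
from html.parser import HTMLParser
from typing import Optional


class RecognizerTagParser(HTMLParser):
    """Collects the address from the first (hypothetical) <speechrecognizer> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.address: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "speechrecognizer" and self.address is None:
            self.address = dict(attrs).get("address")


def find_recognizer_address(html_code: str) -> Optional[str]:
    """Return the recognizer address carried in the page, or None if there is none."""
    parser = RecognizerTagParser()
    parser.feed(html_code)
    parser.close()
    return parser.address


page = (
    "<html><body>"
    '<speechrecognizer address="recognizer.example:9000">'
    "<p>Please say a city name.</p>"
    "</body></html>"
)
print(find_recognizer_address(page))
```

The client would then set up its connection to the returned address and transmit the speech input there; that network step is left out of the sketch.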
In an advantageous embodiment of the invention, when a plurality of clients access an HTML page, an individual HTML tag is assigned to each individual client. For this purpose, the server assigns different addresses of speech recognizers to the HTML tags when a plurality of clients access the respective HTML page. This achieves that when there are many accesses to an HTML page, a plural
