Split-vector quantization for speech signal involving...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S243000

active

06253173

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to a method and an apparatus for automatically performing desired actions in response to spoken requests. It is applicable to speech recognition systems, specifically to speech recognition systems using feature vectors to represent speech utterances, and can be used to reduce the storage space required for the speech recognition dictionary and to speed up the upload/download operation required in such systems.
BACKGROUND OF THE INVENTION
In addition to providing printed telephone directories, telephone companies provide information services to their subscribers. The services may include stock quotes, directory assistance and many others. In most of these applications, when the information requested can be expressed as a number or number sequence, the user is required to enter his request via a touch tone telephone. This is often aggravating for the user since he is usually obliged to make repetitive entries in order to obtain a single answer. This situation becomes even more difficult when the input information is a word or phrase. In these situations, the involvement of a human operator may be required to complete the desired task.
Because telephone companies are likely to handle a very large number of calls per year, the associated labour costs are very significant. Consequently, telephone companies and telephone equipment manufacturers have devoted considerable efforts to the development of systems which reduce the labour costs associated with providing information services on the telephone network. These efforts comprise the development of sophisticated speech processing and recognition systems that can be used in the context of telephone networks.
In a typical speech recognition system the user enters his request using isolated word, connected word or continuous speech via a microphone or telephone set. The request may be a name, a city or any other type of information for which either a function is to be performed or information is to be supplied. If valid speech is detected, the speech recognition layer of the system is invoked in an attempt to recognize the unknown utterance. The speech recognition process can be split into two steps, namely a pre-processing step and a search step. The pre-processing step, also called the acoustic processor, performs the segmentation, the normalisation and the parameterisation of the input signal waveform. Its purpose is traditionally to transform the incoming utterance into a form that facilitates speech recognition. Typically feature vectors are generated at this step. Feature vectors are used to identify speech characteristics such as formant frequencies, fricative, silence, voicing and so on. Therefore, these feature vectors can be used to identify a spoken utterance. The second step in the speech recognition process, the search step, includes a speech recognition dictionary that is scored in order to find possible matches to the spoken utterance based on the feature vectors generated in the pre-processing step. The search may be done in several steps in order to maximise the probability of obtaining the correct result in the shortest possible time and most preferably in real-time. Typically, in a first pass search, a fast match algorithm is used to select the top N orthographies from a speech recognition dictionary. In a second pass search the individual orthographies are re-scored using more precise likelihood calculations. The top two orthographies in the re-scored group are then processed by a rejection algorithm that evaluates if they are sufficiently distinctive from one another so that the top choice candidate can be considered to be a valid recognition.
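The multi-pass search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the use of mean-vector distance as the "fast match" score, the linearly aligned frame-by-frame distance as the "more precise" score, and the fixed `margin` used by the rejection test. Lower scores mean closer matches here, standing in for the likelihood calculations mentioned in the text.

```python
def mean_vector(frames):
    """Average the feature vectors of an utterance, element by element."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarse_score(utterance, template):
    # Cheap first-pass score (assumption): compare only the mean vectors.
    return squared_distance(mean_vector(utterance), mean_vector(template))


def fine_score(utterance, template):
    # More precise second-pass score (assumption): frame-by-frame distance
    # after linearly mapping each utterance frame onto a template frame.
    total = 0.0
    for i, frame in enumerate(utterance):
        j = i * len(template) // len(utterance)
        total += squared_distance(frame, template[j])
    return total / len(utterance)


def recognize(utterance, dictionary, top_n=3, margin=0.5):
    """Return the recognized orthography, or None if the result is rejected.

    dictionary maps each orthography to a list of feature vectors.
    """
    # First pass: keep the top N orthographies by the cheap score.
    shortlist = sorted(
        dictionary, key=lambda orth: coarse_score(utterance, dictionary[orth])
    )[:top_n]
    # Second pass: re-score only the shortlist with the precise score.
    rescored = sorted(
        (fine_score(utterance, dictionary[orth]), orth) for orth in shortlist
    )
    if len(rescored) == 1:
        return rescored[0][1]
    (best, best_orth), (second, _) = rescored[0], rescored[1]
    # Rejection: accept the top choice only if the top two are distinct enough.
    return best_orth if second - best >= margin else None
```

The point of the two passes is cost: the cheap score is applied to every dictionary entry, while the expensive score is applied only to the N survivors.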
Voice activated dialling (VAD) systems are often based on speaker trained technology. This allows the user of the service to enter by voice a series of names for which he wishes to use VAD. Each of the names is associated with a phone number that is dialled when the user utters the name. The names and phone numbers are stored in a “client dictionary” situated in the central repository of the VAD system. Each subscriber of the service has an associated client dictionary. Since the number of subscribers is substantial and the number of entries in each client dictionary can be quite large, the storage requirements for the central repository are very high. Furthermore, each user request requires his respective client dictionary to be downloaded to a temporary storage location in the speech recognition unit, which puts a further load on the system. Compression/decompression techniques are required to allow the system to support such a load. However, prior art techniques that have high compression factors either do not operate in real time or significantly degrade the recognition accuracy of the speech recognition system.
Thus, there exists a need in the industry for a real-time compression/decompression method that minimizes the storage requirement of a speech recognition dictionary while maintaining a high recognition accuracy.
OBJECTS AND STATEMENT OF THE INVENTION
An object of the invention is to provide a method and apparatus for performing compression and/or decompression of an audio signal that offers real time performance, particularly well suited in the field of speech recognition.
Another object of this invention is to provide a method and apparatus for adding entries to a speech recognition client dictionary, particularly well suited for use in training a voice activated dialing system.
As embodied and broadly described herein the invention provides an apparatus for compressing an audio signal, said apparatus comprising:
means for receiving an audio signal;
means for processing said audio signal to generate at least one feature vector, said feature vector including a plurality of elements;
means for grouping elements of said feature vector into a plurality of sub-vectors; and
means for quantizing said plurality of sub-vectors.
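The grouping and quantizing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sub-vectors are taken as consecutive runs of elements, each sub-vector has its own codebook, and quantizing means picking the nearest codebook entry by squared Euclidean distance. The names `split`, `nearest`, `quantize` and `dequantize`, and the idea of passing `sizes` and `codebooks` as arguments, are hypothetical.

```python
def split(vector, sizes):
    """Group consecutive elements of a feature vector into sub-vectors."""
    subs, start = [], 0
    for size in sizes:
        subs.append(vector[start:start + size])
        start += size
    return subs


def nearest(sub, codebook):
    """Index of the codebook entry closest to sub (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(sub, codebook[i])),
    )


def quantize(vector, sizes, codebooks):
    """Replace each sub-vector by the index of its nearest codebook entry."""
    return [nearest(sub, cb) for sub, cb in zip(split(vector, sizes), codebooks)]


def dequantize(indices, codebooks):
    """Rebuild an approximate feature vector from the stored indices."""
    vector = []
    for index, codebook in zip(indices, codebooks):
        vector.extend(codebook[index])
    return vector
```

For example, with `sizes = [2, 2]` and the toy codebooks `[[[0, 0], [1, 1]], [[0, 10], [5, 5]]]`, the vector `[0.9, 1.2, 4.0, 6.0]` is stored as the two indices `[1, 1]` and is reconstructed as `[1, 1, 5, 5]`. Splitting keeps each codebook small, because each one only has to cover a few dimensions.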
For the purpose of this specification the expression “feature vector” designates a data element that can be used to describe the characteristics of a frame of speech. The elements of the feature vectors are parameters describing different components of the speech signal such as formants, energy, voiced speech and so on. Examples of parameters are LPC parameters and mel-based cepstral coefficients.
For the purpose of this specification the expression “speech recognition client dictionary” designates a data structure containing orthographies that can be mapped onto a spoken utterance on the basis of acoustic characteristics and, optionally, a-priori probabilities or another rule, such as a linguistic or grammar model. Each of these data structures is associated with a user of a speech recognition system.
For the purpose of this specification, the expression “orthography” designates a data element that can be mapped onto a spoken utterance that can form a single word or a combination of words.
For the purpose of this specification, the expressions “quantizing”, “quantize” and “quantization” are used to designate the process of approximating a value by another in order to reduce the memory space required in the representation of the latter. Devices designated herein as a “quantizor” perform this process.
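To make the memory saving concrete, the following sketch compares the storage needed for one unquantized feature vector with the storage needed when each sub-vector is replaced by an index into a codebook of fixed size. All figures are hypothetical and chosen only for the arithmetic: 12 elements per vector, 32 bits per element, and three codebooks of 256 entries each. None of these values comes from the patent.

```python
import math


def raw_bits(num_elements, bits_per_element=32):
    """Bits needed to store one feature vector without quantization."""
    return num_elements * bits_per_element


def quantized_bits(codebook_sizes):
    """Bits needed when each sub-vector is stored as one codebook index."""
    return sum(math.ceil(math.log2(size)) for size in codebook_sizes)


# Hypothetical configuration: 12 elements, three sub-vectors, 256 entries each.
before = raw_bits(12)                      # 12 * 32 = 384 bits
after = quantized_bits([256, 256, 256])    # 3 * 8 = 24 bits
ratio = before / after                     # 16.0
```

The codebooks themselves also occupy memory, but they are shared by every stored vector, so their cost does not grow with the number of dictionary entries.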
In a most preferred embodiment of this invention the compression apparatus is integrated into a speech recognition system, such as a voice activated dialing system, of the type one could use in a telephone network, that enables users to add names to a directory in order to train the system. Preferably, each user has his own directory, herein referred to as client dictionary, where he may enter a plurality of entries. In a typical training interaction, once the voice activated system receives a request from the user, it will first download from a central database, herein referred to as database repository, the client dictionary associated with the user. As a second step the system will issue a prompt over the telephone network requesting the user to specify a name he wishes to add. If vali
