Apparatus for converting speech to text

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S231000

active

06366882

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to apparatus and methods for speech to text conversion using automatic speech recognition, and has various aspects.
BACKGROUND OF THE INVENTION
Automatic speech recognition, as such, is known from, for example, “Automatic Speech Recognition” by Kai-Fu Lee, Kluwer Academic Publishers 1989.
Conventional known systems for converting speech to text involving automatic speech recognition are desktop stand-alone systems, in which each user needs his or her own system. Such known speech to text conversion systems have been produced by such companies as International Business Machines, Kurzweil Applied Intelligence Inc and Dragon Systems.
When performing automatic speech recognition, adaptation is known to improve system performance. Adaptation is a mathematical process where descriptive models are fine-tuned. In particular, speaker adaptation adapts models to better fit the speaker's speech characteristics, and language adaptation adapts them to the speaker's word usage.
When performing ASR adaptation, performance is judged by the accuracy of the resulting text and the time required to produce it. Improving performance is primarily related to improving accuracy, though improvement is also achieved when the required computation time is reduced.
Known systems for Automatic Speech Recognition (ASR) model the acoustical patterns of speech and the word patterns of the language used. Although speech recognition is performed using both speech and language models within a statistical framework, the two are constructed independently.
Acoustical modelling captures the nature of different sounds. A word can be described, via a pronouncing dictionary, as some combination of these sounds.
Language modelling captures the likelihood that a given word occurs in some context. It is necessary, in practice, to compile statistical likelihoods from large amounts of data collected over time.
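As an illustration of compiling such likelihoods, the following minimal sketch estimates bigram probabilities from a very small corpus using add-one smoothing. The corpus, the function names and the choice of smoothing are assumptions made for the example and are not taken from the specification.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # "<s>" marks sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Illustrative corpus only; a practical model needs far more data.
corpus = [
    "the patient was admitted",
    "the patient was discharged",
    "the court was adjourned",
]
unigrams, bigrams = train_bigram(corpus)
p_patient = bigram_prob("the", "patient", unigrams, bigrams)
p_court = bigram_prob("the", "court", unigrams, bigrams)
```

With only three sentences the estimates are crude, which is consistent with the point that useful likelihoods require large amounts of data.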
Language models are adapted by applying millions of words, so adaptation based only on an individual's occasional or regular use of dictation would take a very long time to yield any benefit.
Known ASR systems use pattern matching and other known techniques:
(1) to match acoustic speech patterns with sub-word units (typically phoneme related),
(2) to associate sub-word vectors with orthographic words (using a pronouncing dictionary),
(3) to represent and exploit the likelihood that a particular word will occur given its location relative to other surrounding words,
(4) to search to find the best text sequence by examining all possible word sequences and selecting the one which best concords with the given acoustic utterance and the knowledge expressed in (1), (2) and (3) above.
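The search in step (4) can be sketched as an exhaustive scoring of candidate word sequences that combines acoustic and language scores. The candidate words, the probabilities and the function name below are hypothetical values chosen for illustration. Practical decoders prune the search rather than enumerating every sequence.

```python
import math
from itertools import product

# Hypothetical acoustic log-likelihoods for the candidate words in each
# position (standing in for the result of steps 1 and 2).
acoustic = [
    {"four": math.log(0.6), "for": math.log(0.4)},
    {"patients": math.log(0.5), "patience": math.log(0.5)},
]

# Hypothetical language model log-probabilities for word pairs (step 3).
language = {
    ("four", "patients"): math.log(0.8),
    ("four", "patience"): math.log(0.2),
    ("for", "patients"): math.log(0.4),
    ("for", "patience"): math.log(0.6),
}

def decode(acoustic, language):
    """Score every candidate sequence and return the best one (step 4)."""
    best_seq, best_score = None, -math.inf
    for seq in product(*(slot.keys() for slot in acoustic)):
        score = sum(slot[word] for slot, word in zip(acoustic, seq))
        score += sum(language[pair] for pair in zip(seq, seq[1:]))
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score

best_seq, best_score = decode(acoustic, language)
```

In this example the acoustic scores alone cannot separate "patients" from "patience"; the language score decides between them.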
Known ASR systems decode an acoustic pattern into a word sequence by appropriate use of this information. Adapting the recognition system requires both acoustic adaptation (of sub-word parameters, as in (1)) and language adaptation (of word usage statistics, as in (3)). A pronouncing dictionary, as in (2), is usually static, except that new words, i.e. those encountered in real use but absent from the system dictionary, must be added to it.
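Identifying new words can be sketched as a lookup of each transcript word against the pronouncing dictionary. The dictionary entries, the phoneme notation and the function name are hypothetical, and the tokenisation is deliberately simple.

```python
def find_new_words(transcript, dictionary):
    """Return words with no dictionary entry, in order of first appearance."""
    seen, new_words = set(), []
    for token in transcript.lower().split():
        word = token.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if word and word not in dictionary and word not in seen:
            seen.add(word)
            new_words.append(word)
    return new_words

# Hypothetical dictionary mapping words to phoneme strings.
pronouncing_dictionary = {
    "the": "dh ax",
    "patient": "p ey sh ax n t",
    "has": "hh ae z",
}

new_words = find_new_words(
    "The patient has tachycardia. Tachycardia persists.",
    pronouncing_dictionary,
)
```

Each word returned would still need a pronunciation before it could be added to the dictionary.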
Known speech recognition technology is based on sub-word modelling. This requires each word to have a known pronunciation. Given that pronunciation, any word can be assimilated into a recognition system. In practice, words will occur for which no pronunciation is known in advance. So-called “Text-To-Speech” technology exists to invent a plausible pronunciation. However, such techniques are complicated and can be inaccurate, and they involve considerable hand-crafting effort.
The correct transcription of an audio recording to be used for adapting the ASR system is a word-for-word verbatim text transcript of the content of that speech recording.
The transcripts returned from audio typists may not match the word-for-word speech. For example, embedded instructions may have been interpreted (e.g. delete that sentence), information inserted (e.g. insert date), stylisation applied (date and number format), or obvious mistakes corrected (e.g. “U.S. President Abram Lincoln” might be manually corrected to “Abraham Lincoln”). When applied in ASR adaptation, these variations between the speech and the corrected text can cause errors.
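One way to locate such variations is to align the two word sequences and list the spans that differ, so that those spans can be treated with caution during adaptation. The sketch below uses Python's standard difflib module. It assumes a verbatim word sequence is available for comparison, and it is an illustration rather than the method of the specification.

```python
from difflib import SequenceMatcher

def transcript_mismatches(verbatim, corrected):
    """List (operation, verbatim_words, corrected_words) for each differing span."""
    a, b = verbatim.split(), corrected.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

spoken = "U.S. President Abram Lincoln"
typed = "U.S. President Abraham Lincoln"
mismatches = transcript_mismatches(spoken, typed)
```

Here the only differing span is the corrected name; interpreted instructions or inserted dates would appear as "delete" or "insert" operations.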
Known ASR systems use speaker independent acoustic modelling. The models can be adapted through usage to improve the performance for a given speaker. Speaker dependent models are unpopular, because they require a user to invest time (usually one or two hours) before he or she can use the ASR system.
BRIEF SUMMARY OF THE INVENTION
In a first aspect, the present invention relates to a speech to text convertor comprising a plurality of user terminals for recording speeches, at least one automatic speech recognition processor, and communication means operative to return the resulting texts to the respective user, in which at least one automatic speech recognition processor is adapted to improve recognition accuracy using data of the recorded speeches and the resulting texts, the data being selected dependent upon subject matter area.
This advantageously provides subject matter area specific adaptation, whereby data from previous users in a subject matter area is used to improve performance of automatic speech recognition processors for subsequent users in that subject matter area.
New users benefit from previous adaptation using data according to their subject matter area. Both occasional and regular users benefit from adaptation using data from others in their subject matter area.
Data for adaptation is preferably accumulated by pooling according to subject matter area prior to adaptation. In particular, given, say, hundreds or thousands of users over time but far fewer subject matter areas (say five or ten), data for adaptation is quickly accumulated by pooling according to subject matter area.
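Pooling of this kind can be sketched as grouping the recorded speeches and resulting texts of many users under a small number of subject matter area keys. The record layout and field names below are hypothetical.

```python
from collections import defaultdict

def pool_by_area(records):
    """Group (audio_id, text) pairs from many users by subject matter area."""
    pools = defaultdict(list)
    for record in records:
        pools[record["area"]].append((record["audio_id"], record["text"]))
    return dict(pools)

# Hypothetical records; the field names are illustrative only.
records = [
    {"user": "u1", "area": "legal", "audio_id": "a1", "text": "the court was adjourned"},
    {"user": "u2", "area": "medical", "audio_id": "a2", "text": "the patient was admitted"},
    {"user": "u3", "area": "legal", "audio_id": "a3", "text": "the claim was dismissed"},
]
pools = pool_by_area(records)
```

Because many users share each area, every pool grows faster than the data of any single user.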
The subject matter areas can be various disciplines, such as legal, medical, electrical, accounting, financial, scientific and chemical subject matter areas; also personal correspondence and general business.
Preferably, language models are adapted dependent on the subject matter area for which they are used, using data from that subject matter area. New words which occur in a subject matter area are acquired by a language model for each new word being provided, and subsequently adapted. The probabilities of word occurrences dependent on subject matter area are learnt and used for improved automatic speech recognition accuracy.
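As a rough illustration of learning word probabilities that depend on subject matter area, the sketch below interpolates a general word distribution with one estimated from area data. Linear interpolation, the equal weighting and the sample texts are assumptions made for the example; the specification does not state which adaptation technique is used.

```python
from collections import Counter

def unigram_probs(texts):
    """Estimate relative word frequencies from a list of texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def adapt(general, area, weight=0.5):
    """Linearly interpolate general and area-specific word probabilities."""
    vocab = set(general) | set(area)
    return {
        word: (1 - weight) * general.get(word, 0.0) + weight * area.get(word, 0.0)
        for word in vocab
    }

# Illustrative texts only.
general = unigram_probs(["please send the letter", "please call the office"])
medical = unigram_probs(["the patient has angina", "the patient is stable"])
adapted = adapt(general, medical, weight=0.5)
```

A word such as "angina", absent from the general data, receives a non-zero probability in the adapted model once it appears in the medical pool.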
Preferably, each recorded speech has an indicator of subject matter area and the selection of data for adaptation is dependent upon the indicator. This indicator can be provided by the user or determined and applied subsequently.
Preferably, the data for adaptation can be selected dependent not only on subject matter area but also on the user's accent grouping. This can further improve accuracy of automatic speech recognition.
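Selection on both criteria can be sketched as a filter over tagged records. The record fields and the function name are hypothetical; each recording is assumed to carry a subject matter area indicator and an accent group label.

```python
def select_adaptation_data(records, area, accent=None):
    """Keep records for one subject matter area, optionally narrowed to one accent group."""
    return [
        record for record in records
        if record["area"] == area and (accent is None or record["accent"] == accent)
    ]

# Hypothetical tagged recordings.
recordings = [
    {"audio_id": "a1", "area": "legal", "accent": "UK"},
    {"audio_id": "a2", "area": "legal", "accent": "US"},
    {"audio_id": "a3", "area": "medical", "accent": "UK"},
]
legal_all = select_adaptation_data(recordings, "legal")
legal_uk = select_adaptation_data(recordings, "legal", accent="UK")
```

Omitting the accent argument selects by subject matter area alone.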
In a second aspect, the present invention relates to a speech to text convertor comprising a plurality of user terminals for recording speeches, at least one automatic speech recognition processor, and communication means operative to return the resulting texts to the respective user, in which at least one automatic speech recognition processor is adapted to improve recognition accuracy using data of the recorded speeches and the resulting texts, the data being selected dependent upon accent group.
Accent group specific adaptation advantageously enables data from previous users in an accent group to be used to improve performance of automatic speech recognition processors for subsequent users belonging to the same accent group. In particular, as a result of previous adaptation, acoustic models are closer to the new user's speech, giving improved performance.
Data for adaptation is preferably accumulated by pooling according to accent group prior to adaptation.
The accent groups can refer to country, region and/or city, e.g. United Kingdom, United States or any other specific accents or sub-accents.
Preferably acoustic models are adapted dependent on which acce
