Speech recognition with sequence parsing, rejection and pause de

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Patent

Details

704254, G10L 506, G10L 900

active

058483888

DESCRIPTION:

BRIEF SUMMARY
BACKGROUND OF THE INVENTION

1. Field of the Invention
This invention relates to methods and apparatus for speech recognition. Speech recognition is used as an input means for control of machines. At present, speech recognition apparatus generally recognises isolated single words. Speech recognition apparatus is also being developed which is intended to recognise multiple words spoken consecutively in a sentence or phrase; this is referred to as connected speech recognition.
2. Related Art
In speech recognition, a microphone picks up a speech signal from a speaker, and the signal is then digitised and processed for recognition. However, the microphone generally also picks up background or ambient noise, and the electrical system between the microphone and the speech recognition apparatus likewise adds noise (e.g. thermal noise, quantising noise and, where the speech is transmitted through a telecommunications channel, line noise). The noise may resemble parts of speech, for example unvoiced sibilant sounds. Accordingly, the correct recognition of a word depends strongly on the ability to distinguish the beginning and the end of the word, which correspond to the end and beginning of noise or silence. The reliability of speech recognition has been shown to depend strongly on the identification of the correct start and end points for speech.
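To illustrate why noise makes start and end points unreliable, the following is a minimal sketch of a conventional energy-threshold endpoint detector. It is not the method of this invention; the function name `find_endpoints`, the frame length, the threshold and the sample values are all assumptions chosen for illustration.

```python
def find_endpoints(samples, frame_len=4, threshold=0.1):
    """Return (first, last) indices of frames whose mean energy exceeds
    the threshold, or None if no frame does. Illustrative only."""
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    active = [k for k, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Hypothetical sample values: one frame each of silence, speech and noise.
quiet = [0.01, -0.02, 0.01, 0.0]
loud = [0.8, -0.7, 0.9, -0.6]
burst = [0.5, -0.4, 0.5, -0.4]  # a noise burst energetic enough to pass as speech

signal = quiet + loud + loud + quiet
noisy = burst + quiet + loud + loud + quiet

print(find_endpoints(signal))  # (1, 2): the word occupies frames 1 to 2
print(find_endpoints(noisy))   # (0, 3): the burst drags the start point to frame 0
```

In the second call the word itself still spans only two frames, but the detected region stretches from the noise burst to the end of the word, which is the kind of end pointing error the paragraph above describes.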
One speech processing method intended to allow the recognition of a sequence of words using isolated word recognition technology is the "connected-for-isolated" (CFI) technique, described in our co-pending EP patent application 93302538.9 and incorporated herein by reference (corresponding to U.S. patent application Ser. No. 08/530,157 filed Sep. 29, 1995 as the US National Phase of PCT/BG94/00704 filed Mar. 31, 1994). This technique assumes that the signal from the microphone will include alternating periods of speech and noise, and attempts to recognise speech and noise alternately.
A common approach in speech recognition is to use statistical processing, making no initial assumptions about the mechanisms by which speech is produced. For example, hidden Markov modeling (HMM) techniques are used (as described in British Telecom Technology Journal, April 1988, vol 6 Number 2 page 105, Cox). In HMM recognition, each incoming frame of speech is compared with a number of states, to determine the likelihood of the speech frame corresponding to each of those states, and the state probabilities thus generated are compared with a number of predetermined models comprising state sequences corresponding to different words to be recognised. Whilst a word is being recognised, a number of different state sequences, and hence a number of different words, are simultaneously possible; the final determination of which state sequence was observed is made by selecting the most likely state sequence when the whole utterance is received.
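The selection of the most likely state sequence once the whole utterance has been received can be sketched with a Viterbi search, as below. This is a toy illustration under stated assumptions, not the recogniser of this invention: frames are single numbers, each state is a unit-variance Gaussian, the word models in `WORD_MEANS` are hypothetical, and all function names are invented for the example.

```python
import math

NEG_INF = -math.inf


def state_loglikes(frames, means):
    # Log-likelihood of each frame under each state (unit-variance Gaussian,
    # up to an additive constant).
    return [[-0.5 * (x - m) ** 2 for m in means] for x in frames]


def left_to_right(n_states, stay=0.5):
    # Each state either repeats or advances to the next; the last state absorbs.
    trans = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            trans[i][i] = math.log(stay)
            trans[i][i + 1] = math.log(1.0 - stay)
        else:
            trans[i][i] = 0.0
    init = [0.0] + [NEG_INF] * (n_states - 1)  # paths must start in state 0
    return trans, init


def viterbi(loglikes, log_trans, log_init):
    """Return (best log score, best state path) for one word model."""
    n = len(log_init)
    scores = [log_init[s] + loglikes[0][s] for s in range(n)]
    backptrs = []
    for frame in loglikes[1:]:
        new_scores, ptrs = [], []
        for j in range(n):
            # Best predecessor of state j at this frame.
            best_i = max(range(n), key=lambda i: scores[i] + log_trans[i][j])
            new_scores.append(scores[best_i] + log_trans[best_i][j] + frame[j])
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    best = scores[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return best, path


def recognise(frames, word_means):
    results = {}
    for word, means in word_means.items():
        trans, init = left_to_right(len(means))
        results[word] = viterbi(state_loglikes(frames, means), trans, init)
    best_word = max(results, key=lambda w: results[w][0])
    return best_word, results


# Hypothetical toy models: each word is a sequence of state means.
WORD_MEANS = {"yes": [1.0, 3.0, 5.0], "no": [5.0, 3.0, 1.0]}
frames = [1.1, 0.9, 3.2, 2.8, 5.1, 4.9]
word, results = recognise(frames, WORD_MEANS)
print(word, results[word][1])
```

Every word model is scored against the same frames, and the word whose best state sequence has the highest score is selected. A practical recogniser would use multi-dimensional feature vectors and trained state distributions in place of the scalar frames assumed here.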
Some types of HMM speech recognition maintain, during recognition, a number of possible state sequences, including a current most probable sequence for defining the word which has been recognised.
In such sequential recognisers, since the decision as to the identity of the selected word is based on the sequences of states generated, the decision cannot be made until the sequence is complete. The most likely state sequence can be recalculated for each received frame, so that as soon as the end of a word can unambiguously be identified, recognition is performed by simply outputting a recognition signal corresponding to the current most likely state sequence. The recognition process will itself produce start and end points, but this is done in conjunction with the selection of the word which is recognised and not as a separate, preliminary, end pointing step.
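The per-frame recalculation described above can be sketched as a frame-synchronous update that keeps, for every word model, the best partial path score ending in each state. This is a minimal sketch under assumptions: the class name `FrameSyncRecogniser`, the scalar frames, the unit-variance Gaussian states and the two word models are all hypothetical.

```python
import math

NEG_INF = -math.inf


class FrameSyncRecogniser:
    """Tracks, frame by frame, the best partial path score per word model."""

    def __init__(self, word_means, stay=0.5):
        self.word_means = word_means
        self.log_stay = math.log(stay)
        self.log_move = math.log(1.0 - stay)
        # scores[word][s]: best log score of any path ending in state s so far.
        self.scores = {w: None for w in word_means}

    def push(self, x):
        """Consume one frame and return the currently most likely word."""
        for word, means in self.word_means.items():
            emit = [-0.5 * (x - m) ** 2 for m in means]
            prev = self.scores[word]
            if prev is None:
                # First frame: every path must start in state 0.
                new = [emit[0]] + [NEG_INF] * (len(means) - 1)
            else:
                new = []
                for s in range(len(means)):
                    stay = prev[s] + self.log_stay
                    move = prev[s - 1] + self.log_move if s > 0 else NEG_INF
                    new.append(max(stay, move) + emit[s])
            self.scores[word] = new
        return self.current_best()

    def current_best(self):
        return max(self.scores, key=lambda w: max(self.scores[w]))


# Hypothetical models that share their first two states.
models = {"four": [1.0, 3.0], "fourteen": [1.0, 3.0, 5.0]}
recogniser = FrameSyncRecogniser(models)
hypotheses = [recogniser.push(x) for x in [1.0, 3.0, 3.1, 5.0, 5.1]]
print(hypotheses)
```

In this toy run the early frames do not separate the two models, and "fourteen" becomes the leading hypothesis only once the later frames arrive. This illustrates why the current most likely sequence is only a provisional answer until the end of the word can be identified.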
A CFI recogniser is therefore able to automatically locate the start and end of a word, by maintaining state sequences corresponding to noise, and recognising the sequence of noise-word-noise in the speech signal. However, many words may include gaps or stops between parts of the word, which might be m

REFERENCES:
patent: Re31188 (1983-03-01), Pirz et al.
patent: 4348553 (1982-09-01), Baker et al.
patent: 4481593 (1984-11-01), Bahler
patent: 4761815 (1988-08-01), Hitchcock
patent: 4783804 (1988-11-01), Juang et al.
patent: 4803729 (1989-02-01), Baker
patent: 4809331 (1989-02-01), Holmes
patent: 4829578 (1989-05-01), Roberts
patent: 4837831 (1989-06-01), Gillick et al.
patent: 4888823 (1989-12-01), Nitta et al.
patent: 4908918 (1990-03-01), Bahl et al.
patent: 4989248 (1991-01-01), Schalk et al.
patent: 5040127 (1991-08-01), Gerson
patent: 5228110 (1993-07-01), Steinbiss
patent: 5309547 (1994-05-01), Niyada et al.
patent: 5388183 (1995-02-01), Lynch
patent: 5390278 (1995-02-01), Gupta et al.
patent: 5524169 (1996-06-01), Cohen et al.
patent: 5583961 (1996-12-01), Pawlewski et al.
patent: 5621859 (1997-04-01), Schwartz et al.
Lamel et al, "An Improved Endpoint Detector for Isolated Word Recognition", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 29, No. 4, Aug. 1981, New York, US, pp. 777-785.
IBM Technical Disclosure Bulletin, vol. 34, No. 9, Feb. 1992, New York, US, pp. 267-269, "Method of Endpoint Detection".
Austin et al, "A Unified Syntax Direction Mechanism for Automatic Speech Recognition Systems Using Hidden Markov Models", ICASSP, vol. 1, May 1989, Glasgow, pp. 667-670.
Young et al, "Token Passing: A Simple Conceptual Model for Connected Speech Recognition Systems", Cambridge University Engineering Department, Jul. 31, 1989, pp. 1-23.
Kitano, An Experimental Speech-to-Speech Dialog Translation System, IEEE, Jun. 1991, DM-Dialog, Carnegie Mellon University and NEC Corporation.
