Reexamination Certificate
1999-01-29
2001-08-21
Thomas, Joseph (Department: 2644)
Data processing: speech signal processing, linguistics, language
Linguistics
Translation machine
C704S009000, C704S257000, C704S277000, C345S171000
active
06278968
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to speech or voice translation systems. More particularly, this invention relates to a spoken language translation system that performs speech-to-speech translation.
BACKGROUND
Speech is the predominant mode of human communication because it is very efficient and convenient. Certainly, written language is very important, and much of the knowledge that is passed from generation to generation is in written form, but speech is a preferred mode for everyday interaction. Consequently, spoken language is typically the most natural, most efficient, and most expressive means of communicating information, intentions, and wishes. Speakers of different languages, however, face a formidable problem in that they cannot communicate effectively across their language barrier. This poses a real problem in today's world because of the ease and frequency of travel between countries. Furthermore, the global economy brings together business people of all nationalities in the execution of multinational business dealings, a forum requiring efficient and accurate communication. As a result, a need has developed for a machine-aided interpersonal communication system that accepts natural fluent speech input in one language and provides an accurate near real-time output comprising natural fluent speech in another language. This system would relieve users of the need to possess specialized linguistic or translational knowledge. Furthermore, there is a need for the machine-aided interpersonal communication system to be portable so that the user can easily transport it.
A typical language translation system functions by using natural language processing. Natural language processing is generally concerned with the attempt to recognize a large pattern or sentence by decomposing it into small subpatterns according to linguistic rules. Until recently, however, natural language processing systems have not been accurate or fast enough to support useful applications in the field of language translation, particularly in the field of spoken language translation.
While the same basic techniques for parsing, semantic interpretation, and contextual interpretation may be used for spoken or written language, there are some significant differences that affect system design. For instance, with spoken input the system has to deal with uncertainty. In written language the system knows exactly what words are to be processed. With spoken language it has only a guess at what was said. In addition, spoken language is structurally quite different from written language. In fact, sometimes a transcript of perfectly understandable speech is not comprehensible when read. Spoken language occurs a phrase at a time, and contains considerable intonational information that is not captured in written form. It also contains many repairs, in which the speaker corrects or rephrases something that was just said. In addition, spoken dialogue has a rich interaction of acknowledgment and confirmation that maintains the conversation, which does not appear in written forms.
The basic architecture of a typical spoken language translation or natural language processing system processes sounds produced by a speaker by converting them into digital form using an analog-to-digital converter. This signal is then processed to extract various features, such as the intensity of sound at different frequencies and the change in intensity over time. These features serve as the input to a speech recognition system, which generally uses Hidden Markov Model (HMM) techniques to identify the most likely sequence of words that could have produced the speech signal. The speech recognizer then outputs the most likely sequence of words to serve as input to a natural language processing system. When the natural language processing system needs to generate an utterance, it passes a sentence to a module that translates the words into a phonemic sequence and determines an intonational contour, and then passes this information on to a speech synthesis system, which produces the spoken output.
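The HMM step described above can be illustrated with a minimal sketch of Viterbi decoding, a standard way to find the most likely hidden state sequence for a series of observations. The text does not say which decoding algorithm a given recognizer uses, so this is an illustration only. The state names (`sil`, `speech`), the observation labels (`low`, `high`), and all probabilities are hypothetical toy values, and a real recognizer would work on acoustic feature vectors and word or phoneme models rather than two hand-written states.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, best_state_sequence) for the observations.

    Works in log space to avoid underflow; assumes every probability used
    is strictly positive.
    """
    # best[t][s] = (log prob of the best path ending in s at time t, previous state)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Trace the stored back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return best[-1][last][0], path

# Hypothetical toy model: two hidden states, two observable energy levels.
STATES = ("sil", "speech")
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
EMIT_P = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

log_p, path = viterbi(["low", "high", "high", "low"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['sil', 'speech', 'speech', 'sil']
```

The same dynamic-programming idea extends to larger state spaces, where each state would stand for part of a phoneme or word model.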
A natural language processing system uses considerable knowledge about the structure of the language, including what the words are, how words combine to form sentences, what the words mean, and how word meanings contribute to sentence meanings. However, linguistic behavior cannot be completely accounted for without also taking into account another aspect of what makes humans intelligent—their general world knowledge and their reasoning abilities. For example, to answer questions or to participate in a conversation, a person not only must have knowledge about the structure of the language being used, but also must know about the world in general and the conversational setting in particular.
The different forms of knowledge relevant for natural language processing comprise phonetic and phonological knowledge, morphological knowledge, syntactic knowledge, semantic knowledge, and pragmatic knowledge. Phonetic and phonological knowledge concerns how words are related to the sounds that realize them. Such knowledge is crucial for speech-based systems. Morphological knowledge concerns how words are constructed from more basic units called morphemes. A morpheme is the primitive unit of meaning in a language. For example, the meaning of the word friendly is derivable from the meaning of the noun friend and the suffix -ly, which transforms a noun into an adjective.
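As a rough illustration of morphological knowledge, the sketch below splits a word into a stem and a suffix and derives the resulting word category, following the friend + -ly example. The lexicon `STEMS`, the table `SUFFIXES`, and the function `analyze` are hypothetical names introduced here, and the extra entry for quick is an added assumption, not something taken from the text.

```python
# Hypothetical toy lexicon: stem -> word category.
STEMS = {"friend": "noun", "quick": "adjective"}

# Hypothetical suffix table: suffix -> {category of stem: category of result}.
SUFFIXES = {"ly": {"noun": "adjective", "adjective": "adverb"}}

def analyze(word):
    """Return (stem, suffix, category), or None if the word cannot be analyzed."""
    if word in STEMS:
        return word, "", STEMS[word]
    for suffix, mapping in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Accept the split only if the stem is known and the suffix
            # applies to the stem's category.
            if stem in STEMS and STEMS[stem] in mapping:
                return stem, suffix, mapping[STEMS[stem]]
    return None

print(analyze("friendly"))  # ('friend', 'ly', 'adjective')
```

A word such as fly is rejected here because the remainder after stripping -ly is not a known stem, which shows why suffix stripping needs a lexicon behind it.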
Syntactic knowledge concerns how words can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases. Typical syntactic representations of language are based on the notion of context-free grammars, which represent sentence structure in terms of what phrases are subparts of other phrases. This syntactic information is often presented in a tree form.
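The tree form mentioned above can be sketched with nested tuples, where each node holds a phrase label followed by its subparts. The sentence, the category labels (`S`, `NP`, `VP`, `Det`, `N`, `V`), and the helper names `render` and `leaves` are illustrative assumptions rather than anything specified in the text.

```python
# Hypothetical parse tree: (label, child, child, ...) or (label, word) at the leaves.
TREE = (
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP", ("V", "barked")),
)

def is_leaf(node):
    """A leaf node pairs a word category with a single word string."""
    return len(node) == 2 and isinstance(node[1], str)

def render(node, depth=0):
    """Return the tree as a list of indented lines, one per node."""
    indent = "  " * depth
    if is_leaf(node):
        return [f"{indent}{node[0]}: {node[1]}"]
    lines = [indent + node[0]]
    for child in node[1:]:
        lines.extend(render(child, depth + 1))
    return lines

def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]

print("\n".join(render(TREE)))
```

Each indented line is a subpart of the nearest less-indented line above it, which is the "phrases as subparts of other phrases" relation in tree form.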
Semantic knowledge concerns what words mean and how these meanings combine in sentences to form sentence meanings. This is the study of context-independent meaning—the meaning a sentence has regardless of the context in which it is used. The representation of the context-independent meaning of a sentence is called its logical form. The logical form encodes possible word senses and identifies the semantic relationships between the words and phrases.
Natural language processing systems further comprise interpretation processes that map from one representation to the other. For instance, the process that maps a sentence to its syntactic structure and logical form is called parsing, and it is performed by a component called a parser. The parser uses knowledge about words and word meanings (the lexicon) and a set of rules defining the legal structures (the grammar) in order to assign a syntactic structure and a logical form to an input sentence. Formally, a context-free grammar of a language is a four-tuple comprising nonterminal vocabularies, terminal vocabularies, a finite set of production rules, and a starting symbol for all productions. The nonterminal and terminal vocabularies are disjoint. The set of terminal symbols is called the vocabulary of the language. Pragmatic knowledge concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
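The four-tuple definition can be made concrete with a small sketch. The class `CFG`, the function `recognizes`, and the toy grammar are hypothetical names and data introduced for illustration. The recognizer uses the CYK algorithm, which is a swapped-in standard technique and not a method named in the text, and it assumes the grammar is in Chomsky normal form (every rule rewrites to either two nonterminals or one terminal).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset
    terminals: frozenset
    productions: tuple  # (lhs, rhs) pairs; rhs is a tuple of symbols
    start: str

    def __post_init__(self):
        # The definition requires the two vocabularies to be disjoint.
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminal and terminal vocabularies must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")

def recognizes(grammar, words):
    """CYK recognition; assumes the grammar is in Chomsky normal form."""
    n = len(words)
    if n == 0:
        return False
    # table[i][k] = nonterminals that derive the k + 1 words starting at position i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for lhs, rhs in grammar.productions:
            if rhs == (word,):
                table[i][0].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for lhs, rhs in grammar.productions:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        table[i][span - 1].add(lhs)
    return grammar.start in table[0][n - 1]

# Hypothetical toy grammar for a single sentence pattern.
TOY = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N"}),
    terminals=frozenset({"the", "dog", "barked"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("barked",)),
        ("Det", ("the",)),
        ("N", ("dog",)),
    ),
    start="S",
)

print(recognizes(TOY, ["the", "dog", "barked"]))  # True
print(recognizes(TOY, ["dog", "the", "barked"]))  # False
```

This sketch only answers whether a sentence is legal under the grammar. A parser as described above would additionally record which rules applied, in order to build the syntactic structure and logical form.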
The typical natural language processor, however, has realized only limited success because these processors operate only within a narrow framework. A natural language processor receives an input sentence, lexically separates the words in the sentence, syntactically determines the types of words, semantically understands the words, pragmatically determines the type of response to generate, and generates the response. The natural language processor employs many types of knowledge and stores different types of knowledge in different knowledge structures that separate the knowledge into organized types. A typical natural language processor also uses very complex capabilities. The knowledge and capabilities of the typical natural language processor must be reduced in complexity and
Franz Alexander M.
Horiguchi Keiko
Blakely, Sokoloff, Taylor & Zafman LLP
Sony Corporation
Thomas Joseph