Method of transmitting voice data

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Details

C704S254000, C704S249000

active

06304845

ABSTRACT:

BACKGROUND OF THE INVENTION
Field of the Invention
The invention lies in the communications field. More specifically, the present invention relates to a method for transmitting voice data wherein the voice data are compressed before transmission and decompressed at the transmission destination. The compression is based on a decomposition of the voice data into phonemes. Phonemes are the acoustic language elements which are essential for the perception of spoken language.
It has been known in the art to compress voice data before transmission in a communications network in order to occupy as little transmission bandwidth as possible in the communications network. In these cases, when the voice is reproduced at the transmission destination, the compressed voice data are returned to their original state, or to an equivalent state, by decompression. Because the reduction in transmission bandwidth which can be achieved by such a method depends directly on the compression rate of the compression method used, it is desirable to achieve the highest possible compression rate.
During voice transmission, the methods used for compression are usually prediction methods, which exploit the statistically uneven distribution of the data patterns occurring in voice data in order to reduce the high level of redundancy inherent in voice data. During decompression, the original voice data can be reconstructed from the compressed voice data virtually without falsification, with the exception of small losses which are inherent in the process. The compression ratio which can thereby be achieved is on the order of 1:10. Methods of that type are described, for example, by Richard W. Hamming in “Information und Codierung” [Information and Coding], VCH Verlagsgesellschaft Weinheim, 1987, pages 81-97.
In typical voice data, information relating purely to the content forms only a small fraction of the entire voice information. The greatest part of the voice information comprises, as a rule, speaker-specific information which is expressed, for example, in nuances of the speaker's voice or the register of the speaker. When voice data are transmitted, essentially only the information relating to their content is significant, for example in the case of purely informative messages, automatic announcements or the like. For this reason it is possible, by reducing the speaker-specific information, also to achieve significantly higher compression rates than with methods which completely or virtually completely preserve the information payload of the voice data.
The smallest acoustic units in which language is formulated by the speaker and in which the information relating to the content—the spoken words—is also expressed are phonemes. U.S. Pat. No. 4,424,415 (see European patent EP 71716 B1), German patent DE 3513243 C2, and European patent EP 423800 B1 have heretofore disclosed arrangements and methods in which a stream of voice data is analyzed for the phonemes contained in it and converted into a stream of code symbols which are respectively assigned to the phonemes detected, in order to compress the voice data before transmission.
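To give a rough sense of why a stream of phoneme code symbols can be far more compact than a prediction-coded signal, the following sketch estimates the bit rate of such a stream. All numbers in it are illustrative assumptions and do not come from the patent: the catalog size, the speaking rate, and the telephone-grade PCM reference (8 kHz sampling, 8 bits per sample) are chosen only for the arithmetic.

```python
# Illustrative assumptions, none of them taken from the patent text.
PCM_BITS_PER_SECOND = 8000 * 8   # 8 kHz sampling, 8 bits per sample
CATALOG_SIZE = 64                # hypothetical number of phonemes in one catalog
PHONEMES_PER_SECOND = 12         # hypothetical average speaking rate


def phoneme_stream_bit_rate(catalog_size: int, phonemes_per_second: int) -> int:
    """Bits per second needed to send one fixed-width code symbol per phoneme."""
    # Smallest number of bits that can distinguish `catalog_size` symbols.
    bits_per_symbol = max(1, (catalog_size - 1).bit_length())
    return bits_per_symbol * phonemes_per_second


rate = phoneme_stream_bit_rate(CATALOG_SIZE, PHONEMES_PER_SECOND)
ratio = PCM_BITS_PER_SECOND / rate
print(f"{rate} bit/s, roughly 1:{ratio:.0f} relative to uncompressed PCM")
```

Under these assumptions the code stream needs 72 bit/s. The figure ignores framing, timing, and any prosody information, so it is best read as an upper bound on the achievable gain rather than a realistic ratio.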
A significant problem here is the reliable detection of the phonemes from which any stream of voice data to be transmitted is composed. This is made difficult in particular by the fact that the same phoneme can be realized very differently depending on the speaker and the speaker's linguistic habits. If phonemes are not detected within the stream of voice data, or are assigned to incorrect sounds, the transmission quality of the language is impaired, possibly to the point of incomprehensibility. Reliable phoneme analysis is therefore an important criterion for the quality and/or the range of application of such voice transmission methods.
SUMMARY OF THE INVENTION
The object of the invention is to provide a voice data transmission method which overcomes the above-noted deficiencies and disadvantages of the prior art devices and methods of this kind, and which presents a flexible and efficient method in which voice data can be compressed by means of an improved phoneme analysis before transmission.
With the above and other objects in view there is provided, in accordance with the invention, a method of transmitting voice data from a voice data source to a transmission destination. The method comprises the following steps:
selecting a specific phoneme catalog assigned to a given subscriber in dependence on an identifier of the subscriber transmitting voice data, the phoneme catalog having stored therein phonemes corresponding to given voice data patterns, and each phoneme being respectively assigned an unambiguous code symbol;
feeding the voice data to be transmitted to a neural network trained to detect phonemes stored in the specific phoneme catalog and analyzing the voice data for the phonemes contained therein with the neural network;
for the phonemes detected in the voice data, determining the code symbols respectively assigned to the phonemes in the selected phoneme catalog;
transmitting the code symbols to a voice synthesizer at a transmission destination;
converting the stream of received code symbols with the voice synthesizer into a sequence of phonemes respectively assigned to the code symbols in a phoneme catalog; and
outputting the sequence of phonemes.
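The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the subscriber identifiers, the two-dimensional "voice data patterns", and all function names are hypothetical, and a nearest-pattern lookup stands in for the trained neural network. The destination is assumed to use the same catalog as the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[float, ...]


@dataclass(frozen=True)
class PhonemeCatalog:
    """Stored phonemes with their voice data patterns.

    The position of an entry serves as its unambiguous code symbol.
    """
    entries: Tuple[Tuple[str, Pattern], ...]

    def code_for(self, phoneme: str) -> int:
        for code, (label, _) in enumerate(self.entries):
            if label == phoneme:
                return code
        raise KeyError(phoneme)

    def phoneme_for(self, code: int) -> str:
        return self.entries[code][0]


# Hypothetical subscriber identifiers and made-up patterns.
CATALOGS: Dict[str, PhonemeCatalog] = {
    "subscriber-a": PhonemeCatalog(
        (("a", (1.0, 0.0)), ("m", (0.0, 1.0)), ("s", (1.0, 1.0)))
    ),
    "subscriber-b": PhonemeCatalog(
        (("a", (0.8, 0.3)), ("m", (0.1, 0.7)), ("s", (0.9, 0.9)))
    ),
}


def detect_phonemes(frames: Sequence[Pattern], catalog: PhonemeCatalog) -> List[str]:
    """Stand-in for the neural network: nearest stored pattern per frame."""
    detected = []
    for frame in frames:
        label, _ = min(
            catalog.entries,
            key=lambda entry: sum((x - p) ** 2 for x, p in zip(frame, entry[1])),
        )
        detected.append(label)
    return detected


def compress(subscriber_id: str, frames: Sequence[Pattern]) -> List[int]:
    """Source side: select the catalog by subscriber, detect, map to codes."""
    catalog = CATALOGS[subscriber_id]
    return [catalog.code_for(p) for p in detect_phonemes(frames, catalog)]


def decompress(subscriber_id: str, codes: Sequence[int]) -> List[str]:
    """Destination side: map received code symbols back to phonemes."""
    catalog = CATALOGS[subscriber_id]
    return [catalog.phoneme_for(code) for code in codes]


frames = [(0.1, 0.9), (0.9, 0.1), (0.95, 1.05)]
codes = compress("subscriber-a", frames)
print(codes, decompress("subscriber-a", codes))
```

Only the list of integer codes would cross the network in this sketch. A real synthesizer would turn the recovered phoneme sequence into audio, which is omitted here.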
In accordance with an alternative embodiment of the invention, there is provided a method of transmitting voice data from a voice data source to a transmission destination using phoneme catalogs in which voice data patterns corresponding to phonemes are stored, and each phoneme is respectively assigned an unambiguous code symbol, the method which comprises the following steps:
feeding voice data to be transmitted to a neural network trained to detect voice data variations selected from the group consisting of various languages and various speakers and to detect one of a language to which the voice data to be transmitted belong and a speaker from which the voice data to be transmitted originate, and causing with the neural network a given phoneme catalog assigned to the voice data variation to be selected;
feeding the voice data to a neural network trained to detect the phonemes stored in the phoneme catalog, to analyze the voice data for the phonemes contained therein, and being trained to detect the voice data to be transmitted;
determining, for the phonemes detected in the voice data, a code symbol respectively assigned to the phonemes in the selected phoneme catalog;
transmitting the code symbols to a voice synthesizer at a transmission destination;
converting a stream of received code symbols with the voice synthesizer into a sequence of phonemes respectively assigned to the code symbols in a phoneme catalog; and
outputting the sequence of phonemes.
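The distinguishing step of this alternative embodiment, in which the voice data themselves determine which phoneme catalog is used, might look roughly as follows. The sketch is illustrative only: the variation names, reference profiles, and catalog contents are hypothetical, and a nearest-profile comparison stands in for the neural network that detects the language or speaker.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[float, ...]

# Hypothetical reference profiles: one mean feature vector per voice data variation.
VARIATION_PROFILES: Dict[str, Frame] = {
    "language:de": (0.2, 0.8),
    "language:en": (0.7, 0.3),
}

# Hypothetical catalogs: the list index of a phoneme is its code symbol.
CATALOGS: Dict[str, List[str]] = {
    "language:de": ["a", "ch", "sch"],
    "language:en": ["a", "th", "sh"],
}


def mean_frame(frames: Sequence[Frame]) -> Frame:
    """Component-wise average of all frames."""
    count = len(frames)
    return tuple(sum(column) / count for column in zip(*frames))


def detect_variation(frames: Sequence[Frame]) -> str:
    """Stand-in for the detecting neural network: nearest reference profile."""
    centre = mean_frame(frames)
    return min(
        VARIATION_PROFILES,
        key=lambda name: sum(
            (c - p) ** 2 for c, p in zip(centre, VARIATION_PROFILES[name])
        ),
    )


def select_catalog(frames: Sequence[Frame]) -> Tuple[str, List[str]]:
    """Detect the variation, then select the catalog assigned to it."""
    variation = detect_variation(frames)
    return variation, CATALOGS[variation]


variation, catalog = select_catalog([(0.6, 0.4), (0.8, 0.2)])
print(variation, catalog)
```

Once a catalog has been selected this way, phoneme detection and code lookup proceed against that catalog as in the remaining steps.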
In other words, voice data which are to be transmitted from a voice data source to a transmission destination are subjected to a phoneme analysis before the actual transmission. In order to apply the method, the voice data may be present in a wide variety of forms; for example in analog or digitized form or as feature vectors describing voice signals, in each case in representations which are resolved in terms of time and/or frequency. The phoneme analysis is carried out according to the invention by means of a neural network which is trained to detect phonemes. The principles of a detection of voice and/or phonemes by means of neural networks are described, for example, in “Review of Neural Networks for Speech Recognition” by R. P. Lippmann in Neural Computation 1, 1989, pages 1-38.
The phonemes according to which the stream of voice data is to be analyzed and with respect to which the neural network is trained are stored in voice-specific and/or speaker-specific phoneme catalogs in which an unambiguous code symbol, for example an index or a number, is respectively assigned to them. Language can be understood in this context to be, inter alia, natural languages, regional
