Reexamination Certificate
2000-01-28
2004-01-06
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S206000, C704S211000, C382S100000
active
06675140
ABSTRACT:
TITLE OF THE INVENTION
A Mellin-Transform Information Extractor for Vibration Sources.
BACKGROUND OF THE INVENTION
This application is based on the inventors' work, “A Mathematical Framework for Auditory Processing: A Mellin Transform of a Stabilized Wavelet Transform?” (Irino et al., ATR Technical Report, Jan. 29, 1999), the description of which is incorporated herein by reference in its entirety.
1. Field of the Invention
The present invention relates to an improvement in the analysis of time-series data, which has conventionally been carried out using the Fourier transform or a statistical approach such as an autoregressive model. The present invention is applicable to, for example, tone recognition, speaker recognition by voice, speech recognition and the analysis of architectural acoustics, as well as to signal analysis, encoding, signal separation and signal enhancement of voice or music. Beyond acoustic signals and the like, the present invention is widely applicable to the analysis of mechanical vibration such as mechanical sound and seismic waves, the analysis of biological signals such as brain waves, cardiac pulse sounds, ultrasonic echoes and nerve cell signals, and the analysis of signals from sensors that collect general time-series data.
2. Description of the Background Art
Conventionally, the fundamental step in information processing has been to find the spectrogram, that is, a “time-frequency representation” of the signal. Whether it is obtained using a fast digital transform (for example, a fast Fourier transform) or using linear predictive analysis, the result is always a vector which corresponds directly to a spectrum, that is, a frequency representation, at a certain time point, and a time sequence of such vectors constitutes a representation corresponding to a spectrogram. Such a representation derives from the spectral representation of signals originating from the Fourier transform. The sound spectrogram, for example, is the most popular representation of the features of a voice signal. It is a visual representation of the change over time in the voice spectrum, using a density representation, level contour representation or color representation for easier understanding.
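The idea that a spectrogram is a time sequence of spectrum vectors can be illustrated with a minimal sketch. The function names, frame length, hop size, Hann window and test tone below are assumptions chosen for illustration and do not come from this specification; a naive discrete Fourier transform is used in place of a fast transform only to keep the example self-contained.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """Return one spectrum vector per Hann-windowed frame of the signal."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        vectors.append(magnitude_spectrum(frame))
    return vectors

# Hypothetical test signal: a sine lying exactly on DFT bin 8 of a 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(signal)

# Index of the strongest frequency bin in each time frame.
peak_bins = [max(range(len(v)), key=v.__getitem__) for v in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Each element of `spec` is the feature vector for one time point, which is the kind of representation the conventional approach described above passes on to later processing.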
The spectral representation was thought to be the most suitable for information processing of voice and the like, and has therefore come to be widely used, for three reasons: it represents signal features better than the waveform does; the human auditory system is not very sensitive to relative phase relationships within signals consisting solely of a plurality of sine waves; and a method of calculating the representation efficiently has been established.
Conventionally, the performance of various signal processing systems has been improved as far as possible by applying the spectral representation described above to almost every problem. It seems, however, that the improvement in performance obtainable by this approach has almost reached its limit. In the field of speech recognition apparatus, for example, it is generally necessary to train the system on a number of human speakers in advance. However, even a speech recognition apparatus which has already been trained on a large number of adult male and female speakers would not recognize the voice of a child. The basic reason for this is that the vocal tracts, vocal cords and the like of an adult and a child differ in physical size, so that the spectral structures and the pitch of their speech differ, and as a result the feature vectors extracted from the respective speakers differ.
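As an idealized illustration of why a difference in physical size changes the feature vectors, suppose that a smaller sound source produces a waveform that is simply compressed along the time axis by a factor a > 1. This uniform-compression model and the symbols x, X, a and f are assumptions introduced here for illustration only. Under that assumption, the scaling property of the Fourier transform gives:

```latex
x_a(t) = x(a t), \quad a > 0
\quad\Longrightarrow\quad
X_a(f) = \int_{-\infty}^{\infty} x(a t)\, e^{-j 2\pi f t}\, dt
       = \frac{1}{a}\, X\!\left(\frac{f}{a}\right)
```

A spectral peak located at frequency f_0 in X therefore appears at a f_0 in X_a, so a spectrum vector sampled at fixed frequencies changes shape when the size of the source changes.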
As a solution to this problem, the speech recognition apparatus may be trained with the speech of a large number of children, or a speech recognition apparatus designed especially for children may be prepared together with an apparatus for discriminating an adult from a child. At present, however, large-scale databases of children's speech are not available, and hence a speech recognition apparatus dedicated to children cannot readily be constructed. Further, even if such a large-scale database of children's speech were built up at the cost of much time and labor, the above-described solution would not be very efficient.
In order to solve this problem, a representation capable of automatically normalizing the physical size of a vocal tract or vocal cord is indispensable, and such normalization is difficult using a spectrogram. Although only speech recognition has been described as an example, there are many and varied situations which require the extraction of acoustic features that are invariant regardless of the physical size of the sound source, for example, the analysis of sounds from musical instruments and the analysis of combustion engine sound. A solution to the problem is needed in a wide variety of fields, including not only the analysis of acoustic signals but also the analysis of mechanical vibration such as mechanical sounds and seismic waves, the analysis of biological signals such as brain waves, cardiac pulse sounds, ultrasonic echoes and nerve cell signals, and the analysis of signals from sensors that collect general time-series data.
SUMMARY OF THE INVENTION
Therefore, an object of the present invention is to provide a method of signal processing which can overcome the essential limit imposed by spectral representation described with reference to the examples above using a representation not dependent on physical size of the signal source, as well as to provide an apparatus utilizing the method.
Another object of the present invention is to provide a method of signal processing capable of extracting a signal feature invariant regardless of the physical size of the signal source using a representation not dependent on the physical size of the signal source, as well as to provide an apparatus using the method.
A still further object of the present invention is to provide a method of signal processing capable of extracting a signal feature invariant regardless of the physical size of a signal source using a representation whose shape is invariant regardless of expansion or reduction along a time axis of a signal waveform, as well as to provide an apparatus utilizing the method.
An additional object of the present invention is to provide a method of signal processing capable of extracting a signal feature invariant regardless of the physical size of a signal source by obtaining and utilizing, using the Mellin transform, a representation whose shape is invariant regardless of expansion or reduction along a time axis of a signal waveform, and to provide an apparatus utilizing the method.
A still further object of the present invention is to provide a method of signal processing capable of extracting a signal feature invariant regardless of the physical size of a signal source by obtaining and utilizing, using the Mellin transform, a time expression whose shape is invariant regardless of expansion or reduction along a time axis of a signal waveform, while overcoming the “shift varying” characteristic of the Mellin transform, as well as to provide an apparatus utilizing the method.
The method of signal processing in accordance with an aspect of the present invention includes the steps of: wavelet-transforming an input signal in a computer; and extracting features of the signal by performing, in a computer, a Mellin transform on the output of the wavelet transform step in synchrony with the input signal.
As the output of the wavelet transform step is synchronized with the input signal, a start point for the Mellin transform analysis is determined, and hence the Mellin transform of the input signal becomes possible despite the shift varying nature of the Mellin transform. The Mellin transform is characterized by the fact that the magnitude distribution of the output thereof is unchanged regardless of expansion or reduction of a signal waveform on the time axis. Therefore, the Mellin transform used in signal processing enables extraction of a feature invariant regardless of the physical size of the signal source from the signal.
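The two properties stated above, invariance of the magnitude under time-axis scaling and sensitivity to the start point, can be checked numerically with a minimal sketch. It applies the Mellin transform directly to a toy waveform and omits the wavelet transform stage. The test pulse, the choice of the transform variable s = 1/2 + jc, the energy-preserving factor sqrt(a) and all function names are assumptions made for this illustration and are not taken from this specification.

```python
import cmath
import math

def mellin_magnitude(f, c_values, u_min=-12.0, u_max=5.0, du=0.005):
    """|M(c)|, where M(c) = integral over t > 0 of f(t) * t**(-1/2 + 1j*c) dt.

    Substituting t = exp(u) turns this into a Fourier integral of
    g(u) = f(exp(u)) * exp(u / 2) along the log-time axis u, which is
    approximated here by a plain Riemann sum.
    """
    n = int(round((u_max - u_min) / du))
    us = [u_min + i * du for i in range(n + 1)]
    g = [f(math.exp(u)) * math.exp(u / 2) for u in us]
    return [abs(sum(gv * cmath.exp(1j * c * u) for gv, u in zip(g, us)) * du)
            for c in c_values]

def pulse(t):
    """Hypothetical damped oscillation that starts at t = 0."""
    return math.exp(-t) * math.sin(5.0 * t) if t > 0 else 0.0

a = 2.0  # assumed time-compression factor, standing in for a smaller source
c_values = [float(c) for c in range(9)]

base = mellin_magnitude(pulse, c_values)
# Same pulse compressed in time; sqrt(a) keeps its energy unchanged.
scaled = mellin_magnitude(lambda t: math.sqrt(a) * pulse(a * t), c_values)
# Same pulse with a wrong start point (delayed by 0.5 time units).
delayed = mellin_magnitude(lambda t: pulse(t - 0.5), c_values)

print(max(abs(b - s) for b, s in zip(base, scaled)))   # close to zero
print(max(abs(b - d) for b, d in zip(base, delayed)))  # clearly non-zero
```

On the log-time axis, compression by a factor a becomes a shift by ln a, which alters only the phase of the transform. A delay is not such a shift, which is why a well-defined start point is needed before the Mellin transform is applied.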
Preferably, the feature ex
Irino Toshio
Patterson Roy D.