Reexamination Certificate
1999-07-02
2001-11-20
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Audio signal bandwidth compression or expansion
C704S501000
active
06321200
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to the field of signal processing, and in particular to extracting features from a mixture of acoustic signals.
BACKGROUND OF THE INVENTION
To date, very little work has been done on characterizing environmental and ambient sounds. Most prior art acoustic signal representation methods have focused on human speech and music. However, there are no good representation methods for many sound effects heard in films, television, video games, and virtual environments, such as footsteps, traffic, doors slamming, laser guns, hammering, smashing, thunder claps, leaves rustling, water spilling, etc. These environmental acoustic signals are generally much harder to characterize than speech and music because they often comprise multiple noisy and textured components, as well as higher-order structural components such as iterations and scattering.
One particular application that could use such a representation scheme is video processing. Methods are available for extracting, compressing, searching, and classifying video objects, see for example the various MPEG standards. No such methods exist for “audio” objects, other than when the audio objects are speech.
For example, it may be desired to search through a video library to locate all video segments where John Wayne is galloping on a horse while firing his six-shooter. Certainly it is possible to visually identify John Wayne or a horse. But it is much more difficult to pick out the rhythmic clippidy-clop of a galloping horse, and the staccato percussion of a revolver. Recognition of audio events can delineate action in video.
Another application that could use the representation is sound synthesis. It is not until the features of a sound have been identified that it becomes possible to generate the sound synthetically, other than by trial and error.
In the prior art, representations for non-speech sounds have usually focused on particular classes of non-speech sound, for example, simulating and identifying specific musical instruments, distinguishing submarine sounds from ambient sea sounds, and recognizing underwater mammals by their utterances. Each of these applications requires a particular arrangement of acoustic features that does not generalize beyond the specific application.
In addition to these specific applications, other work has focused on developing generalized acoustic scene analysis representations. This research has become known as Computational Auditory Scene Analysis. These systems require substantial computational effort due to their algorithmic complexity. Typically, they use heuristic schemes from Artificial Intelligence as well as various inference schemes. Whilst such systems provide valuable insight into the difficult problem of acoustic representations, their performance has never been demonstrated to be satisfactory with respect to the classification and synthesis of acoustic signals in a mixture.
Therefore, there is a need for a robust and reliable representation that can deal with a broad class of signal mixtures.
SUMMARY OF THE INVENTION
The invention provides a method for extracting features from a mixture of signals, for example, acoustic, electric, seismic, vibrational, and physiological signals. As a feature of the invention, an acoustic mixture can include non-speech sounds. The mixture can originate at a single signal source, or at multiple sources. The method filters the mixture of signals with one or more filterbanks to produce a plurality of filtered signals. The filtering can be frequency based, in which case each filtered signal is a band-pass signal. The filters can be logarithmically spaced, as in a constant-Q (CQ) or wavelet filterbank, or they can be linearly spaced, as in a short-time Fourier transform (STFT) representation.
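By way of illustration only, a linearly spaced filterbank of the STFT kind could be sketched as follows; the sample rate, window length, function name, and use of scipy.signal.stft are assumptions made for the example, not particulars of the patented method.

```python
import numpy as np
from scipy.signal import stft

# Assumed parameters for illustration: 16 kHz audio, ~32 ms Hann windows.
FS = 16000
NPERSEG = 512

def linear_filterbank(signal):
    """Decompose a 1-D signal into linearly spaced frequency bands.

    Each row of the returned array is one band-pass "filtered signal,"
    sampled once per analysis frame.
    """
    freqs, times, spec = stft(signal, fs=FS, nperseg=NPERSEG)
    return np.abs(spec)  # shape (n_bands, n_frames): band magnitudes over time
```

A logarithmically spaced constant-Q or wavelet filterbank would replace the STFT here but yield band signals of the same general form.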
Each filtered signal is windowed to produce a plurality of multi-dimensional observation matrices. Each observation matrix contains frequency-domain samples corresponding to 10-50 millisecond portions of the signal, if the signal is acoustic. For other types of signals, different window sizes can be used. The multi-dimensional observation matrices are reduced in dimensionality using a singular value decomposition (SVD).
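A minimal sketch of the windowing and SVD reduction step, assuming the band signals from the previous sketch and hypothetical values for the window length and the retained rank:

```python
import numpy as np

def reduce_observations(band_signals, frame_len=25, n_components=8):
    """Window band signals into observation matrices and reduce with SVD.

    band_signals: (n_bands, n_frames) array of filtered-signal samples.
    frame_len: frames per observation window (covering roughly 10-50 ms
        of an acoustic signal; the value here is assumed).
    Returns a list of rank-reduced (n_bands, k) matrices, one per window.
    """
    reduced = []
    for start in range(0, band_signals.shape[1] - frame_len + 1, frame_len):
        window = band_signals[:, start:start + frame_len]
        # Singular value decomposition; keep only the strongest components.
        u, s, vt = np.linalg.svd(window, full_matrices=False)
        k = min(n_components, len(s))
        reduced.append(u[:, :k] * s[:k])  # rank-k summary of this window
    return reduced
```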
Temporal and spectral features are extracted from the reduced-dimensionality matrices using independent component analysis (ICA). The extracted features can be used for signal classification, synthesis, comparison, and compression.
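Under the same assumptions, the extraction step could be approximated with an off-the-shelf ICA such as scikit-learn's FastICA; the function below is a hypothetical illustration of one plausible arrangement, not the patent's exact formulation.

```python
import numpy as np
from sklearn.decomposition import FastICA

def extract_features(reduced_matrices, n_features=4):
    """Recover statistically independent spectro-temporal components.

    Stacks the rank-reduced observation matrices side by side and applies
    FastICA; each recovered component is a candidate feature for
    classification, comparison, synthesis, or compression.
    """
    observations = np.hstack(reduced_matrices)      # (n_bands, total_cols)
    ica = FastICA(n_components=n_features, random_state=0)
    # fit_transform expects samples as rows, hence the transposes.
    components = ica.fit_transform(observations.T).T
    return components                               # (n_features, total_cols)
```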
REFERENCES:
patent: 4720802 (1988-01-01), Damoulakis et al.
patent: 5315532 (1994-05-01), Comon
patent: 5383164 (1995-01-01), Sejnowski et al.
patent: 5602751 (1997-02-01), Edelblute
patent: 5615302 (1997-03-01), McEachern
patent: 5632003 (1997-05-01), Davidson et al.
patent: 5893058 (1999-04-01), Kosaka
patent: 5913188 (1999-06-01), Tzirkel-Hancock
patent: 5946656 (1999-08-01), Rahim et al.
Lee, T.-W.; Ziehe, A.; Orglmeister, R.; Sejnowski, T.; “Combining Time-Delayed Decorrelation and ICA: Towards Solving the Cocktail Party Problem”; Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing.
Meyer-Baese, U.; Buros, J.; Trautmann, W.; Taylor, F.; “Fast Implementation of Orthogonal Wavelet Filterbanks using Field-Programmable Logic”; IEEE International Conference on Acoustics, Speech, and Signal Processing; pp. 2119-2122.
Michael A. Casey; “Auditory Group Theory with Applications to Statistical Basis Methods for Structured Audio”.
Michael A. Casey; “Understanding Musical Sound with Forward Models and Physical Models”; Perceptual Computing Group, MIT Media Laboratory.
D.P.W. Ellis; “Prediction-Driven Computational Auditory Scene Analysis for Dense Sound Mixtures”; Paper presented at the ESCA workshop on the Auditory Basis of Speech Perception, Keele UK, Jul. 1996.
K.D. Martin; “Toward Automatic Sound Source Recognition: Identifying Musical Instruments”; Paper presented at the NATO Computational Hearing Advanced Study Institute, Il Ciocco, Italy, Jul. 1-12, 1998.
Huynh et al.; “Classification of Underwater Mammals using Feature Extraction Based on Time-Frequency Analysis and BCM Theory”; Physics Department and Institute for Brain and Neural Systems, Brown University.
Martin et al.; “2pMU9. Musical Instrument Identification: A Pattern-Recognition Approach”; Paper presented at the 136th meeting of the Acoustical Society of America, Oct. 13, 1998.
N.N. Bitar et al.; “Integration of STFT and Wigner Analysis in a Knowledge-Based System for Sound Understanding”; ECS Department, Brown University.
Brinkman Dirk
Dorvil Richemond
Mitsubishi Electric Research Laboratories, Inc.
Nolan Daniel A.