Reexamination Certificate
2007-11-13
Vo, Huyen X. (Department: 2626)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S246000, C704S231000
active
10350727
ABSTRACT:
Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to the input speech data to obtain feature vectors. The speech data is initially segmented, and the segments are then clustered into groups that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation; overlapping segments are combined, remodeled, and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
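The segment, cluster, model, and resegment loop described in the abstract can be illustrated with a small sketch. This is not the patented method: it assumes synthetic two-dimensional feature vectors in place of a real front-end analysis, fixed-length initial segments, a single diagonal Gaussian per cluster, and reassignment of whole segments rather than re-estimation of segment boundaries. The overlap check and the segmentation lattice are not covered. All function names and parameters (synth_features, segment_by_speaker, seg_len, and so on) are hypothetical.

```python
import math
import random

def synth_features(turns, dim=2, seed=0):
    """Stand-in for front-end analysis: one Gaussian-distributed vector per frame."""
    rng = random.Random(seed)
    frames, truth = [], []
    for speaker, n_frames in turns:
        center = 3.0 * speaker  # hypothetical per-speaker offset in feature space
        for _ in range(n_frames):
            frames.append([rng.gauss(center, 1.0) for _ in range(dim)])
            truth.append(speaker)
    return frames, truth

def fit_gaussian(vectors):
    """Fit a diagonal Gaussian (mean, variance) with a small variance floor."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / len(vectors), 1e-3)
           for d in range(dim)]
    return mean, var

def log_likelihood(vectors, model):
    """Total log-likelihood of the vectors under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for v in vectors:
        for x, m, s2 in zip(v, mean, var):
            total += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
    return total

def initial_clusters(frames, spans, k):
    """Cluster segment means: farthest-point seeding, then nearest-seed assignment."""
    means = [fit_gaussian(frames[a:b])[0] for a, b in spans]
    seeds = [means[0]]
    while len(seeds) < k:
        seeds.append(max(means, key=lambda m: min(math.dist(m, s) for s in seeds)))
    return [min(range(k), key=lambda c: math.dist(m, seeds[c])) for m in means]

def segment_by_speaker(frames, seg_len=20, k=2, max_iter=10):
    """Initial segmentation, clustering, then iterative modeling and reassignment."""
    spans = [(a, min(a + seg_len, len(frames))) for a in range(0, len(frames), seg_len)]
    labels = initial_clusters(frames, spans, k)
    for _ in range(max_iter):
        # Model each cluster from the frames of the segments currently assigned to it.
        models = []
        for c in range(k):
            pooled = [f for (a, b), lab in zip(spans, labels) if lab == c
                      for f in frames[a:b]]
            models.append(fit_gaussian(pooled) if pooled else None)
        # Reassign each segment to the cluster model that scores it highest.
        new_labels = [
            max((c for c in range(k) if models[c] is not None),
                key=lambda c: log_likelihood(frames[a:b], models[c]))
            for a, b in spans
        ]
        if new_labels == labels:  # assignments are stable, so stop iterating
            break
        labels = new_labels
    return spans, labels

# Usage: four alternating turns from two synthetic speakers.
frames, truth = synth_features([(0, 60), (1, 60), (0, 40), (1, 40)])
spans, labels = segment_by_speaker(frames)
print(list(zip(spans, labels)))
```

The loop stops as soon as no segment changes cluster, which is one simple reading of a "stable" segmentation. A fuller treatment would also let segment boundaries move between iterations.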
REFERENCES:
patent: 5598507 (1997-01-01), Kimber et al.
P. Delacourt et al., “Detection of Speaker Changes in an Audio Document”, Proc. Eurospeech 99, 1195-1198, Budapest, 1999.
R. Dunn et al., “Approaches to Speaker Detection and Tracking in Conversational Speech”, Digital Signal Processing, 10, 93-112, 2000.
H. Gish et al., “Segregation of Speakers for Speech Recognition and Speaker Identification”, Proc. ICASSP 91. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toronto, 873-876, 1991.
J-L. Gauvain et al., “Partitioning and Transcription of Broadcast News Data”, Proc. of ICSLP98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1335-1338, 1998.
I. Magrin-Chagnolleau et al., “Detection of Target Speakers in Audio Databases”, Proc. ICASSP 99, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Phoenix, 821-824, 1999.
J. Makhoul, et al., “Speech and Language Technologies for Audio Indexing and Retrieval”, Proc. of the IEEE, 88, 1338-1352, Aug. 2000.
A.E. Rosenberg, et al., “Speaker Detection in Broadcast Speech Databases”, Proc. of ICSLP98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1339-1342, 1998.
B. Zhou et al., “Unsupervised Audio Stream Segmentation and Clustering Via the Bayesian Information Criterion”, Proc. of ICSLP 2000, 6th Intl. Conf. on Spoken Lang. Processing, Beijing, III, 714-717, 2000.
J-F. Bonastre et al., “A Speaker Tracking System Based on Speaker Turn Detection for NIST Evaluation,” Proc. ICASSP 2000, IEEE Int'l. Conf. On Acoustics, Speech and Signal Processing, Istanbul, Turkey 2000, pp. 1177-1180.
K. Mori et al., “Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition,” Proc. ICASSP 2001, Int'l. Conf. On Acoustics, Speech and Signal Processing, Salt Lake City, Utah 2001, pp. 413-416.
D. A. Reynolds et al., “Robust Text-Independent Speaker Identification Using Gaussian Mixture Models,” IEEE Trans. On Speech and Audio Processing, vol. 3, 1995, pp. 1339-1342.
M-H. Siu et al., “An Unsupervised, Sequential Learning Algorithm for the Segmentation of Speech Waveforms with Multiple Speakers,” Proc. ICASSP 1992, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, San Francisco, California, vol. II, 1992, pp. 189-192.
M. Sugiyama et al., “Speech Segmentation and Clustering Based on Speaker Features,” Proc. ICASSP 1993, IEEE Int'l. Conf. On Acoustics, Speech and Signal Processing, 1993, pp. 395-398.
L. Wilcox et al., “Segmentation of Speech Using Speaker Identification,” Proc. ICASSP 1994, IEEE Int'l Conf. On Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, pp. 161-164.
Gorin Allen Louis
Liu Zhu
Parthasarathy Sarangarajan
Rosenberg Aaron Edward
AT&T Corp
Title: Unsupervised speaker segmentation of multi-speaker speech data