Reexamination Certificate
2011-04-19
2011-04-19
Vo, Huyen X. (Department: 2626)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S245000, C704S250000
Reexamination Certificate
active
07930179
ABSTRACT:
Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined, remodeled, and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
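The model-and-resegment loop described in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes fixed-length initial segments, replaces the speaker models with a single mean vector per cluster, and uses squared Euclidean distance in place of a likelihood score. The function names (segment_means, resegment) and all parameters are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def segment_means(frames, seg_len):
    """Split frame-level feature vectors into fixed-length segments
    and summarize each segment by its mean vector (an assumption)."""
    segs = [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]
    return [[fmean(col) for col in zip(*seg)] for seg in segs]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def resegment(seg_feats, init_labels, max_iter=20):
    """Alternate between modeling each cluster and relabeling segments
    until the labels stop changing or max_iter is reached."""
    labels = list(init_labels)
    for _ in range(max_iter):
        # Modeling step: one mean vector per cluster
        # (a simplified stand-in for a per-speaker model).
        models = {}
        for k in set(labels):
            members = [f for f, lab in zip(seg_feats, labels) if lab == k]
            models[k] = [fmean(col) for col in zip(*members)]
        # Resegmentation step: assign each segment to its closest model.
        new_labels = [
            min(models, key=lambda k: sq_dist(f, models[k]))
            for f in seg_feats
        ]
        if new_labels == labels:
            break  # segmentation is stable
        labels = new_labels
    return labels


# Toy data: speaker A, then speaker B, then speaker A again.
frames = [[0.0, 0.1]] * 4 + [[5.0, 5.1]] * 4 + [[0.2, 0.0]] * 4
feats = segment_means(frames, seg_len=2)
# The initial clustering deliberately misassigns the second segment.
labels = resegment(feats, init_labels=[0, 1, 1, 1, 0, 0])
print(labels)
```

In this toy run the misassigned segment moves to the cluster holding the other low-valued segments after one pass, and the second pass confirms that the labels no longer change. The overlap check and segmentation lattice mentioned in the abstract are not covered by this sketch.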
REFERENCES:
patent: 5598507 (1997-01-01), Kimber et al.
J-F. Bonastre et al., “A Speaker Tracking System Based on Speaker Turn Detection for NIST Evaluation,” Proc. ICASSP 2000, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, Istanbul, Turkey 2000, pp. 1177-1180.
K. Mori et al., “Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition,” Proc. ICASSP 2001, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, Salt Lake City, Utah 2001, pp. 413-416.
D. A. Reynolds et al., “Robust Text-independent Speaker Identification Using Gaussian Mixture Models,” IEEE Trans. on Speech and Audio Processing, vol. 3, 1995, pp. 1339-1342.
M-H. Siu et al., “An Unsupervised, Sequential Learning Algorithm for the Segmentation of Speech Waveforms with Multiple Speakers,” Proc. ICASSP 1992, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, San Francisco, California, vol. II, 1992, pp. 189-192.
M. Sugiyama et al., “Speech Segmentation and Clustering Based on Speaker Features,” Proc. ICASSP 1993, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, 1993, pp. 392-398.
L. Wilcox et al., “Segmentation of Speech Using Speaker Identification,” Proc. ICASSP 1994, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, pp. 161-164.
A.E. Rosenberg, et al., “Speaker Detection in Broadcast Speech Databases”, Proc. of ICSLP98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1339-1342, 1998.
B. Zhou et al., “Unsupervised Audio Stream Segmentation and Clustering Via the Bayesian Information Criterion”, Proc. of ICSLP 2000, 6th Intl. Conf. on Spoken Lang. Processing, Beijing, III, 714-717, 2000.
P. Delacourt et al., “Detection of Speaker Changes in an Audio Document”, Proc. Eurospeech 99, 1195-1198, Budapest, 1999.
R. Dunn et al., “Approaches to Speaker Detection and Tracking in Conversational Speech”, Digital Signal Processing, 10, 93-112, 2000.
J. Makhoul, et al., “Speech and Language Technologies for Audio Indexing and Retrieval”, Proc. of the IEEE, 88, 1338-1352, Aug. 2000.
H. Gish et al., “Segregation of Speakers for Speech Recognition and Speaker Identification”, Proc. ICASSP 91. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toronto, 873-876, 1991.
J-L. Gauvain et al., “Partitioning and Transcription of Broadcast News Data”, Proc. of ICSLP98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1335-1338, 1998.
I. Magrin-Chagnolleau et al., “Detection of Target Speakers in Audio Databases”, Proc. ICASSP 99. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Phoenix, 821-824, 1999.
Gorin Allen Louis
Liu Zhu
Parthasarathy Sarangarajan
Rosenberg Aaron Edward
AT&T Intellectual Property II L.P.
Vo Huyen X.
Unsupervised speaker segmentation of multi-speaker speech data