Unsupervised speaker segmentation of multi-speaker speech data

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S246000, C704S231000

active

10350727

ABSTRACT:
Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined and remodeled and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
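
The segment, cluster, model and resegment loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: features are one-dimensional numbers rather than real front-end feature vectors, the initial segmentation uses fixed-length blocks, the initial clustering simply sorts segments by mean, each speaker is modeled by a single Gaussian, and the function names (`initial_segments`, `fit_gaussian`, `segment_speakers`) are hypothetical. The overlap check and the optional segmentation lattice are omitted.

```python
import math
from statistics import mean, pvariance


def initial_segments(n_frames, seg_len):
    """Cut the frame range into fixed-length (start, end) segments."""
    return [(s, min(s + seg_len, n_frames)) for s in range(0, n_frames, seg_len)]


def fit_gaussian(values, floor=1e-3):
    """Fit a 1-D Gaussian (mean, variance); the floor avoids zero variance."""
    return mean(values), max(pvariance(values), floor)


def log_likelihood(values, model):
    """Log-likelihood of the frames under a 1-D Gaussian model."""
    mu, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)
               for v in values)


def segment_speakers(features, n_speakers=2, seg_len=10, max_iter=20):
    """Return [((start, end), speaker_label), ...] for the input frames."""
    segs = initial_segments(len(features), seg_len)

    # Initial clustering (assumption): rank segments by mean feature value
    # and split the ranking into n_speakers equal groups.
    order = sorted(range(len(segs)),
                   key=lambda i: mean(features[segs[i][0]:segs[i][1]]))
    labels = [0] * len(segs)
    for rank, i in enumerate(order):
        labels[i] = rank * n_speakers // len(segs)

    # Iterate: model each cluster, then reassign every segment to the most
    # likely model, until the labels stop changing (a stable segmentation).
    for _ in range(max_iter):
        models = {}
        for k in range(n_speakers):
            frames = [f for (s, e), lab in zip(segs, labels) if lab == k
                      for f in features[s:e]]
            if frames:  # skip clusters that have become empty
                models[k] = fit_gaussian(frames)
        new_labels = [
            max(models, key=lambda k: log_likelihood(features[s:e], models[k]))
            for s, e in segs
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return list(zip(segs, labels))


# Toy input: two alternating "speakers" with feature values near 0 and near 5.
features = [0.0] * 10 + [5.0] * 10 + [0.1] * 10 + [5.1] * 10
for (start, end), speaker in segment_speakers(features):
    print(f"frames {start}-{end - 1}: speaker {speaker}")
```

A practical system would replace the single Gaussian with a richer speaker model and would let segment boundaries move during resegmentation; the sketch keeps boundaries fixed so that the iteration itself stays easy to follow.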

REFERENCES:
patent: 5598507 (1997-01-01), Kimber et al.
P. Delacourt et al., “Detection of Speaker Changes in an Audio Document”, Proc. Eurospeech 99, 1195-1198, Budapest, 1999.
R. Dunn et al., “Approaches to Speaker Detection and Tracking in Conversational Speech”, Digital Signal Processing, 10, 93-112, 2000.
H. Gish et al., “Segregation of Speakers for Speech Recognition and Speaker Identification”, Proc. ICASSP 91. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Toronto, 873-876, 1991.
J-L. Gauvain et al., “Partitioning and Transcription of Broadcast News Data”, Proc. of ICSLP 98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1335-1338, 1998.
I. Magrin-Chagnolleau et al., “Detection of Target Speakers in Audio Databases”, Proc. ICASSP 99, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Phoenix, 821-824, 1999.
J. Makhoul, et al., “Speech and Language Technologies for Audio Indexing and Retrieval”, Proc. of the IEEE, 88, 1338-1352, Aug. 2000.
A.E. Rosenberg, et al., “Speaker Detection in Broadcast Speech Databases”, Proc. of ICSLP 98, 5th Intl. Conf. on Spoken Lang. Processing, Sydney, 1339-1342, 1998.
B. Zhou et al., “Unsupervised Audio Stream Segmentation and Clustering Via the Bayesian Information Criterion”, Proc. of ICSLP 2000, 6th Intl. Conf. on Spoken Lang. Processing, Beijing, III, 714-717, 2000.
J-F. Bonastre et al., “A Speaker Tracking System Based on Speaker Turn Detection for NIST Evaluation,” Proc. ICASSP 2000, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, Istanbul, Turkey, 2000, pp. 1177-1180.
K. Mori et al., “Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition,” Proc. ICASSP 2001, Int'l. Conf. on Acoustics, Speech and Signal Processing, Salt Lake City, Utah, 2001, pp. 413-416.
D. A. Reynolds et al., “Robust Text-Independent Speaker Identification Using Gaussian Mixture Models,” IEEE Trans. on Speech and Audio Processing, vol. 3, 1995, pp. 1339-1342.
M-H. Siu et al., “An Unsupervised, Sequential Learning Algorithm for the Segmentation of Speech Waveforms with Multiple Speakers,” Proc. ICASSP 1992, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, San Francisco, California, vol. II, 1992, pp. 189-192.
M. Sugiyama et al., “Speech Segmentation and Clustering Based on Speaker Features,” Proc. ICASSP 1993, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, 1993, pp. 395-398.
L. Wilcox et al., “Segmentation of Speech Using Speaker Identification,” Proc. ICASSP 1994, IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, pp. 161-164.
