Methods and apparatus for tracking speakers in an audio stream

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

Details

C704S270100, C704S272000

active

07739114

ABSTRACT:
Speakers are automatically identified in an audio (or video) source. The audio information is processed to identify potential segment boundaries. Homogeneous segments are clustered substantially concurrently with the segmentation routine, and a cluster identifier is assigned to each identified segment. A segmentation subroutine identifies potential segment boundaries using the Bayesian Information Criterion (BIC) as the model selection criterion. A clustering subroutine uses the same BIC model selection criterion to assign a cluster identifier to each of the identified segments. If the difference between the BIC values of the competing models is positive, the two clusters are merged.
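
The BIC test described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each segment is modeled by a single diagonal-covariance Gaussian, and it follows the sign convention of the Chen et al. papers cited below: the score is BIC(two models) minus BIC(one model), so a positive score favors a boundary. For clustering, the sketch takes the opposite difference, BIC(one model) minus BIC(two models), and treats a positive value as a merge, which is one reading of the abstract's last sentence. The function names, the `penalty_weight` parameter, and the `margin` parameter are hypothetical.

```python
import math


def _sum_log_var(frames):
    """Log-determinant of the ML diagonal covariance: sum of log variances.

    Each dimension must vary within the segment, otherwise log(0) is undefined.
    """
    n = len(frames)
    total = 0.0
    for dim in zip(*frames):
        mean = sum(dim) / n
        var = sum((v - mean) ** 2 for v in dim) / n
        total += math.log(var)
    return total


def delta_bic(x1, x2, penalty_weight=1.0):
    """BIC(two Gaussians) - BIC(one Gaussian) for two lists of feature frames.

    Positive: the two-model description wins (x1 and x2 look different).
    Negative: a single model suffices (x1 and x2 look alike).
    """
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    d = len(x1[0])
    # Log-likelihood gain from splitting the pooled data into two models.
    gain = 0.5 * (n * _sum_log_var(x1 + x2)
                  - n1 * _sum_log_var(x1)
                  - n2 * _sum_log_var(x2))
    # The second model adds d means and d variances (assumed diagonal case).
    penalty = 0.5 * penalty_weight * (2 * d) * math.log(n)
    return gain - penalty


def find_boundary(frames, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best candidate boundary, or None."""
    best_idx, best_score = None, 0.0
    for i in range(margin, len(frames) - margin + 1):
        score = delta_bic(frames[:i], frames[i:], penalty_weight)
        if score > best_score:
            best_idx, best_score = i, score
    return None if best_idx is None else (best_idx, best_score)


def should_merge(cluster_a, cluster_b, penalty_weight=1.0):
    """Merge when BIC(one model) - BIC(two models) is positive."""
    return -delta_bic(cluster_a, cluster_b, penalty_weight) > 0
```

In this sketch, `find_boundary` would be run over a sliding window of feature frames, and each new segment would be compared against the existing clusters with `should_merge`; a segment that merges with no cluster would start a new one.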

REFERENCES:
patent: 5659662 (1997-08-01), Wilcox et al.
patent: 5897616 (1999-04-01), Kanevsky et al.
patent: 5930748 (1999-07-01), Kleider et al.
patent: 6345252 (2002-02-01), Beigi et al.
patent: 6345253 (2002-02-01), Viswanathan
patent: 6421645 (2002-07-01), Beigi et al.
patent: 6424946 (2002-07-01), Tritschler et al.
patent: 07-287969 (1995-10-01), None
patent: 2000-298498 (2000-10-01), None
Scott Shaobing Chen et al., “Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion,” Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, Feb. 8-11, 1998.
Scott Shaobing Chen et al., “Clustering via the Bayesian Information Criterion with Applications in Speech Recognition,” Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 645-648, May 12-15, 1998.
S. Dharanipragada et al., “Experimental Results in Audio Indexing,” Proc. ARPA SLT Workshop, (Feb. 1996).
L. Polymenakos et al., “Transcription of Broadcast News—Some Recent Improvements to IBM's LVCSR System,” Proc. ARPA SLT Workshop, (Feb. 1996).
R. Bakis, “Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System,” Proc. ICASSP98, Seattle, WA (1998).
H. Beigi et al., “A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen, “Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion,” Proceedings of the Speech Recognition Workshop (1998).
S. Chen et al., “Clustering via the Bayesian Information Criterion with Applications in Speech Recognition,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen et al., “IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 Hub4 English Evaluation,” Proceedings of the Speech Recognition Workshop (1998).
S. Dharanipragada et al., “A Fast Vocabulary Independent Algorithm for Spotting Words in Speech,” Proc. ICASSP98, Seattle, WA (1998).
J. Navratil et al., “An Efficient Phonotactic-Acoustic System for Language Identification,” Proc. ICASSP98, Seattle, WA (1998).
G. N. Ramaswamy et al., “Compression of Acoustic Features for Speech Recognition in Network Environments,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen et al., “Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News,” Proceedings of the Speech Recognition Workshop (1999).
S. Dharanipragada et al., “Story Segmentation and Topic Detection in the Broadcast News Domain,” Proceedings of the Speech Recognition Workshop (1999).
C. Neti et al., “Audio-Visual Speaker Recognition for Video Broadcast News,” Proceedings of the Speech Recognition Workshop (1999).
Foote et al., “Finding Presentations in Recorded Meetings Using Audio and Video Features,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 6, pp. 3029-3032 (Mar. 15, 1999).
Nakagawa et al., “Segmentation of Continuous Speech by HMM and Bayesian Probability,” Transactions of the Institute of Electronics, Information and Communication Engineers, D-II, vol. J72 D-II, No. 1, pp. 1-10 (Jan. 1989).
Wilcox et al., “Segmentation of Speech Using Speaker Identification,” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. I-161-I-164 (Apr. 19, 1994).
Ozawa Kazunori, “Voice Signal Encoding and Decoding Method, Voice Signal Encoder and Voice Signal Decoder,” Patent Abstracts of Japan, Publication No. 01-257999 (Oct. 16, 1989).
Hanada Eisuke, “Voice Encoding System,” Patent Abstracts of Japan, Publication No. 03-198099 (Aug. 29, 1991).
Eberman et al., “Signal Segmentalization Method Basing Cluster Constitution,” Patent Abstracts of Japan, Publication No. 10-105187 (Apr. 24, 1998).
Patent Abstracts of Japan, Publication No. 59-111699 (Jun. 27, 1984).
