Methods and apparatus for unknown speaker labeling using...

Filing date: 1999-11-05
Issue date: 2002-07-23
Examiner: Dorvil, Richemond (Department: 2741)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; Application
Classification codes: C704S500000, C704S275000, C704S251000
Type: Reexamination Certificate
Status: active
Patent number: 06424946

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to audio information classification systems and, more particularly, to methods and apparatus for transcribing audio information and identifying speakers in an audio file.
BACKGROUND OF THE INVENTION
Many organizations, such as broadcast news organizations and information retrieval services, must process large amounts of audio information for storage and retrieval purposes. Frequently, the audio information must be classified by subject or speaker name, or both. In order to classify audio information by subject, a speech recognition system initially transcribes the audio information into text for automated classification or indexing. Thereafter, the index can be used to perform query-document matching to return relevant documents to the user.
Thus, the process of classifying audio information by subject has essentially become fully automated. The process of classifying audio information by speaker, however, often remains a labor-intensive task, especially for real-time applications, such as broadcast news. While a number of computationally intensive off-line techniques have been proposed for automatically identifying a speaker from an audio source using speaker enrollment information, the speaker classification process is most often performed by a human operator who identifies each speaker change and provides a corresponding speaker identification.
The parent and grandparent applications to the present invention disclose methods and apparatus for retrieving audio information based on the audio content (subject) as well as the identity of the speaker. The parent application, U.S. patent application Ser. No. 09/345,237, for example, discloses a method and apparatus for automatically transcribing audio information from an audio source while concurrently identifying speakers in real-time, using an existing enrolled speaker database. The parent application, however, can only identify speakers who are already in the enrolled speaker database. In addition, the parent application does not allow new speakers to be added to the enrolled speaker database while audio information is being processed in real-time. A need therefore exists for a method and apparatus that automatically identifies unknown speakers in real-time or in an off-line manner. A further need exists for a method and apparatus that automatically identifies unknown speakers using concurrent transcription, segmentation, speaker identification and clustering techniques.
SUMMARY OF THE INVENTION
Generally, a method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. The disclosed unknown speaker classification system includes a speech recognition system, a speaker segmentation system, a clustering system and a speaker identification system. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. The clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. Thus, segments corresponding to the same speaker should have the same cluster identifier.
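As an illustration only, the way the outputs of these components fit together can be sketched in Python. The sketch assumes the recognizer yields (word, start time) pairs, the segmenter yields a sorted list of boundary times, and the clustering step yields one cluster identifier per segment. The names Segment and build_segments and the data layout are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float       # segment start time in seconds
    end: float         # segment end time in seconds
    cluster_id: int    # identifier assigned by the clustering step
    words: list = field(default_factory=list)


def build_segments(words, boundaries, cluster_ids, total_duration):
    """Group time-aligned words into segments that carry cluster identifiers.

    words: list of (word, start_time) pairs from the recognizer (assumed format)
    boundaries: sorted list of interior segment boundary times
    cluster_ids: one identifier per segment, i.e. len(boundaries) + 1 entries
    """
    if len(cluster_ids) != len(boundaries) + 1:
        raise ValueError("need exactly one cluster id per segment")
    edges = [0.0] + list(boundaries) + [total_duration]
    segments = [
        Segment(edges[i], edges[i + 1], cluster_ids[i])
        for i in range(len(edges) - 1)
    ]
    for word, start_time in words:
        # A word that starts exactly on a boundary goes to the later segment.
        index = bisect_right(boundaries, start_time)
        segments[index].words.append(word)
    return segments


words = [("good", 0.0), ("evening", 0.4), ("thanks", 2.1),
         ("back", 5.0), ("to", 5.2), ("you", 5.4)]
segments = build_segments(words, [2.0, 5.0], [1, 2, 1], 6.0)
for seg in segments:
    print(seg.cluster_id, " ".join(seg.words))
```

In this toy input the first and third segments share cluster identifier 1, which is how segments from the same speaker would be tied together even when no name is known.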
According to one aspect of the invention, the disclosed speaker identification system uses an enrolled speaker database that includes background models for unenrolled speakers to assign a speaker to each identified segment. Once the speech segments are identified by the segmentation system, the disclosed unknown speaker identification system compares the segment utterances to the enrolled speaker database and finds the “closest” speaker, if any, to assign a speaker label to each identified segment. A speech segment having an unknown speaker is initially assigned a general speaker label from a set of background models for speaker identification, such as “unenrolled male” or “unenrolled female.” The “unenrolled” segment is assigned a segment number and receives a cluster identifier assigned by the clustering system. Thus, the clustering system assigns a unique cluster identifier for each speaker to further differentiate the general speaker labels.
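The labeling step described above might look roughly like the following Python sketch. It assumes, purely for illustration, that every enrolled speaker and every background model is summarized by a single feature vector and that "closest" means smallest Euclidean distance; the summary does not specify the distance measure or the model form, so both are stand-ins. The function names and the label format are hypothetical.

```python
import math

# General labels taken from the background models named in the text.
BACKGROUND_LABELS = {"unenrolled male", "unenrolled female"}


def closest_speaker(segment_vec, database):
    """Return the database entry whose model vector is nearest the segment.

    Euclidean distance is an assumption made for this sketch only.
    """
    return min(database, key=lambda name: math.dist(segment_vec, database[name]))


def label_segment(segment_vec, cluster_id, database):
    """Assign a speaker label; qualify general labels with the cluster id."""
    name = closest_speaker(segment_vec, database)
    if name in BACKGROUND_LABELS:
        # Unknown speaker: the cluster id differentiates general labels.
        return f"{name} (cluster {cluster_id})"
    return name


database = {
    "alice": [0.0, 0.0],               # hypothetical enrolled speaker
    "unenrolled male": [5.0, 5.0],     # background model
    "unenrolled female": [-5.0, 5.0],  # background model
}
print(label_segment([0.1, -0.2], 3, database))
print(label_segment([4.0, 6.0], 7, database))
```

Keeping the background models in the same database as the enrolled speakers means one nearest-model search handles both the known and the unknown case.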
The results of the present invention can be directly output to a user, for example, providing the transcribed text for each segment, together with the assigned speaker label. If a given segment is assigned a temporary speaker label associated with an unenrolled speaker, the user can be prompted to provide the name of the speaker. Once the user assigns a speaker label to an audio segment having an unknown speaker, the same speaker name can be automatically assigned to any segments having the same cluster identifier. In addition, the enrolled speaker database can be updated to enroll the previously unknown speaker using segments associated with the speaker as speaker training files.
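The name propagation and enrollment described in the preceding paragraph can be sketched as follows. This is a minimal Python illustration under assumed data structures: each segment is a dictionary with cluster_id, label and audio keys, and the enrolled speaker database maps a name to a list of training files. These layouts and the function names are hypothetical, not the patent's.

```python
def assign_name(segments, cluster_id, name):
    """Give every segment in the cluster the user-supplied name.

    Returns the audio of the relabeled segments so that it can be used
    as training material for enrollment.
    """
    training_files = []
    for seg in segments:
        if seg["cluster_id"] == cluster_id:
            seg["label"] = name
            training_files.append(seg["audio"])
    return training_files


def enroll(database, name, training_files):
    """Add a previously unknown speaker to the enrolled speaker database."""
    database.setdefault(name, []).extend(training_files)


segments = [
    {"cluster_id": 7, "label": "unenrolled male", "audio": "seg1.wav"},
    {"cluster_id": 2, "label": "alice", "audio": "seg2.wav"},
    {"cluster_id": 7, "label": "unenrolled male", "audio": "seg3.wav"},
]
database = {"alice": ["alice_train.wav"]}

# The user names the speaker once; all segments in cluster 7 follow.
files = assign_name(segments, 7, "bob")
enroll(database, "bob", files)
print([seg["label"] for seg in segments])
print(database["bob"])
```

The user supplies the name once, and the cluster identifier carries it to every other segment from the same speaker.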
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

REFERENCES:
patent: 5659662 (1997-08-01), Wilcox et al.
patent: 6185527 (2001-02-01), Petkovic et al.
ICASSP-97. Roy et al., “Speaker Identification based text to audio alignment for audio retrieval system”. pp. 1099-1102, vol. 2. Apr. 1997.*
S. Dharanipragada et al., “Experimental Results in Audio Indexing,” Proc. ARPA SLT Workshop, (Feb. 1996).
L. Polymenakos et al., “Transcription of Broadcast News—Some Recent Improvements to IBM's LVCSR System,” Proc. ARPA SLT Workshop, (Feb. 1996).
R. Bakis, “Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System,” Proc. ICASSP98, Seattle, WA (1998).
H. Beigi et al., “A Distance Measure Between Collections of Distributions and its Application to Speaker Recognition,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen, “Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion,” Proceedings of the Speech Recognition Workshop (1998).
S. Chen et al., “Clustering via the Bayesian Information Criterion with Applications in Speech Recognition,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen et al., “IBM's LVCSR System for Transcription of Broadcast News Used in the 1997 Hub4 English Evaluation,” Proceedings of the Speech Recognition Workshop (1998).
S. Dharanipragada et al., “A Fast Vocabulary Independent Algorithm for Spotting Words in Speech,” Proc. ICASSP98, Seattle, WA (1998).
J. Navratil et al., “An Efficient Phonotactic-Acoustic system for Language Identification,” Proc. ICASSP98, Seattle, WA (1998).
G. N. Ramaswamy et al., “Compression of Acoustic Features for Speech Recognition in Network Environments,” Proc. ICASSP98, Seattle, WA (1998).
S. Chen et al., “Recent Improvements to IBM's Speech Recognition System for Automatic Transcription of Broadcast News,” Proceedings of the Speech Recognition Workshop (1999).
S. Dharanipragada et al., “Story Segmentation and Topic Detection in the Broadcast News Domain,” Proceedings of the Speech Recognition Workshop (1999).
C. Neti et al., “Audio-Visual Speaker Recognition for Video Broadcast News,” Proceedings of the Speech Recognition Workshop (1999).

Affiliated with

Tritschler Alain Charles Louis (Inventor)
Viswanathan Mahesh (Inventor)

Also associated with

Dorvil Richemond (Examiner)
Otterstedt Paul J. (Attorney)
Ryan & Mason & Lewis, LLP (Law Firm)
