Classification: Data processing: speech signal processing, linguistics, language - Speech signal processing - Recognition
Type: Reexamination Certificate
Filed: 2000-10-17
Granted: 2004-04-06
Examiner: Dorvil, Richemond (Department: 2641)
U.S. Classes: C704S243000, C704S251000, C382S190000
Status: Active
Patent Number: 06718306
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-299745, filed Oct. 21, 1999, the entire contents of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
The present invention relates to a speech collating apparatus and a speech collating method for identifying a person with speech data.
Generally, for identifying a speaker from speech, a speech signal to be collated is converted to an acoustic parameter such as the frequency spectrum before collation, since directly comparing the raw signal with registered speech signals is inefficient. Other acoustic parameters available for this purpose include the fundamental frequency (pitch frequency), speech energy, formant frequencies, zero-crossing count, and the like.
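As a concrete illustration only (not part of the original disclosure), the following minimal Python sketch computes three of the parameters named above, the magnitude spectrum, the speech energy, and the zero-crossing count, for a single frame; the function name, frame length, and sampling rate are assumptions.

```python
import numpy as np

def frame_parameters(frame):
    """Compute simple per-frame acoustic parameters for a speech frame."""
    windowed = frame * np.hamming(len(frame))   # taper the frame to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))    # frequency spectrum (magnitude)
    energy = float(np.sum(frame ** 2))          # speech energy
    zero_crossings = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))  # zero-crossing count
    return spectrum, energy, zero_crossings

# Example: one 256-sample frame (32 ms at an assumed 8 kHz sampling rate).
spectrum, energy, zc = frame_parameters(np.random.randn(256))
```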
Since these acoustic parameters carry phonetic information primarily and personal information only secondarily, a new characteristic amount unique to the speaker must be derived from them for comparison in order to improve the hit rate of speaker identification.
Conventional speaker identification is performed in the following manner.
FIG. 14 is a flow chart illustrating the procedure of speaker identification by a conventional speech collating apparatus.
(1) An input speech signal uttered for a word is divided into frames of a predetermined unit time, and the frequency spectrum is calculated for each frame to derive a time series distribution of the frequency spectra (hereinafter referred to as the “sound spectrogram”) (step C1).
(2) A speech section is detected from the sound spectrogram (step C2).
(3) It is determined whether each part of the speech section is a spoken, a non-spoken, or a silent section, and the spoken sections are extracted. The speech section is then divided into blocks, each of which corresponds to one of the spoken sections (step C3).
(4) As a characteristic amount unique to the speaker, an additive average of the sound spectrogram in the time direction (hereinafter referred to as the “average spectrum”) is calculated for each of the blocks (step C4).
(5) It is determined whether the processing is for registration or for collation; when registration is intended, the average spectrum of the blocks is registered as the characteristic amount of the registered speaker (steps C5→C6).
(6) When the processing is determined to be for collation, the similarity with respect to the characteristic amount of the registered speaker is calculated, with the average spectrum of the blocks used as the characteristic amount of the unknown speaker (steps C5→C7).
(7) The similarity of the unknown speaker to the registered speaker is compared with a previously set threshold value to determine the identity of the unknown speaker with the registered speaker (step C8).
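For readers who prefer code, here is a minimal sketch of steps C1 through C8, assuming NumPy, a simple energy floor for section detection, and cosine similarity as the similarity measure; the frame size, thresholds, and helper names are illustrative assumptions rather than the exact choices of the prior-art apparatus.

```python
import numpy as np

def sound_spectrogram(signal, frame_len=256):
    """Step C1: frame the signal and take the magnitude spectrum of each frame."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1))

def spoken_blocks(spec, energy_floor=1e-3):
    """Steps C2-C3: keep contiguous runs of frames whose spectral energy exceeds a floor."""
    voiced = spec.sum(axis=1) > energy_floor
    blocks, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            blocks.append(spec[start:i])
            start = None
    if start is not None:
        blocks.append(spec[start:])
    return blocks

def average_spectrum(block):
    """Step C4: additive average of the sound spectrogram in the time direction."""
    return block.mean(axis=0)

def similarity(a, b):
    """Step C7: cosine similarity between two average spectra."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Steps C5, C6, and C8: register one utterance, then accept or reject an unknown one.
# Two random signals stand in for the registered and unknown utterances here.
registered = [average_spectrum(b) for b in spoken_blocks(sound_spectrogram(np.random.randn(4096)))]
unknown = [average_spectrum(b) for b in spoken_blocks(sound_spectrogram(np.random.randn(4096)))]
if registered and unknown:
    same_speaker = similarity(registered[0], unknown[0]) > 0.9  # previously set threshold
```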
As described above, the speaker identification procedure performed by the conventional speech collating apparatus collates a speech signal input by a registered speaker (hereinafter referred to as the “registered speech signal”) with a speech signal input by an unknown speaker for collation (hereinafter referred to as the “unknown speech signal”) by (1) converting the speech signal to the sound spectrogram; (2) detecting a speech section from the sound spectrogram; (3) extracting a spoken section from the detected speech section based on a determination whether the speech section is a spoken, a non-spoken, or a silent section; and (4) deriving a characteristic amount for each of blocks divided from the extracted spoken section. In this way, the calculation of the characteristic amount applied to the collation processing for actually determining the identity of the registered speech signal with the unknown speech signal involves at least four preprocessing stages, so that a large number of processing steps are required for the overall speaker identification processing.
Also, although the conventional procedure, which uses the additive average of the sound spectrogram of a block in the time direction as the characteristic amount unique to a speaker, has the advantage of relatively simple processing, creating a stable characteristic amount requires speech signal data spanning a relatively long time. In addition, because information along the time axis is compressed, the procedure is not suitable for text-dependent speaker identification. Moreover, since averaging the phonetic information also averages out the personal information superimposed on it, a sufficient characteristic amount is not obtained. For this reason, extra characteristic amounts must be added to improve the hit rate, which requires an extremely large number of preprocessing steps.
Improving the hit rate therefore entails the problem of an extremely large number of preprocessing steps.
BRIEF SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a speech collating apparatus and a speech collating method which are capable of identifying a speaker at a high hit rate without the need for a large number of preprocessing steps.
According to the present invention, a speech data collating apparatus comprises: data converting means for converting two speech signals subjected to comparison into two pieces of two-dimensional data indicative of the speech characteristics of the two speech signals; template placing means for placing a plurality of templates that define a plurality of areas on one of the two pieces of two-dimensional data; correlated area detecting means for detecting, on the other of the two pieces of two-dimensional data, a plurality of areas having the maximum correlation with the areas defined by the plurality of templates; and collation determining means for comparing the mutual positional relationship of the plurality of templates on the one piece of two-dimensional data with the mutual positional relationship of the plurality of areas detected by the correlated area detecting means on the other piece of two-dimensional data, to determine the identity between the two speech signals.
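By way of illustration only, the following sketch shows one possible reading of this scheme: rectangular templates are placed on one two-dimensional pattern, the best-correlated area for each template is searched on the other pattern, and identity is declared when the detected areas preserve the templates' mutual positional relationship. The template size, placement, normalized-correlation measure, and consistency tolerance are all assumptions, not the claimed implementation.

```python
import numpy as np

def best_match(template, target):
    """Find the top-left position in `target` whose patch correlates best with `template`."""
    th, tw = template.shape
    t = (template - template.mean()).ravel()
    best, best_pos = -np.inf, (0, 0)
    for y in range(target.shape[0] - th + 1):
        for x in range(target.shape[1] - tw + 1):
            patch = target[y:y + th, x:x + tw]
            p = (patch - patch.mean()).ravel()
            c = p @ t / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12)  # normalized correlation
            if c > best:
                best, best_pos = c, (y, x)
    return best_pos

def collate(pattern_a, pattern_b, positions, size=(8, 8), tol=2.0):
    """Place templates on pattern_a at `positions`, detect their best matches on
    pattern_b, and accept when all templates shift by nearly the same offset,
    i.e. when the mutual positional relationship is preserved."""
    matches = [best_match(pattern_a[y:y + size[0], x:x + size[1]], pattern_b)
               for (y, x) in positions]
    shifts = np.array(matches) - np.array(positions)  # per-template displacement
    return bool(np.all(np.abs(shifts - shifts.mean(axis=0)) <= tol))

pattern_a = np.random.rand(40, 64)         # stand-in for one two-dimensional pattern
pattern_b = np.roll(pattern_a, 1, axis=0)  # the same pattern shifted by one frame
print(collate(pattern_a, pattern_b, [(4, 4), (4, 40), (24, 20)]))  # True: relationship preserved
```

Because the determination rests on the relative positions of the matched areas rather than on a time-averaged spectrum, the characteristic amount is obtained directly from the two-dimensional patterns without the preprocessing stages enumerated above.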
Additional objects and advantages of the present invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present invention.
The objects and advantages of the present invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.
REFERENCES:
patent: 2938079 (1960-05-01), Flanagan
patent: 3636261 (1972-01-01), Preston, Jr.
patent: 4581760 (1986-04-01), Schiller et al.
patent: 4989249 (1991-01-01), Oka et al.
patent: 5067162 (1991-11-01), Driscoll, Jr. et al.
patent: 5121428 (1992-06-01), Uchiyama et al.
patent: 5377302 (1994-12-01), Tsiang
patent: 5381512 (1995-01-01), Holton et al.
patent: 5548647 (1996-08-01), Naik et al.
patent: 5764853 (1998-06-01), Watari et al.
patent: 5893058 (1999-04-01), Kosaka
patent: 6088428 (2000-07-01), Trandal et al.
patent: 6134340 (2000-10-01), Hsu et al.
patent: 6178261 (2001-01-01), Williams et al.
patent: 0 508 845 (1992-10-01), None
patent: 63-41989 (1988-02-01), None
patent: 09-282458 (1997-10-01), None
patent: 11-250261 (1999-09-01), None
Furui, “Digital Speech Processing, Synthesis, and Recognition”, ISBN 0-8247-7965-7, 1989, pp. 291-309.*
Flanagan, “Speech Analysis Synthesis and Perception”, Academic Press Inc., 1965, pp. 164-166.*
Arai et al., “Speech intelligibility in the presence of cross-channel spectral asynchrony”, IEEE ICASSP, 1998, pp. 933-936.*
Pellom et al., “An efficient scoring algorithm for Gaussian mixture model based speaker identification”, IEEE Signal Processing Letters, vol. 5, no. 11, 1998, pp. 281-284.*
S. Anderson et al., “A Single Chip Sensor & Image Processor for Fingerprint Verification”, Proceedings o
Inventors: Satoh Katsuhiko; Takeda Tsuneharu
Assignee: Casio Computer Co. Ltd.
Primary Examiner: Dorvil Richemond
Assistant Examiner: Han Qi
Attorney, Agent, or Firm: Frishauf, Holtz, Goodman & Chick, P.C.