Reexamination Certificate
2000-03-15
2003-10-14
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S207000, C704S240000, C704S219000, C704S226000, C704S220000, C375S242000, C375S243000
active
06633841
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention relates generally to voice activity detection in speech coding; and, more particularly, it relates to voice activity detection that accommodates substantially music-like signals in speech coding.
2. Related Art
Conventional speech signal coding systems have difficulty in coding speech signals having a substantially music-like signal contained therein. Conventional speech signal coding schemes often must operate on data transmission media having limited available bandwidth. These conventional systems commonly seek to minimize data transmission rates using various techniques that are geared primarily to maintain a high perceptual quality of speech signals. Traditionally, speech coding schemes were not directed to ensuring a high perceptual quality for speech signals having a large portion of embedded music-like signals.
The reasons for this were many, and varied among the communication systems employed on various media. One common reason, within speech coding systems designed for wireless communication systems, was the fact that air time was prohibitively expensive. A user of a wireless communication system was not realistically expected to wait “on hold” using his wireless device. Design constraints, such as economic constraints dictated by expensive air time, were among those constraints that directed those working in the art of speech coding and speech processing not to devote significant energies to trying to maintain a high perceptual quality for speech signals having a substantially music-like signal contained therein. Conventional speech coding methods do not typically address the problem associated with trying to ensure a high perceptual quality for speech signals having a substantially music-like signal.
Another common reason that is presently applicable, within speech coding systems designed for wireline communication systems, is the fact that the bandwidth available for such communication systems is prohibitively limited. Moreover, as such communication systems continue to grow in size and complexity, they become more and more congested. Various techniques have been developed in the art of speech coding and speech processing to accommodate communication systems having limited bandwidth. The discontinued transmission method is one such example, known to those having skill in the art of speech coding and speech processing, to maximize data transmission over already limited communication media.
Also, within the ITU-Recommendation G.729, an annex G.729E high rate extension has recently been adopted by the industry to assist the G.729 main body. Although the annex G.729E high rate extension provides higher perceptual quality for speech-like signals than does the G.729 main body, it especially improves the quality of coded speech signals having a substantially music-like signal embedded therein. However, traditional methods of performing voice activity detection (VAD) that are embedded within the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)), which also performs silence description coding (SID) and comfort noise generation (CNG), often improperly classify substantially music-like signals as background noise signals. In short, the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)) is simply inadequate to guarantee a high perceptual quality for substantially music-like signals. This is largely because its available data transmission rate (bit rate) is substantially lower than that of the annex G.729E high rate extension. The present implementation of the annex G.729E high rate extension accompanied by the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)) and its desirable voice activity detection simply fails to provide a high perceptual quality for substantially music-like signals.
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
SUMMARY OF THE INVENTION
Various aspects of the present invention can be found in an extended speech coding system that accommodates substantially music-like signals within a speech signal while maintaining a high perceptual quality in a reproduced speech signal. The extended speech coding system contains internal circuitry that performs detection and classification of the speech signal, depending on numerous characteristics of the speech signal, to ensure the high perceptual quality in the reproduced speech signal. The invention selects an appropriate speech coding to accommodate a variety of speech signals in which the high perceptual quality is maintained.
In certain embodiments of the invention, for speech signals having a substantially music-like signal, the extended speech coding system overrides any voice activity detection (VAD) decision, performed by a voice activity detection (VAD) correction/supervision circuitry, that is used to determine which among a plurality of source coding modes is to be employed. In one specific embodiment, the voice activity detection (VAD) correction/supervision circuitry cooperates with a conventional voice activity detection (VAD) circuitry to decide whether to use a discontinued transmission (DTX) speech signal coding mode, or a regular speech signal coding mode having a high rate extension speech signal coding mode.
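The mode decision described above can be illustrated with a minimal sketch. It is not the patented circuitry: the names `CodingMode` and `select_coding_mode` are hypothetical, the music-like flag is assumed to come from some detector that is not specified here, and the reading that a music-like frame is always forced to the regular mode is an interpretation of the override rather than a stated rule.

```python
from enum import Enum


class CodingMode(Enum):
    """Hypothetical labels for the two source coding modes named in the text."""
    DTX = "discontinued transmission"
    REGULAR = "regular coding with high rate extension"


def select_coding_mode(vad_says_active: bool, music_like: bool) -> CodingMode:
    """Pick a source coding mode for one frame.

    vad_says_active: decision of the conventional VAD (True means active voice).
    music_like: output of an assumed music detector; how it is computed is
        outside the scope of this sketch.
    """
    # Supervision step (assumption): a music-like frame is treated as active,
    # so it is never handed to DTX as if it were background noise.
    corrected_active = vad_says_active or music_like
    return CodingMode.REGULAR if corrected_active else CodingMode.DTX


if __name__ == "__main__":
    # Each tuple is (conventional VAD decision, music-like flag) for one frame.
    frames = [(True, False), (False, False), (False, True)]
    for vad, music in frames:
        print(vad, music, select_coding_mode(vad, music).name)
```

In this sketch only a frame that the conventional VAD marks inactive and that is not music-like falls through to the DTX mode; every other combination stays in the regular mode.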
In certain embodiments of the invention, a speech signal coding circuitry ensures an improved perceptual quality of a coded speech signal even during discontinued transmission (DTX). This assurance of a high perceptual quality is very desirable when there is a presence of a music-like signal in an un-coded speech signal.
Benyassine Adil
Thyssen Jes
Chawan Vijay
Farjami & Farjami LLP
Mindspeed Technologies Inc.