Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Reexamination Certificate
1999-12-01
2003-09-30
Šmits, Tālivaldis Ivars (Department: 2654)
active
06629070
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and apparatus for detecting voice presence/absence state, and a method and apparatus for encoding a voice signal which include the method and apparatus for detecting voice presence/absence state, respectively. The method and apparatus for encoding a voice signal are used in a portable telephone and an automobile telephone for example.
2. Description of the Prior Art
A background noise generating system has been disclosed in, for example, JPA 7-336290 titled “VOX Controlled Communication Apparatus (translated title)”. Next, with reference to FIGS. 1 and 2, the related art reference will be described in brief.
FIG. 1 is a block diagram showing the structure of the apparatus according to the related art reference.
FIG. 2 is a flow chart showing the operation of the apparatus according to the related art reference.
As shown in FIG. 1, the apparatus according to the related art reference comprises a voice signal input terminal 610, a frame dividing portion 620, a voice presence state detecting portion 630, a controlling portion 640, a highly efficient voice encoding portion 650, a switch 660, and an encoded signal output terminal 670. The voice presence state detecting portion 630 comprises a frame energy calculating portion 631 and a voice presence/absence state determining portion 632.
Next, the overall operation of the apparatus according to the related art reference will be described in brief.
The frame dividing portion 620 receives a voice signal from the voice signal input terminal 610 (at step B1). The frame dividing portion 620 divides the voice signal into frames (with a period of 20 msec each). The frames are supplied to the voice presence state detecting portion 630 and the highly efficient voice encoding portion 650 (at step B2).
The frame energy calculating portion 631 calculates the intensity of energy of each frame of the voice signal and supplies the calculated data to the voice presence/absence state determining portion 632 (at step B3).
The voice presence/absence state determining portion 632 determines whether or not the intensity of energy of each frame received from the frame energy calculating portion 631 is larger than a predetermined threshold value. When the intensity of energy of the current frame is larger than the predetermined threshold value, the voice presence/absence state determining portion 632 determines that the current frame is a voice frame. When the intensity of energy of the current frame is not larger than the predetermined threshold value, the voice presence/absence state determining portion 632 determines that the current frame is a non-voice frame. The voice presence/absence state determining portion 632 supplies the determined result to the controlling portion 640 (at step B4).
The controlling portion 640 controls the highly efficient voice encoding portion 650 and the switch 660 corresponding to the determined result received from the voice presence/absence state determining portion 632 (at step B5).
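Steps B2 to B4 of the related art can be sketched as follows. This is a minimal illustration, not the implementation of the cited reference: the 8 kHz sampling rate, the use of mean squared amplitude as the "intensity of energy", and all function names are assumptions introduced here. Only the 20 msec frame period and the "larger than a threshold" rule come from the text.

```python
from typing import List

FRAME_MS = 20        # frame period stated in the text
SAMPLE_RATE = 8000   # assumed sampling rate (not given in the text)
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per frame


def split_into_frames(signal: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Step B2: divide the signal into whole frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def frame_energy(frame: List[float]) -> float:
    """Step B3: energy of one frame, taken here as the mean squared amplitude (assumption)."""
    return sum(x * x for x in frame) / len(frame)


def is_voice_frame(frame: List[float], threshold: float) -> bool:
    """Step B4: a voice frame only if the energy is strictly larger than the threshold."""
    return frame_energy(frame) > threshold


if __name__ == "__main__":
    signal = [0.0] * FRAME_LEN + [0.5] * FRAME_LEN
    print([is_voice_frame(f, threshold=0.01) for f in split_into_frames(signal)])
```

The result of `is_voice_frame` would then drive the controlling portion 640 (step B5), which is not modeled in this sketch.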
Another related art reference, JPA 9-152894 titled “Voice presence/absence state determining apparatus (translated title)”, discloses an apparatus that accurately determines whether or not each frame is a voice frame, including a frame that contains the beginning portion of a phonation. In the apparatus according to this related art reference, a sub-frame power calculating portion calculates the power of each of four sub-frames into which each frame is divided. A frame maximum power generating portion calculates the average value of the power of each sub-frame and the moving average of the power between two adjoining sub-frames, compares the moving average values of the sub-frames in the same frame, and selects the maximum moving average as the maximum power of the frame. Thus, even if a phonation starts from a later portion of a frame, the frame maximum power is prevented from being underestimated. Consequently, a voice presence state determining portion can reliably determine that the current frame is a voice frame.
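The frame maximum power described above can be illustrated with a short sketch. The function names and the equal-weight two-point moving average are assumptions made here for illustration; the cited reference may weight or window the sub-frames differently.

```python
from typing import List


def subframe_powers(frame: List[float], n_sub: int = 4) -> List[float]:
    """Average power (mean squared amplitude) of each of n_sub equal sub-frames."""
    sub_len = len(frame) // n_sub
    return [sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len]) / sub_len
            for k in range(n_sub)]


def frame_max_power(frame: List[float], n_sub: int = 4) -> float:
    """Maximum of the moving averages taken over each pair of adjoining sub-frames."""
    p = subframe_powers(frame, n_sub)
    moving = [(p[k] + p[k + 1]) / 2 for k in range(n_sub - 1)]
    return max(moving)


if __name__ == "__main__":
    # Phonation that starts only in the last quarter of a 160-sample frame.
    frame = [0.0] * 120 + [0.5] * 40
    whole_frame_power = sum(x * x for x in frame) / len(frame)
    print(whole_frame_power, frame_max_power(frame))  # 0.0625 0.125
```

In this example the whole-frame average power is 0.0625, while the maximum moving average is 0.125, so a late onset is less likely to fall below a fixed threshold.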
However, the related art references have the following disadvantages.
As a first disadvantage, if the voice presence/absence state changes in the middle of a frame, the frame cannot be accurately determined as a voice frame.
This is because the intensity of energy of the voice signal, which is the determination factor for the voice presence/absence state, is calculated over each whole frame.
As a second disadvantage, a frame that partly contains pulse noise may be determined as a voice frame.
This is because when the intensity of energy of the pulse noise is too large, the intensity of energy of the entire frame becomes larger than the voice presence/absence determination threshold value. Thus, the frame is determined as a voice frame.
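Both disadvantages can be reproduced numerically with a whole-frame energy threshold. The sketch below is illustrative only: the 160-sample frame, the amplitudes, and the threshold of 0.005 are values chosen here, not figures from the related art references.

```python
from typing import List


def frame_energy(frame: List[float]) -> float:
    """Whole-frame energy, taken as the mean squared amplitude (assumption)."""
    return sum(x * x for x in frame) / len(frame)


def is_voice_frame(frame: List[float], threshold: float) -> bool:
    """Related-art style decision: voice if the frame energy exceeds the threshold."""
    return frame_energy(frame) > threshold


THRESHOLD = 0.005  # illustrative value

# First disadvantage: voice begins in the last quarter of the frame, so its
# energy is diluted by the silent three quarters and the frame is missed.
late_onset = [0.0] * 120 + [0.1] * 40

# Second disadvantage: a single large pulse in an otherwise silent frame
# lifts the whole-frame energy above the threshold.
pulse_noise = [0.0] * 160
pulse_noise[50] = 1.0

if __name__ == "__main__":
    print(is_voice_frame(late_onset, THRESHOLD))   # False: voice onset missed
    print(is_voice_frame(pulse_noise, THRESHOLD))  # True: pulse mistaken for voice
```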
SUMMARY OF THE INVENTION
In order to overcome the aforementioned disadvantages, the present invention has been made, and accordingly has an object to provide a method and apparatus for accurately determining whether or not each frame is a voice frame even if a voice presence/absence state changes in the middle of the frame and even if each frame partly contains pulse noise.
According to a first aspect of the present invention, there is provided a method for detecting a voice presence/absence state of a frame which is obtained by dividing a voice signal into frames, comprising steps of: dividing the frame into sub-frames; calculating a physical amount of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of a degree of variation of the physical amount among the sub-frames.
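The three steps of the first aspect can be sketched as follows. The text above does not specify the physical amount, the measure of variation, or the decision rule, so everything concrete here is a hypothetical choice made for illustration: energy is used as the physical amount, the variation is the signed difference between adjacent sub-frames, and a rise that is immediately followed by a comparable fall is treated as pulse noise.

```python
from typing import List


def subframe_energies(frame: List[float], n_sub: int = 4) -> List[float]:
    """Steps 1 and 2: divide the frame into sub-frames and compute each sub-frame's energy."""
    sub_len = len(frame) // n_sub
    return [sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len]) / sub_len
            for k in range(n_sub)]


def adjacent_variation(energies: List[float]) -> List[float]:
    """Signed energy change between each pair of adjacent sub-frames."""
    return [b - a for a, b in zip(energies, energies[1:])]


def detect_voice(frame: List[float], level_thr: float, jump_thr: float,
                 n_sub: int = 4) -> bool:
    """Step 3, using a HYPOTHETICAL rule that is not taken from the source."""
    e = subframe_energies(frame, n_sub)
    d = adjacent_variation(e)
    # A sharp rise immediately followed by a sharp fall is treated as a pulse.
    spike = any(d[k] > jump_thr and d[k + 1] < -jump_thr for k in range(len(d) - 1))
    if spike:
        return False
    # Otherwise any sub-frame above the level threshold marks the frame as voice.
    return max(e) > level_thr


if __name__ == "__main__":
    onset = [0.0] * 120 + [0.1] * 40
    click = [0.0] * 160
    click[50] = 1.0
    print(detect_voice(onset, 0.005, 0.005), detect_voice(click, 0.005, 0.005))
```

Under these assumed thresholds the late onset is accepted and the isolated click is rejected. A pulse confined to the last sub-frame would look like an onset under this simple rule, which is one reason the rule should be read as a sketch only.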
According to a second aspect of the present invention, there is provided a method for detecting a voice presence/absence state of a frame which is obtained by dividing a voice signal into frames, comprising steps of: dividing the frame into sub-frames; calculating a periodicity of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of the periodicity of the voice signal in each sub-frame.
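A periodicity-based variant of the second aspect might look like the sketch below. The text does not say how periodicity is measured or how the per-sub-frame values are combined, so the normalized autocorrelation peak, the lag range, the two 80-sample sub-frames, and the "at least `min_periodic` sub-frames above `thr`" rule are all assumptions introduced here.

```python
import math
from typing import List, Sequence


def periodicity(sub: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Largest normalized autocorrelation over the lag range (0.0 if undefined)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        head, tail = sub[:-lag], sub[lag:]
        denom = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if denom > 0.0:
            best = max(best, sum(a * b for a, b in zip(head, tail)) / denom)
    return best


def detect_voice_by_periodicity(frame: List[float], n_sub: int = 2,
                                min_lag: int = 16, max_lag: int = 40,
                                thr: float = 0.7, min_periodic: int = 1) -> bool:
    """HYPOTHETICAL rule: voice if enough sub-frames are strongly periodic."""
    sub_len = len(frame) // n_sub
    values = [periodicity(frame[k * sub_len:(k + 1) * sub_len], min_lag, max_lag)
              for k in range(n_sub)]
    return sum(v > thr for v in values) >= min_periodic


if __name__ == "__main__":
    # Silence followed by a tone with a 20-sample period, and a single click.
    tone_onset = [0.0] * 80 + [math.sin(2 * math.pi * n / 20) for n in range(80)]
    click = [0.0] * 160
    click[50] = 1.0
    print(detect_voice_by_periodicity(tone_onset), detect_voice_by_periodicity(click))
```

A single pulse has no product terms at any non-zero lag, so its periodicity is 0.0 here, while a periodic onset in the second half of the frame is still detected.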
According to a third aspect of the present invention, there is provided a method for encoding a voice signal, comprising steps of: dividing a voice signal into frames; detecting a voice presence/absence state of each frame; encoding the voice signal for each frame; and determining whether to output the encoded voice signal for each frame; wherein the steps of encoding and determination are controlled by a result of the step of detection; and wherein the step of detection comprises steps of: dividing the frame into sub-frames; calculating a physical amount of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of a degree of variation of the physical amount among the sub-frames.
According to a fourth aspect of the present invention, there is provided a method for encoding a voice signal, comprising steps of: dividing a voice signal into frames; detecting a voice presence/absence state of each frame; encoding the voice signal for each frame; and determining whether to output the encoded voice signal for each frame; wherein the steps of encoding and determination are controlled by a result of the step of detection; and wherein the step of detection comprises steps of: dividing the frame into sub-frames; calculating a periodicity of the voice signal in each sub-frame; and determining whether the frame is in a voice presence state or a voice absence state on the basis of the periodicity of the voice signal in each sub-frame.
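The control flow shared by the third and fourth aspects can be sketched with a pluggable detector. The encoder and detector below are toy placeholders invented for this example (8-bit quantization and a peak-amplitude test); they stand in for the highly efficient voice encoding and the sub-frame based detection and are not taken from the source.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


def encode_signal(signal: Sequence[float], frame_len: int,
                  detect: Callable[[Frame], bool],
                  encode: Callable[[Frame], bytes]) -> List[Optional[bytes]]:
    """Divide into frames, detect voice per frame, and encode and output only voice frames."""
    output: List[Optional[bytes]] = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if detect(frame):
            output.append(encode(frame))  # voice present: encode and output
        else:
            output.append(None)           # voice absent: nothing is output
    return output


def toy_encode(frame: Frame) -> bytes:
    """Placeholder encoder: 8-bit quantization of samples clipped to [-1, 1]."""
    return bytes(int(round((max(-1.0, min(1.0, x)) + 1.0) / 2.0 * 255)) for x in frame)


def toy_detect(frame: Frame) -> bool:
    """Placeholder detector; a sub-frame based detector would be passed in instead."""
    return max(abs(x) for x in frame) > 0.05


if __name__ == "__main__":
    result = encode_signal([0.0] * 4 + [0.5] * 4, 4, toy_detect, toy_encode)
    print([None if r is None else len(r) for r in result])
```

Passing the detector as a callable keeps the encoding loop identical for both aspects; only the detection function changes.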
These and other objects, features and advantages of the present invention will become more apparent in light of the following detailed description of a best mode embodiment thereof, as illustrated in the accompanying drawings.
REFERENCES:
patent: 5835889 (1998-11-01), Kapanen
patent: 5915234 (1999-06-01), Itoh