Adaptive speech rate conversion without extension of input...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S215000

Reexamination Certificate

active

06374213

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a speech speed converting method, and a device embodying it, that achieve the ease of listening expected from speech speed conversion without extending playback time in various video, audio, and medical devices such as a television set, a radio, a tape recorder, a video tape recorder, a video disk player, or a hearing aid.
The present invention also relates to a speech interval detecting method, and a device embodying it, that discriminate between speech intervals and non-speech intervals of an input signal when speech delivered together with noise or background sound (in a broadcast program, on a recording tape, or in daily life) is processed to change the pitch of the voice or the speech speed, is recognized by machine, or is coded for transmission or recording.
[Outline of the Invention]
The present invention relates to a speech speed converting method, and a device embodying it, that convert speech speed in real time by processing human speech. When the delivery speed (speech speed) of the speech being listened to is slowed, a series of processes is carried out without omitting information while three quantities are constantly monitored in fixed processing units: the data length of the input speech, the output data length calculated in advance from a conversion function tied to a previously given scaling factor, and the data length of the speech actually being output.
Furthermore, in the speech speed converting method and device, a non-speech interval whose length exceeds a variable threshold value, set according to the degree of delay (conversion factor) expected in speech speed conversion, can be shortened appropriately. This aims, for example, to minimize the time difference between picture and speech that extending the speech causes when watching a television receiver. In addition, the maximum impression of slowness achievable within a given time range can be created automatically by adaptively changing the conversion factor according to the time difference between the input data length and the output data length, while keeping the speaking time of the converted speech substantially within the speaking time of the original speech.
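The interplay between length monitoring, the adaptive conversion factor, and non-speech shortening can be illustrated with a minimal Python sketch. It is not the claimed method: the linear back-off of the factor, the names `adapt_factor`, `shorten_pause`, and `convert`, and all parameter values are assumptions made for illustration. The pause threshold is passed in as a plain number because this excerpt does not state how it is derived from the conversion factor.

```python
def adapt_factor(target_factor, lag, max_lag):
    """Stretch factor for the next speech unit.

    lag is (output length - input length) so far, in samples.
    Assumption: the factor falls linearly from target_factor to 1.0
    as the lag approaches max_lag.
    """
    lag = max(0, lag)
    headroom = max(0.0, 1.0 - lag / max_lag)
    return 1.0 + (target_factor - 1.0) * headroom


def shorten_pause(pause_len, threshold, lag):
    """Trim a non-speech interval that is longer than threshold.

    At most the current lag is removed, and the pause is never cut
    below the threshold.
    """
    lag = max(0, lag)
    if pause_len <= threshold:
        return pause_len
    return max(threshold, pause_len - lag)


def convert(segments, target_factor, max_lag, pause_threshold):
    """Plan output lengths for a list of (kind, length) segments.

    kind is "speech" or "pause". Returns the plan and the final lag.
    """
    input_len = 0
    output_len = 0
    plan = []
    for kind, length in segments:
        lag = output_len - input_len  # monitored once per processing unit
        if kind == "speech":
            out_len = round(length * adapt_factor(target_factor, lag, max_lag))
        else:
            out_len = shorten_pause(length, pause_threshold, lag)
        input_len += length
        output_len += out_len
        plan.append((kind, length, out_len))
    return plan, output_len - input_len


if __name__ == "__main__":
    segments = [("speech", 10000), ("pause", 8000),
                ("speech", 10000), ("pause", 8000)]
    plan, final_lag = convert(segments, target_factor=1.2,
                              max_lag=8000, pause_threshold=3000)
    for row in plan:
        print(row)
    print("final lag:", final_lag)
```

In this toy input each stretched speech segment gains 2000 samples and the following pause gives them back, so the output ends level with the input.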
Moreover, the present invention calculates the power of the input signal data at predetermined time intervals, in frames having a predetermined time width, and holds the maximum value and the minimum value of the power within a past predetermined time period. It then discriminates between the speech interval and the non-speech interval for every frame by using a threshold value for the power that is changed according to the maximum value and the difference between the maximum value and the minimum value, so as to follow changes in the respective powers of the input speech and the background sound as they occur. As a result, precise detection of the speech interval of the input signal improves the quality of the processed sound, the speech recognition rate, the coding efficiency, and the quality of the decoded speech when speech delivered together with noise or background sound (in a broadcast program, on a recording tape, or in daily life) is processed to change the pitch of the voice or the speech speed, is recognized by machine, or is coded for transmission or recording.
In addition, because only the power, which is relatively simple to derive, is employed as a feature parameter, the speech processing can be executed in real time with a shorter calculation time and at reduced cost.
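A power-only detector of this kind can be sketched in a few lines of Python. This excerpt does not give the exact threshold rule, so the one used here (sit below the recent maximum by a margin that grows with the max-min range, up to a cap) is an assumption, as are the function names and the values of `history`, `ratio`, and `max_margin`.

```python
import math
from collections import deque


def frame_power_db(frame):
    """Mean-square power of one frame in dB, floored to avoid log(0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def detect_speech(frames, history=50, ratio=0.5, max_margin=20.0):
    """Label each frame True (speech) or False (non-speech)."""
    recent = deque(maxlen=history)  # powers within the past `history` frames
    labels = []
    for frame in frames:
        power = frame_power_db(frame)
        recent.append(power)
        p_max, p_min = max(recent), min(recent)
        # Assumed rule: the threshold depends on the recent maximum and on
        # the difference between the recent maximum and minimum.
        margin = min(max_margin, ratio * (p_max - p_min))
        threshold = p_max - margin
        # Strict comparison: a perfectly steady signal is never speech.
        labels.append(power > threshold)
    return labels


if __name__ == "__main__":
    quiet = [0.01] * 160  # stand-in for background sound
    loud = [0.5] * 160    # stand-in for speech
    frames = [quiet] * 5 + [loud] * 5 + [quiet] * 5
    print(detect_speech(frames))
```

Because the maximum and minimum are taken over a sliding window, the threshold moves when either the speech level or the background level drifts, and the only per-frame cost is one mean-square computation.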
BACKGROUND ART
When the speech speed converting method is applied to actual broadcasts, delay relative to the original speech can become an issue, for example in emergency news. In visual media in particular, this delay may have an adverse effect that runs counter to the benefit expected from speech speed conversion.
Therefore, approaches have been reported for achieving the effect of speech speed conversion (an impression of slowness) without delay from the original speech. One method suppresses extension in time by changing the speech speed from slow to fast as a function of the time elapsed from the start point to the end point of one breath of speech, instead of slowing uniformly, and then appropriately shortening the non-speech intervals between sentences (R. Ikezawa et al., “An Approach for Absorbing Extension in Time Caused in Speech Speed Conversion”, Spring Conference, Japanese Acoustic Society, 2-6-2, pp.331-332, 1992). Another method achieves this approach in real time (A. Imai et al., “Real Time Absorption Method for Extension in Time Caused in Speech Speed Conversion”, in International Conference, IEICE, D-694, pp 300, 1995).
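To make the slow-to-fast idea concrete, the following Python sketch computes how long one breath of speech becomes under a time-varying stretch factor. The linear ramp and the start and end factors of 1.4 and 0.8 are hypothetical values chosen for illustration, not figures from the cited reports.

```python
def factor_at(t, duration, start=1.4, end=0.8):
    """Stretch factor at elapsed time t within one breath of speech.

    Assumption: a linear ramp from `start` (slow) to `end` (fast).
    """
    return start + (end - start) * (t / duration)


def stretched_duration(duration, start=1.4, end=0.8, steps=1000):
    """Output duration: the factor summed over the input time (midpoint rule)."""
    dt = duration / steps
    return sum(factor_at((i + 0.5) * dt, duration, start, end) * dt
               for i in range(steps))


if __name__ == "__main__":
    original = 2.0  # seconds of input speech
    converted = stretched_duration(original)
    print("converted length:", round(converted, 3))
    print("extension to absorb in the next pause:",
          round(converted - original, 3))
```

With these values the mean factor is 1.1, so a 2-second utterance grows by about 0.2 seconds, which is the amount the following non-speech interval would have to give up.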
The former sets an appropriate function manually on the assumption that all speech styles are known in advance. The latter also sets the function defining the factor manually, and keeps this function fixed once it has been set.
In addition, only a constant remaining time is set manually for shortening the non-speech interval. If a great deal of “inconsistency” accumulates, the extended speech held in a buffer is cleared manually.
Therefore, the prior-art speech speed converting device has the following problem: because broadcast speech contains various speaking styles (speech speed, “timing” in speech, etc.) depending on the speaker, and appropriate parameters must be set manually for each, the device has many operation points, the setting itself is difficult, and the device is hard for an ordinary user to handle.
Moreover, the above speech speed converting device must distinguish the speech interval from the non-speech interval. Various speech interval detecting systems exist in the prior art.
In one known prior-art speech interval detecting system, a noise level and a speech level are calculated from the power of the speech signal or the like, a level threshold value is set based on the calculation result, and this level threshold value is compared with the input signal. The interval is decided to be a speech interval if the level of the input signal is higher than the level threshold value, and a non-speech interval if it is lower.
Three representative systems are used to set the level threshold value in this kind of detector. In the first system, the level threshold value is obtained by adding a preselected constant to the noise level value of the input speech. In the second system, an improvement on the first, the level threshold value is set relatively large when the value obtained by subtracting the noise level value from the maximum level value of the input speech signal is large, and relatively small when that value is small (for example, Patent Application Publication (KOKAI) Sho 58-130395, Patent Application Publication (KOKAI) Sho 61-272796, etc.).
In the third system, in addition to these threshold setting methods, the input signal is monitored continuously, its level is regarded as the noise level whenever it stays steady over a constant time period, and the threshold value used for speech interval detection is set while the noise level is updated sequentially (Proceeding in International Conference, IEICE, D-695, pp 301, 1995).
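The three prior-art threshold systems described above can be compared side by side in a short Python sketch. The function names, the linear scaling in the second system, and the steadiness test in the third are assumptions made for illustration; the cited publications may define them differently. Levels are taken to be in dB.

```python
def threshold_fixed_offset(noise_level, offset):
    """First system: noise level plus a preselected constant."""
    return noise_level + offset


def threshold_range_scaled(noise_level, max_level, ratio):
    """Second system: a larger threshold when (max - noise) is large.

    Assumption: the threshold grows linearly with that difference.
    """
    return noise_level + ratio * (max_level - noise_level)


def update_noise_level(noise_level, recent_levels, tolerance):
    """Third system: re-estimate the noise level when the input is steady.

    Assumption: "steady" means the recent levels span at most `tolerance`.
    """
    if max(recent_levels) - min(recent_levels) <= tolerance:
        return sum(recent_levels) / len(recent_levels)
    return noise_level


def is_speech(level, threshold):
    """Common decision rule: speech if the level exceeds the threshold."""
    return level > threshold


if __name__ == "__main__":
    noise = -50.0
    print(threshold_fixed_offset(noise, 10.0))
    print(threshold_range_scaled(noise, -10.0, 0.25))
    noise = update_noise_level(noise, [-45.0, -44.5, -45.5], tolerance=2.0)
    print(noise, is_speech(-30.0, threshold_fixed_offset(noise, 10.0)))
```

All three share the same final comparison and differ only in how the threshold tracks the signal.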
However, in the above speech interval detecting system in the
