Data processing: speech signal processing – linguistics – language – Audio signal time compression or expansion
Reexamination Certificate
1999-09-28
2002-04-23
Dorvil, Richemond (Department: 2641)
Data processing: speech signal processing, linguistics, language
Audio signal time compression or expansion
C704S278000, C369S044320, C702S069000
Reexamination Certificate
active
06377931
ABSTRACT:
BACKGROUND
1. Field of the Invention
The present invention relates to communication systems and in particular to packet network communication systems.
2. Description of the Related Art
Currently, global and local communication systems are rapidly changing from switched network systems to packet network systems. Packet network systems transmit data, speech, and video. An example of a packet network is the Internet (a globally connected packet network system) or the Intranet (a local area packet network system). While speech communications in switched network systems is carried by a direct point-to-point connection, speech communications in packet network system is performed by packing speech frames and transmitting the frames over the network.
A number of applications for packet networks now exist. For example, in November 1996, the International Telecommunication Union (ITU) and the Telecommunication Standardization Sector (ITU-T) ratified the H.323 specification defines how delay-sensitive voice and video traffic is transported over local area networks. Earlier this year (1999), the ITU-T approved H.323 Revision 2 for use in wide area networks. However, operating H.323 terminals over a wide area network (such as the public Internet) may result in poor performance due to the lack of quality-of-service (QoS) guarantees in packet networks. In the Internet, congestion due to inadequate bandwidth often leads to long delays in the delivery of time-sensitive packets. For voice data, packets that are lost or discarded result in gaps, silence, and clipping in real-time audio playback.
To support a real-time QoS, a new Internet Protocol (IP) network has been proposed, called the Resource Reservation Protocol (RSVP). Using RSVP, both real time and non-real time applications can specify an appropriate QoS over the shared bandwidth of the Internet. However, until an RSVP standard is ratified and implemented in network routers, it is not possible for the end-to-end connections over IP networks to guarantee a QoS equivalent to the PSTN. In addition, IP telephony devices utilize Voice Over Internet Protocol (VOIP) over private and public carrier IP networks (rather than the public Internet) where ample bandwidth can be allocated.
Several drawbacks can jeopardize the quality of the speech transmitted by a packet network. The main drawback is the irregularity (or jitter) in the time of arrival of the packets. Since speech communications is a continuous process, each packet should be available at the receiving end in time for its usage (a packet is used by decoding its content and playing the decoded speech to the listener). A problem arises, for example, if a few packets are delayed at a node of the packet network. At the receiving end, since the speech packets have not arrived, the listener will experience a discontinuity in speech. Moreover, when the packets finally arrive to their destination, they might arrive too late to be used, and will be dropped. In this case, the listener will lose some of the speech information.
One possible solution for the irregular time of arrival of speech packets has been the buffering of several speech packets before using them to produce the speech. The speech packets are put in a FIFO (First-In-First-Out) buffer type, which holds several packets. Such a buffer is commonly called a jitter buffer. If the number of delayed packets is less than the size of the buffer, then the buffer will not become empty, and the listener will not experience speech discontinuity or lost. The greater the potential jitter, the larger the buffer has to be, in order to give more room for the playback of previous packets while waiting for the subsequent arrival of later packets. However, the intermediate buffer does introduce an overall delay that is proportional to the buffer size.
A large size jitter buffer can overcome several irregularities in packet arrival time, but results in intolerable delay, while a small size jitter buffer introduces only a small delay, but recovers only a limited level of packet time-of-arrival jitter. The proper jitter buffer size is a system design concern, which should be determined according to the allowable speech communications delay, the expected network delays, and the tolerable reduction in speech quality due to discontinuities and losses.
Packet loss leads to unpleasant signal degradation. Small amounts of packet loss have been dealt with in a number of manners. One solution has been to employ packet replay, where the receiver merely repeats the last packet to fill in the time until the next packet actually arrives. However, where packet loss may be more substantial, such as where a Voice Over Internet Protocol (VOIP) signal passes over the Internet, simple packet replay has not been effective.
Another solution to minimize delay caused by a jitter buffer has been to dynamically monitor the jitter and adjust the buffer size accordingly. Commonly assigned U.S. Pat. No. 5,699,481 proposes the management of a jitter buffer by tracking the current number of speech packets stored in the jitter buffer. When the buffer approaches its full capacity, packets are removed from the jitter buffer. When the buffer approaches its empty level, “artificial” packets are inserted into the jitter buffer.
SUMMARY OF THE INVENTION
In a speech communications network, continuous play of received audio packets is achieved using a jitter buffer in a receiver. Audio packets are first temporarily stored in the jitter buffer before decoding of the audio packets into an audible output. A consistent accumulation level of the received audio packets in the jitter buffer is maintained to provide continuous and synchronized output to a decoder. When the level of stored audio packets approaches the full capacity of the jitter buffer, the rate at which the audio packets are played out of the jitter buffer is increased. The increased output rate is achieved by compressing a portion of the stored audio packets to reduce the number of audio packets in the jitter buffer. When the level of stored audio packets approaches an empty level of the jitter buffer, the rate which the audio packets are played out of the jitter buffer is reduced. The reduced output rate is achieved by expanding a portion of the stored audio packets to increase the number of audio packets in the jitter buffer. Audio packets are not modified when the level of stored audio packets is within a predetermined range, such that the rate of incoming audio packets received by the jitter buffer approximately equals the rate of decoded audio packets. A speed controller is then provided to instruct the decoder to decode the audio packets from the jitter buffer according to either a compressed, expanded or normal audio packet status.
REFERENCES:
patent: 5694521 (1997-12-01), Shlomot et al.
patent: 5699481 (1997-12-01), Shlomot et al.
patent: 5825771 (1998-10-01), Cohen et al.
patent: 5881245 (1999-03-01), Thompson
patent: 5953695 (1999-09-01), Barazesh et al.
patent: 6212206 (2001-04-01), Ketcham
Overview of Speech Packetization, M.H. Sherif and A. Crossman, AT&T Bell Laboratories, © 1995 IEEE, pp. 296-304.
Ansari et al (“Compressed Voice Integrated Services Frame Relay Networks: Voice Synchronization,” Conference on Electrical and Computer Engineering, p. 1073-1076 vol. 2, Sep. 5-8, 1995).
Akin, Gump, Strauss, Hauer & feld, LLP
Dorvil Richemond
Mindspeed Technologies
Nolan Daniel A.
LandOfFree
Speech manipulation for continuous speech playback over a... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Speech manipulation for continuous speech playback over a..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Speech manipulation for continuous speech playback over a... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2904316