Complexity resource manager for multi-channel speech processing

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S211000

active

06789058

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to speech and audio signal processing. More particularly, the present invention relates to complexity resource management for multiple channel speech and audio signal processing.
2. Related Art
In recent years, packet-based networks, such as the Internet, have begun to replace traditional telephone networks (i.e., switched networks) for transportation of voice and data in accordance with voice-over-packet (“VoP”). The packetizing of voice signals for transmission over a packet network has been recognized as a less expensive, yet effective, alternative to traditional telephone service. For example, with the emergence of voice over IP (“VoIP”), telephone conversations may now be captured, packetized and transported over the Internet. Other examples of emerging VoP implementations include Next Generation Networks (“NGN”), which do not necessarily use the Internet Protocol (IP) for the transmission of packet voice.
In a conventional VoIP system, telephone conversations or analog voice may be transported over the local loop or the public switched telephone network (“PSTN”) to the central office (“CO”), where speech is digitized according to an existing protocol, such as G.711. From the CO, the digitized speech is transported to a gateway device at the edge of the packet-based network. The gateway device receives the digital speech and packetizes it. The gateway device can combine G.711 samples into a packet, or use any other compression scheme. Next, the packetized data is transmitted over the Internet using the Internet Protocol for reception by a remote gateway device and conversion back to analog voice in the reverse manner as described above.
For purposes of this application, the terms “speech coder” or “speech processor” will generally be used to describe the operation of a device that is capable of encoding speech for transmission over a packet-based network and/or decoding encoded speech received over the packet-based network. As noted above, the speech coder or speech processor may be implemented in a gateway device for conversion of speech samples into a packetized form that can be transmitted over a packet network and/or conversion of the packetized speech into speech samples. Ordinarily, a gateway processor handles the speech coding of multiple channels.
Efforts have been made to increase the efficiency and operation of speech processors to encode speech for transmission over packet-based networks. One area of development has been speech codecs. For example, recent speech codecs, such as the adaptive multi-rate (AMR), the enhanced variable rate speech coder (EVRC), and the selectable mode vocoder (SMV), have been designed to achieve the best tradeoff among bit-rate, complexity and quality for their designed applications. In order to provide better playback quality at a lower bit-rate, these modern codecs are generally more complex and therefore require more processing power than lower-complexity high-bit-rate speech codecs, such as G.711. As a result of the increased complexity of these codecs and the associated hardware requirements, the channel density (i.e., number of channels) that a speech processor (or gateway) can support is limited. Increasing the processing power of speech processors and gateways to handle higher-complexity codecs would involve a substantial increase in cost and investment. On the other hand, operating lower-complexity high-bit-rate codecs results in increased bit rate and reduced throughput over the communication channels. In addition, in accordance with certain communication standards, low-bit-rate complex coders are mandatory, and therefore use of lower-complexity codecs is not possible.
Speech encoding algorithms executed by speech processors (and gateways) have also been enhanced to increase the efficiency and operation of the communication channel. In particular, variable rate codecs were introduced for packet networks, where the average load on the networks is an essential factor in their operation. According to these enhanced encoding algorithms, the bit rate used to encode a speech signal may be selected according to the input speech. For example, approximately fifty percent (50%) of conversational speech involves inactive speech (silence). Typically, higher-complexity encoders are used to encode active speech segments at a somewhat higher bit rate, while lower-complexity encoders are used to process silence or background noise (inactive speech) segments at a lower bit rate. This solution suits the network, whose performance is related to the average bit rate. Processing multiple channels of such speech on a DSP is particularly challenging, however, since the throughput of a DSP is defined not by the average complexity but by the maximum complexity. On average, a DSP may be able to handle all the channels, since at a given time some channels carry active speech, which needs a higher-complexity algorithm, and others carry inactive speech, which needs a lower-complexity algorithm. There may still be instances, however, where a majority or all of the channels carry active speech and therefore all need the higher-complexity algorithm, so that together they exceed the available computation power of the DSP.
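The gap between average and maximum complexity can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the original disclosure. It assumes that each channel is independently active with a fixed probability, and the function names and per-frame cost figures are hypothetical.

```python
from math import comb


def average_load(channels, p_active, cost_active, cost_inactive):
    """Expected per-frame cost when each channel is active with probability p_active."""
    return channels * (p_active * cost_active + (1 - p_active) * cost_inactive)


def peak_load(channels, cost_active):
    """Worst-case per-frame cost: every channel carries active speech."""
    return channels * cost_active


def overload_probability(channels, p_active, cost_active, cost_inactive, capacity):
    """Probability that one frame's total cost exceeds the DSP capacity.

    Assumption (not from the source): channels are statistically independent,
    so the number of active channels follows a binomial distribution.
    """
    prob = 0.0
    for k in range(channels + 1):  # k = number of active channels in this frame
        load = k * cost_active + (channels - k) * cost_inactive
        if load > capacity:
            prob += comb(channels, k) * p_active**k * (1 - p_active) ** (channels - k)
    return prob


# Hypothetical figures: 10 channels, 50% speech activity,
# 10 cost units per active frame, 2 cost units per inactive frame.
AVG = average_load(10, 0.5, 10, 2)
PEAK = peak_load(10, 10)
P_OVER = overload_probability(10, 0.5, 10, 2, capacity=80)
```

Under these assumed figures, a DSP sized for 80 cost units comfortably covers the average load of 60 units, yet roughly 5.5% of frames would still exceed its capacity, and only a DSP sized for the peak of 100 units would never be overloaded.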
Accordingly, there is a need in the art for a speech coder apparatus and method that overcome these and other shortcomings of present implementations for encoding voice information into a packetized form that can be transmitted over a packet network.
SUMMARY OF THE INVENTION
In accordance with the purposes of the present invention as broadly described herein, there is provided a multi-channel speech processor for encoding speech for a packet network environment. In one illustrative aspect of the present invention, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of complexity of the coding, which is used by a signal-processing unit (SPU) to convert the speech signal into packet data. In some embodiments, the CRM may also be used to manage the decoding operation. In general, the CRM determines the level of complexity of the coding based on a calculated complexity budget, where the complexity budget is determined based on the time consumed to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor, and adjust the speech processor to meet the complexity budget, through its ability to signal the SPU to encode and/or decode a speech signal in a complexity reduced coding mode based on the calculated or consumed complexity budget.
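By way of illustration only, the budget calculation described above might be sketched in Python as follows. The class name, its methods, and the policy of splitting the remaining time evenly across the remaining channels are assumptions made for this sketch and are not taken from the disclosure. The clock is injectable so that the behavior can be checked deterministically.

```python
import time


class ComplexityResourceManager:
    """Hypothetical sketch of a CRM that tracks a per-frame complexity budget."""

    def __init__(self, frame_period, num_channels, clock=time.perf_counter):
        self._frame_period = frame_period  # time available to process all channels
        self._num_channels = num_channels
        self._clock = clock                # injectable time source
        self._start = 0.0
        self._done = 0

    def start_frame(self):
        """Mark the beginning of a processing frame."""
        self._start = self._clock()
        self._done = 0

    def channel_done(self):
        """Record that one more channel has been processed in this frame."""
        self._done += 1

    def budget(self):
        """Time available per remaining channel (assumed even split)."""
        remaining_channels = self._num_channels - self._done
        if remaining_channels <= 0:
            raise ValueError("no channels left to process in this frame")
        elapsed = self._clock() - self._start
        return max(self._frame_period - elapsed, 0.0) / remaining_channels

    def use_reduced_mode(self, normal_cost):
        """True if the normal coding mode would not fit within the current budget."""
        return self.budget() < normal_cost
```

In such a sketch, the SPU would query `use_reduced_mode` before coding each channel and would switch to a complexity reduced coding mode whenever it returns true.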
For example, the speech processor may use the SMV codec to encode speech signals for a plurality of channels 1 through m. The SMV codec may provide four coding rates, each rate having an associated level of complexity including: a full rate, a half rate, a quarter rate, and an eighth rate, for example. It is possible that the SMV full rate, the quarter rate, and the eighth rate schemes are less complex than the SMV half rate scheme due to the more intense search required to execute the half rate scheme. In this example, the CRM may choose a coding rate for a given channel “n”, based on the time spent processing channel 1 through n−1 and the available processing time left to process channels n through m. Thus, the CRM may select a lower level complexity rate (e.g., full rate, quarter rate, or eighth rate) to process a given speech signal channel n (or groups of channels “n+o”, where “n+o”≦m) where the calculated processing time left to process the remaining channels would not be sufficient to support a higher level complexity coding rate (e.g., SMV half-rate). It is noted that although described in terms of ordinal numbers n for channels 1 through m, the speech processor of the present invention may actually process speech signals for channels 1 through m in any order as input signals arr
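The rate selection described in the SMV example may be sketched as follows, for illustration only. The per-rate cost figures are hypothetical and reflect only the ordering suggested above, in which the half rate is the most complex. The function names, the policy of reserving the cheapest rate for every channel still to be processed, and the assumption that actual processing time equals the nominal cost are all assumptions of this sketch, not details of the disclosure.

```python
# Hypothetical per-frame processing costs (arbitrary time units).
RATE_COST = {"full": 3, "half": 4, "quarter": 2, "eighth": 1}


def choose_rate(preferred, time_spent, frame_time, channels_left):
    """Pick a coding rate for channel n.

    time_spent    -- time already used on channels 1 through n-1
    channels_left -- number of channels n through m (includes channel n)
    """
    remaining = frame_time - time_spent
    # Assumed policy: keep enough time to code every later channel at the cheapest rate.
    reserve = (channels_left - 1) * min(RATE_COST.values())
    affordable = remaining - reserve
    if RATE_COST[preferred] <= affordable:
        return preferred
    # Otherwise fall back to the costliest lower-complexity rate that still fits.
    candidates = [
        rate for rate, cost in RATE_COST.items()
        if cost <= affordable and cost < RATE_COST[preferred]
    ]
    if candidates:
        return max(candidates, key=RATE_COST.get)
    return min(RATE_COST, key=RATE_COST.get)  # nothing fits: use the cheapest rate


def process_frame(preferred_rates, frame_time):
    """Choose a rate for each channel in turn and return the list of chosen rates."""
    time_spent = 0
    chosen = []
    total = len(preferred_rates)
    for index, preferred in enumerate(preferred_rates):
        rate = choose_rate(preferred, time_spent, frame_time, total - index)
        chosen.append(rate)
        time_spent += RATE_COST[rate]  # assumption: actual time equals nominal cost
    return chosen
```

With four channels that all prefer the half rate and a frame time of 13 units, this sketch codes the first three channels at the half rate and the last at the eighth rate. With 16 units available, all four channels keep the half rate.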
