Distributed speech coder pool system with front-end idle...

Multiplex communications – Channel assignment techniques – Arbitration for access to a channel

Reexamination Certificate

Details

C370S468000

active

06535521

ABSTRACT:

TECHNICAL FIELD
This invention relates to distributed modem pooling techniques for maximizing resource utilization in a packet-switched communication network with integrated telephony services, and more particularly, for maximizing resource utilization of the network during speech/voice processing.
BACKGROUND
Voice-over-IP (VoIP) is an emerging technology in which packet-switched networks employ the Internet Protocol (IP) to offer telephony services. Examples of such networks are the Internet and private IP-based corporate local area and wide area networks (LANs and WANs). The main advantage of using this technology over Plain Old Telephone Service (POTS) is the cost savings obtained from its reduced bandwidth requirements. Unlike POTS/PSTN, where a continuously available circuit-switched DS0 (64 kbps) connection is dedicated to the call for its entire life, VoIP shares the network bandwidth with other information types like data and video. This is possible because speech in a VoIP network typically travels (is processed) at the low rate of either 6.3 kbps or 5.3 kbps. Therefore, for example, a corporation can use a portion of the bandwidth of its existing IP network to offer VoIP to its employees and, consequently, do away with their traditional phone services and their associated costs.
VoIP achieves its bandwidth efficiency through the use of two techniques: Speech Compression and Discontinuous Transmission (DTX). The former employs source encoding methods to represent the sampled voice signal in compressed form that can be decoded and decompressed later at the receiving end. The compression ratio achieved by this technique can be as high as 24:1, resulting in tremendous bandwidth reduction. In Discontinuous Transmission, bandwidth reductions are achieved by detecting silence in the phone conversation and, in response, either shutting down the transmitter on the non-speaking end or sending smaller frames than the regular speech ones. With DTX, the bandwidth can theoretically be halved since, ideally, in conversations one person is talking and the other is listening.
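As a rough illustration of the DTX claim above, the following sketch computes the average one-way payload rate when frames are sent only while the talker is active. The function name and the 50% speech-activity figure are assumptions made for illustration. The sketch also assumes nothing at all is sent during silence, so it ignores the smaller silence frames mentioned above.

```python
DS0_KBPS = 64.0  # circuit-switched rate quoted in the text


def average_rate_kbps(codec_rate_kbps: float, speech_activity: float) -> float:
    """Average one-way payload rate under idealized DTX.

    speech_activity is the fraction of time this side is talking
    (0.0 to 1.0). Payload only: packet header overhead is not modelled.
    """
    if not 0.0 <= speech_activity <= 1.0:
        raise ValueError("speech_activity must be between 0 and 1")
    return codec_rate_kbps * speech_activity


if __name__ == "__main__":
    for rate in (6.3, 5.3):
        avg = average_rate_kbps(rate, 0.5)  # ideal case: each side talks half the time
        print(f"{rate} kbps coder: {avg:.2f} kbps average, "
              f"DS0 is {DS0_KBPS / avg:.1f} times larger")
```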
FIG. 1 conceptually illustrates the basic architecture of a typical VoIP WAN network 10 like the internet. An IP network 15 forms the backbone of the WAN network. Telephone internet gateways 20, 25, 30 connected to the IP network provide the telecommunications interface infrastructure which allows the exchange of information (such as video, text, and/or speech) in the form of one or more recognized protocols between telephony-capable peripheral devices attached thereto. For example, one or more personal computers 35 connected to an associated telephone gateway 20, using a modem 36 or the like equivalent device, are coupled to exchange information (including speech/voice information) therebetween or with other telephony devices (41-44) connected to the network 10, for example over the Public Switched Telephone Network (PSTN) 40, via their associated telephone gateways 25, 30. (Telephony devices 41-44 can include both wireless as well as wireline devices interfaced thereto in a conventional manner over the PSTN or a dedicated telephony-based network.) The PSTN or the like interface network receives and converts speech data into digital samples which are then communicated to the associated internet telephony gateway.
All digital processing of speech samples and associated control tasks are done by special hardware in each telephone gateway in response to appropriate analog (e.g., from modem 36) or digital (e.g., DS0 signals from the PSTN) speech samples originating in the form of spoken speech from such devices as PC 35 and telephony devices 41-44. The spoken speech samples are received via actual physical layer links 37 connecting the outside world to the telephone gateway's internal speech processing hardware. A diagrammatic depiction of such hardware is shown in FIG. 2. The hardware includes a pulse code modulated (PCM) sample IO handler 50, auto gain control (AGC) and echo cancellation devices 55, a voice coder/decoder (CODEC) 60, a line coder 65 and an IP network interface device 70 configured to operate in a known fashion. A typical input to voice CODEC 60, for example, might be an 8 kHz 16-bit linear PCM signal which corresponds to a data rate of 128 kbps (8000×16). In one implementation, CODEC 60 operates on blocks of 240 input samples 56 called frames, each frame having a duration of 30 msec (240/8000). The output of CODEC 60 is at one of two bit-rates: 6.3 kbps or 5.3 kbps. The higher rate has greater quality and the lower rate is, obviously, more bandwidth efficient. At 6.3 kbps, the length of the output codeword is 189 bits (0.03 sec×6300 bps), and at 5.3 kbps it is 159 bits. The highest compression ratio is, therefore, achieved by the 5.3 kbps CODEC, where the input 128 kbps PCM signal is compressed 24-fold.
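The frame arithmetic in this paragraph can be checked with a short sketch. The constants come from the text; the function names are illustrative only. Integer arithmetic is used for the codeword length so the result is not affected by floating-point rounding.

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz sampling
BITS_PER_SAMPLE = 16    # 16-bit linear PCM
FRAME_SAMPLES = 240     # samples per frame


def input_rate_bps() -> int:
    """Raw PCM input rate in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def frame_duration_ms() -> float:
    """Duration of one frame in milliseconds."""
    return 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ


def codeword_bits(output_rate_bps: int) -> int:
    """Bits produced per frame at the given coder output rate."""
    return output_rate_bps * FRAME_SAMPLES // SAMPLE_RATE_HZ


def compression_ratio(output_rate_bps: int) -> float:
    """Input PCM rate divided by coder output rate."""
    return input_rate_bps() / output_rate_bps


if __name__ == "__main__":
    print("input rate:", input_rate_bps(), "bps")
    print("frame duration:", frame_duration_ms(), "ms")
    for rate in (6300, 5300):
        print(rate, "bps ->", codeword_bits(rate), "bits per frame,",
              f"{compression_ratio(rate):.1f}:1 compression")
```

At 5.3 kbps the ratio works out to slightly above 24, which matches the 24-fold figure quoted above once rounded down.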
A block diagram of the operational processing logic of CODEC 60 is shown in FIG. 3. Discontinuous transmission (DTX) and silence compression of input samples 56 are handled by a Voice Activity Detector (VAD) 61 and by a Comfort Noise Generator (CNG) 62. The VAD 61 reliably detects the presence or absence of speech and conveys that information to the CNG 62. Although this information is passed on a frame-by-frame basis, the determination of the presence or absence of speech is made over multiple successive frames.
The CNG 62 creates a noise signal that matches the actual background noise. It essentially computes and encodes parameters that can be used at the receiving end to synthesize this artificial noise. These parameters constitute the Silence Descriptor (SID) frames 63, which use fewer bits (40) than the normal speech ones and are transmitted during inactive periods. This transmission, however, is not periodic. That is, for each inactive (non-speech) frame the CNG 62 decides whether or not to send a SID frame 63 based on variations of the power spectrum of the background noise. As long as this spectrum remains relatively unchanged, no SID frames 63 are sent and the system's transmitting modules remain idle. At the receiving end, on the other hand, the decoder always uses the last SID frame 63 received to generate the silence comfort noise.
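The per-frame decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder's actual algorithm: the class name, the band-power representation of the spectrum, the mean-absolute-difference measure, the threshold value, and the choice to send a fresh SID at the start of every silence period are all hypothetical. The VAD decision is taken as an input here, whereas the text notes that it is actually formed over multiple successive frames.

```python
from typing import List, Optional, Sequence

SID_CHANGE_THRESHOLD = 0.1  # hypothetical threshold for "spectrum has changed"


def spectral_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two band-power vectors (illustrative measure)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class DtxSender:
    """Chooses, for each frame, between a speech frame, a SID frame, or nothing."""

    def __init__(self, threshold: float = SID_CHANGE_THRESHOLD) -> None:
        self.threshold = threshold
        self.last_sid_spectrum: Optional[List[float]] = None

    def process(self, is_speech: bool, noise_spectrum: Sequence[float]) -> str:
        if is_speech:
            # Assumption: forget the old SID so the next silence period starts with a new one.
            self.last_sid_spectrum = None
            return "SPEECH"
        if (self.last_sid_spectrum is None
                or spectral_change(noise_spectrum, self.last_sid_spectrum) > self.threshold):
            self.last_sid_spectrum = list(noise_spectrum)
            return "SID"
        return "NONE"  # background noise unchanged: transmitter stays idle


if __name__ == "__main__":
    sender = DtxSender()
    frames = [
        (True, [1.0, 1.0]),    # talker active
        (False, [0.2, 0.2]),   # silence begins
        (False, [0.21, 0.2]),  # noise essentially unchanged
        (False, [0.6, 0.6]),   # noise spectrum shifts
    ]
    print([sender.process(active, spec) for active, spec in frames])
```

A receiver built on the same assumptions would simply keep the most recent SID parameters and synthesize comfort noise from them for every frame in which nothing arrives.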
In a typical CODEC, a speech coder 64 processes the speech portion of the PCM samples output from VAD 61 to generate, using appropriate coding algorithms, encoded speech frames 63′. This processing dominates the horsepower requirements of a CODEC. The processing power consumed by the coding algorithms might typically be 20 million instructions per second (MIPS). Comparatively, the requirements of the comfort noise generator 62 are negligible and do not exceed 1 MIPS. (The VAD 61 is immaterial to this discussion since it is common to the generation of both SID frames 63 and encoded speech frames 63′.)
The 1 MIPS estimate for the CNG represents normal operations where SID frames 63 are not being continuously built and sent. For peak processing estimations, however, one might assume that the background noise's power spectrum is constantly changing and new SID frames must be computed and sent continuously. Exact numbers for this peak condition are not available but, nonetheless, one can safely estimate that they do not exceed a very generous 5 MIPS. To verify this, one can use the ratio of the number of bits in a SID frame 63 to the number of bits in a regular speech frame as a rough indicator. The minimum such ratio is for a 5.3 kbps coder which is 40/159=0.251. Multiplying this number by 20 MIPS gives us 5.03 MIPS as the horsepower needed to compute a SID frame 63. The 20 and 5 MIPS estimates produce a 4:1 ratio between full-time and idle-time frame generation processing. (This very rough estimation assumes that SID frame generation is as complex as that of encoded speech frame generation and that the relationship between number of bits and MIPS is linear, which is not remotely the case. Nonetheless, given that the generation of comfort noise
