Method and system for transmitting data on a speech channel
Reexamination Certificate
1999-07-12
2003-10-14
Banks-Harold, Marsha D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S222000, C455S074100
active
06633840
ABSTRACT:
The present invention is a method and a system for transmitting data on a speech channel, in particular in the field of mobile telephony. However, it could also be used in fixed telephone networks, known as switched networks. The invention solves problems arising from the particular features of data transmission on a speech channel, where the data in the speech channel is transcoded in the same way as speech is encoded in the network.
BACKGROUND OF THE INVENTION
Speech transmission channels and data transmission channels are well known in the field of mobile telephony. Data transmission channels require encoding different from speech encoding, and they use network equipment specific to data mode. In practice, a special contract must be entered into with a mobile telephony operator for this purpose. This contract provides access to point-to-point data transmission in circuit-switched mode at a bit rate of 9 600 bit/s.
In the field of GSM cellular telephony, there are data transmission means that use the signaling channels of the cellular system. A distinction is drawn between SMS (Short Message Service) channels, which can transmit at up to 300 bit/s, and USSD (Unstructured Supplementary Service Data) channels, which can handle bit rates of the order of 800 bit/s. The bit rate is low in both cases. With USSD channels, information is transmitted only from a user to the network. With SMS channels, information can be exchanged from user to user or from the network to a user, and it is billed per packet exchanged, at a cost that is currently high.
The aim of the invention is to enable data of any kind to be transmitted over a network, in particular a mobile telephony network, at a high bit rate and without having to enter into an additional contract. In particular, the invention makes use of Internet access services. It also enables a manufacturer to update and maintain terminals.
What distinguishes speech encoding from data encoding, in particular for transmission in mobile telephony, is essentially the nature of the digitized data representing the speech. Speech digitized in a simple way produces a very large amount of digital data. In mobile telephony, specific types of speech encoding have been developed to avoid the congestion of transmission channel frequencies that excessively high bit rates would cause.
These types of encoding, known as source encoding, essentially extract characteristics representative of how speech is produced. These characteristics comprise three magnitudes, namely:
a fundamental frequency (pitch) corresponding to the vibration of the vocal cords,
filtering corresponding to the modification of the fundamental vibration as it propagates through the vocal tract, i.e. the larynx, pharynx and mouth, and
an excitation (or error) corresponding to the residue left by the preceding modeling of the uttered speech.
A GSM source encoder determines the best values of these three magnitudes from a PCM (Pulse Code Modulation) signal. A PCM signal is produced by sampling a speech signal at a frequency of 8 000 Hz and quantizing each sample on 13 bits, for example. The bit rate of the PCM signal in this example is therefore 8 000 × 13 = 104 000 bit/s, i.e. 104 kbit/s. The source encoder performs an operation known as analyzing, or encoding, the PCM signal.
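The PCM arithmetic above can be checked with a short sketch. It assumes, as in the example in the text, uniform (linear) 13-bit quantization at 8 000 Hz; the helper names `pcm_bit_rate` and `quantize_13bit` are hypothetical and not taken from any standard.

```python
# Minimal sketch of the PCM figures quoted in the text.
# Assumption: uniform 13-bit linear quantization at 8 kHz (the text's example).

SAMPLE_RATE_HZ = 8_000   # sampling frequency
BITS_PER_SAMPLE = 13     # quantization depth
FRAME_MS = 20            # frame length used by the source encoder


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw PCM bit rate in bit/s: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample


def quantize_13bit(x: float) -> int:
    """Hypothetical uniform quantizer: map x in [-1.0, 1.0] to a signed
    13-bit integer in [-4096, 4095], clipping values outside that range."""
    level = round(x * 4096)
    return max(-4096, min(4095, level))


samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples in one 20 ms frame
print(pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))    # 104000 bit/s
```

Each 20 ms frame therefore covers 160 PCM samples, which is the block of signal the source encoder analyzes at each repetition.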
The remainder of the description refers to a GSM network and transmission of speech when the source encoding is of the “Full Rate” type (ETSI recommendation SMG 6.10). The principles of the invention are nevertheless applicable to other forms of source encoding, or speech formats, in the GSM network (Half-Rate or Adaptive Multi-Rate).
They are also applicable to other mobile telephone networks (DCS-1800, PCS, etc.).
FIG. 1a shows the source encoding of the PCM signal corresponding to a 20 ms frame of a speech signal. For example, this source encoding generates 36 bits of a pitch signal (corresponding to a long-term prediction), 36 bits of a filter signal and 188 bits of an excitation signal. The 36 bits of the filter signal correspond to eight coefficients of a short-term linear prediction filter. The 188 bits of the excitation signal correspond to 60 excitation parameters.
At the receiving end, a synthesis decoder receives the corresponding streams of 260 bits per 20 ms period, i.e. a bit rate of 13 kbit/s. This synthesis decoder includes programmable filters in cascade. A first, long-term filter receives the excitation signals and filters them with filter values corresponding to the 36 bits of the pitch signal. A second, short-term filter connected downstream of the first filters the resulting signal with filter values corresponding to the 36 bits of the short-term filter signal. Like the original PCM signal, the reconstructed signal has a bit rate of 104 kbit/s.
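The filter cascade in the synthesis decoder can be illustrated with a direct-form sketch: a long-term (pitch) synthesis filter y[n] = e[n] + b·y[n − N], followed by an all-pole short-term synthesis filter. This is a simplified illustration under stated assumptions: the lag N, the gain b and the short-term prediction coefficients are taken as already decoded and fixed for the whole block, and the filter memory starts at zero. The GSM Full Rate decoder itself works per sub-frame and implements the short-term filter in lattice form from the decoded LARs; all function names here are hypothetical.

```python
from typing import List, Sequence


def long_term_synthesis(excitation: Sequence[float], lag: int, gain: float) -> List[float]:
    """Long-term (pitch) synthesis filter: y[n] = e[n] + gain * y[n - lag].

    Simplification: samples before the start of the block are taken as zero.
    """
    if lag < 1:
        raise ValueError("lag must be at least one sample")
    out: List[float] = []
    for n, e in enumerate(excitation):
        past = out[n - lag] if n >= lag else 0.0
        out.append(e + gain * past)
    return out


def short_term_synthesis(signal: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole short-term synthesis filter 1/A(z) in direct form:
    s[n] = x[n] + sum over k = 1..p of lpc[k-1] * s[n - k], with zero initial memory.
    """
    out: List[float] = []
    for n, x in enumerate(signal):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n >= k:
                acc += a * out[n - k]
        out.append(acc)
    return out


def synthesize(excitation: Sequence[float], lag: int, gain: float,
               lpc: Sequence[float]) -> List[float]:
    """Cascade: long-term filter first, short-term filter downstream of it."""
    return short_term_synthesis(long_term_synthesis(excitation, lag, gain), lpc)


# Usage: a single excitation pulse, a pitch lag of 2 samples, one LPC coefficient.
print(synthesize([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], lag=2, gain=0.5, lpc=[0.5]))
```

The order of the two calls in `synthesize` mirrors the text: the excitation first receives its pitch periodicity and is then shaped by the short-term (vocal tract) filter.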
All of the processing shown in FIG. 1a is repeated periodically, with a period of 20 ms in the currently applicable standard. In each period, a stream of 260 bits representing the parameters of the three magnitudes must be produced; 260 bits every 20 ms corresponds to a bit rate of 13 kbit/s.
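The per-frame bit budget and the resulting bit rate can be recomputed from the figures given in the text. This sketch only restates that arithmetic; the dictionary labels and function names are illustrative.

```python
# Minimal sketch: per-frame bit budget of the Full Rate source encoder,
# using the figures given in the text (36 + 36 + 188 bits every 20 ms).

FRAME_MS = 20
PCM_BIT_RATE = 104_000  # bit/s, 8 kHz x 13 bits (the text's PCM example)

FRAME_BUDGET = {
    "pitch / LTP": 36,
    "short-term filter / LAR": 36,
    "excitation / RPE": 188,
}


def bits_per_frame(budget: dict) -> int:
    """Total number of bits produced for one frame."""
    return sum(budget.values())


def bit_rate(bits: int, frame_ms: int) -> int:
    """Bit rate in bit/s for `bits` produced every `frame_ms` milliseconds."""
    return bits * 1000 // frame_ms


total = bits_per_frame(FRAME_BUDGET)   # bits per 20 ms frame
rate = bit_rate(total, FRAME_MS)       # encoded bit rate in bit/s
compression = PCM_BIT_RATE / rate      # reduction relative to the PCM signal
print(total, rate, compression)
```

With these figures the source encoder reduces the 104 kbit/s PCM stream by a factor of 8.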
Source encoding converts an analog amplitude (the level of the pressure wave representing the sound) into three types of magnitude. The first magnitude represents the fundamental frequency, or pitch, and this parameter is commonly known as the Long Term Prediction (LTP). This LTP magnitude is encoded in 5 ms sub-frames (four sub-frames per 20 ms frame), with 9 bits encoded in each sub-frame, giving a total of 36 bits per 20 ms frame. The 9 bits encoded in each sub-frame correspond to two components of the LTP magnitude: a delay, or lag (encoded on 7 bits), defining the pitch period, i.e. the delay of the long-term prediction filter, and an amplitude (encoded on 2 bits) defining the optimum coefficient of the long-term prediction filter.
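The 7 + 2 bit split of each LTP sub-frame can be made concrete by packing the two fields into a 9-bit word and concatenating the four sub-frames into 36 bits. The field order (lag in the high bits, amplitude index in the low bits) and the function names are assumptions for illustration; the standard defines its own bit ordering.

```python
from typing import List, Tuple

LAG_BITS = 7    # delay / lag of the long-term prediction filter
GAIN_BITS = 2   # index of the quantized long-term prediction coefficient
SUBFRAMES = 4   # four 5 ms sub-frames per 20 ms frame


def pack_ltp_subframe(lag: int, gain_index: int) -> int:
    """Pack one sub-frame's LTP parameters into a 9-bit word (lag high, gain low)."""
    if not 0 <= lag < (1 << LAG_BITS):
        raise ValueError("lag does not fit in 7 bits")
    if not 0 <= gain_index < (1 << GAIN_BITS):
        raise ValueError("gain index does not fit in 2 bits")
    return (lag << GAIN_BITS) | gain_index


def unpack_ltp_subframe(word: int) -> Tuple[int, int]:
    """Inverse of pack_ltp_subframe: return (lag, gain_index)."""
    return word >> GAIN_BITS, word & ((1 << GAIN_BITS) - 1)


def pack_ltp_frame(params: List[Tuple[int, int]]) -> int:
    """Concatenate four 9-bit sub-frame words into one 36-bit frame value."""
    if len(params) != SUBFRAMES:
        raise ValueError("expected four sub-frames")
    frame = 0
    for lag, gain_index in params:
        frame = (frame << (LAG_BITS + GAIN_BITS)) | pack_ltp_subframe(lag, gain_index)
    return frame


# Usage: four sub-frames with hypothetical lag and amplitude indices.
print(pack_ltp_frame([(40, 0), (80, 1), (120, 2), (127, 3)]).bit_length() <= 36)
```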
The eight coefficients of the short-term filter are expressed in a transformed representation called the Log Area Ratio (LAR), giving coefficients LAR1 to LAR8. These coefficients are quantized with dynamic ranges that vary according to their size or their associated energy. Thus, the first two coefficients of the short-term filter, LAR1 and LAR2, are quantized on 6 bits. The next two coefficients, LAR3 and LAR4, are assigned a dynamic range of 5 bits. The next two, LAR5 and LAR6, are assigned a dynamic range of 4 bits, and the last two, LAR7 and LAR8, are assigned a dynamic range of 3 bits. In total, 2 × (6 + 5 + 4 + 3) = 36 bits are allocated in this way to the representation of the short-term filter.
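With these unequal allocations, the eight quantized LAR indices fit into a single 36-bit field. The sketch below assumes the quantizer indices (not the LAR values themselves) are already computed and that LAR1 occupies the most significant bits; the bit order and function names are hypothetical, and the LAR transform and quantization tables of the standard are not reproduced.

```python
from typing import List, Sequence

# Bits allocated to LAR1 .. LAR8, as listed in the text.
LAR_BITS = (6, 6, 5, 5, 4, 4, 3, 3)


def pack_lar_indices(indices: Sequence[int]) -> int:
    """Concatenate the eight quantizer indices into one 36-bit value (LAR1 first)."""
    if len(indices) != len(LAR_BITS):
        raise ValueError("expected eight LAR indices")
    word = 0
    for idx, nbits in zip(indices, LAR_BITS):
        if not 0 <= idx < (1 << nbits):
            raise ValueError(f"index {idx} does not fit in {nbits} bits")
        word = (word << nbits) | idx
    return word


def unpack_lar_indices(word: int) -> List[int]:
    """Inverse of pack_lar_indices: peel fields off from the least significant end."""
    indices: List[int] = []
    for nbits in reversed(LAR_BITS):
        indices.append(word & ((1 << nbits) - 1))
        word >>= nbits
    return indices[::-1]
```

Giving more bits to LAR1 and LAR2 reflects the text's point that the first coefficients carry the most weight in describing the filter.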
Of the 260 bits transmitted, the remaining 188 bits (260 − 36 − 36) are used to encode the 60 excitation, or RPE (Regular Pulse Excitation), parameters. Like the pitch signal, the RPE is calculated in four sub-frames of 40 samples (5 ms) each. Each of the four RPEs calculated in this way is described as a regularly spaced grid with a spacing of three samples at the initial sampling frequency of 8 kHz. Each grid is described by 15 RPE parameters, namely:
an RPE grid position, encoded on 2 bits,
an amplitude on the sub-frame, encoded on 6 bits, and
thirteen coefficients describing a relative amplitude of each pulse of the grid (RPE pulses), each encoded on 3 bits.
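The grid structure can be sketched on one 40-sample sub-frame: a grid starting at offset m ∈ {0, 1, 2, 3} (hence the 2-bit grid position) picks the samples m, m + 3, ..., m + 36, i.e. 13 pulses. The text does not say how the encoder chooses the grid; this sketch assumes the grid with the largest energy is kept. The function names are hypothetical, and the quantization of the sub-frame amplitude and of the 13 relative amplitudes is omitted.

```python
from typing import List, Sequence, Tuple

SUBFRAME_LEN = 40      # samples per 5 ms sub-frame at 8 kHz
GRID_SPACING = 3       # pulses are regularly spaced every third sample
PULSES_PER_GRID = 13   # thirteen RPE pulses per grid
GRID_POSITIONS = 4     # candidate offsets 0..3, hence a 2-bit grid position

# Bits per sub-frame: grid position + sub-frame amplitude + 13 pulses x 3 bits.
RPE_BITS_PER_SUBFRAME = 2 + 6 + PULSES_PER_GRID * 3


def grid_samples(residual: Sequence[float], offset: int) -> List[float]:
    """Samples of the candidate grid starting at `offset` (offset, offset + 3, ...)."""
    return [residual[offset + GRID_SPACING * i] for i in range(PULSES_PER_GRID)]


def select_grid(residual: Sequence[float]) -> Tuple[int, List[float]]:
    """Pick the grid offset whose samples have the largest energy (assumed criterion)."""
    if len(residual) != SUBFRAME_LEN:
        raise ValueError("expected a 40-sample sub-frame")
    best = max(range(GRID_POSITIONS),
               key=lambda m: sum(x * x for x in grid_samples(residual, m)))
    return best, grid_samples(residual, best)
```

Over four sub-frames this gives 4 × 47 = 188 bits and 4 × 15 = 60 RPE parameters, matching the figures in the text.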
Once a digital message has been encoded in this way, it is channel encoded for transmission so that it can be sent over a radio channel subject to a high transmission error rate. The channel encoding applied in GSM telephony comprises the steps shown in FIG. 1b. The first step systematically classifies the bits into three classes according to their sensitivity to errors, as established by the standard:
class 1a: 50 bits, highly sensitive,
class 1b: 132 bits, sensitive, and
class 2: 78 bits, insensitive.
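The three class sizes add up to the 260 bits of a Full Rate frame (50 + 132 + 78). The sketch below splits a frame accordingly, assuming the 260 bits have already been reordered by decreasing sensitivity; that reordering is defined by the standard and is not reproduced here, and `split_into_classes` is a hypothetical name.

```python
from typing import Dict, List, Sequence

# Class sizes from the classification (bits per 20 ms frame).
CLASS_SIZES = {"1a": 50, "1b": 132, "2": 78}


def split_into_classes(frame_bits: Sequence[int]) -> Dict[str, List[int]]:
    """Split a 260-bit frame into sensitivity classes.

    Assumption: the input bits are already ordered from most to least
    sensitive; the standard's actual reordering table is not applied here.
    """
    total = sum(CLASS_SIZES.values())
    if len(frame_bits) != total:
        raise ValueError(f"expected {total} bits, got {len(frame_bits)}")
    classes: Dict[str, List[int]] = {}
    start = 0
    for name, size in CLASS_SIZES.items():  # dicts keep insertion order
        classes[name] = list(frame_bits[start:start + size])
        start += size
    return classes
```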
This classification is defined in GSM
Bonnard Pierre
Varaldi Jean
Alcatel
Azad Abul K.
Banks-Harold Marsha D.
Sughrue & Mion, PLLC