Electrical computers and digital processing systems: multicomput – Computer-to-computer protocol implementing – Computer-to-computer data streaming
Reexamination Certificate
1999-12-02
2004-06-08
Etienne, Ario (Department: 2153)
Electrical computers and digital processing systems: multicomput
Computer-to-computer protocol implementing
Computer-to-computer data streaming
C714S752000
Reexamination Certificate
active
06748441
ABSTRACT:
TECHNICAL FIELD
This invention relates to distribution of data files and other data objects using IP multicast techniques in conjunction with forward error correction and data carousel techniques. In particular, the invention relates to methods of receiving, buffering, and decoding data objects distributed in this manner.
BACKGROUND OF THE INVENTION
The existence and popularity of the Internet has created a new medium for software distribution. As this distribution method becomes more widely used, it will place more and more demands on Internet bandwidth. Thus, it will be important to distribute files and other data objects as efficiently as possible.
Currently, data objects are distributed to individual network clients upon request. When a data object is requested, it is packaged in a plurality of IP (Internet Protocol) packets and transmitted to the requesting client. If another client requests the same data object, the IP packets are re-transmitted to that client. Thus, each request results in a full re-transmission of the entire data object over the network.
This type of data distribution is very inefficient. The inefficiencies become serious in certain situations where there is a rush to obtain a particular data object that has only recently become available. This situation has been dubbed the Midnight Madness problem because the mad dash for files often takes place late at night or in the early morning when files are first made available. Spikes in Internet activity have been caused by a range of phenomena: popular product releases; important software updates; security bug fixes; the NASA Pathfinder vehicle landing on Mars; the Kasparov vs. Deep Blue chess match; and the Starr report. The danger of such traffic spikes lies not in the data type, but rather in the data distribution mechanism.
The Midnight Madness problem is caused by the Internet's current unicast “pull” model. A TCP (Transmission Control Protocol) connection is established between a single sender and each receiver, then the sender transmits a full copy of the data once over each connection. The sender must send each packet many times, and each copy must traverse many of the same network links. Naturally, the sender and links closest to the sender can become heavily saturated. Nonetheless, such a transmission can create bottlenecks anywhere in the network where over-subscription occurs. Furthermore, congestion may be compounded by long data transfers, either because of large files or slow links.
These problems can be alleviated through the use of IP multicast protocols. IP multicast is a method of distributing data in which the data is sent once from a data server and routed simultaneously to all requesting clients. Using this method, the sender sends each packet only once, and the data traverses each network link only once. Multicast has been commonly used for so-called “streaming” data such as data representing audio or video. Typically, multicast is used to transmit live events such as news conferences or audio from broadcast radio stations.
FIG. 1
shows a network system utilizing IP multicasting. The system includes a data server
10
and a plurality of clients
12
and
13
. The system also includes a plurality of routers
14
that route data along different communications links to the receiving clients. In this case, only the five clients referenced by numeral
12
have requested the data stream, while the clients referenced by numeral
13
have not requested the data stream. The data stream is forwarded to the requesting clients
12
, as indicated by the shaded arrows. However, the data stream is not forwarded to non-requesting clients
13
, thus preserving bandwidth on the links to those clients.
IP multicast provides a powerful and efficient means to transmit data to multiple parties. However, IP multicast is problematic for transfers of data objects which must be transmitted reliably, such as files. IP multicast provides a datagram service—“best-effort” packet delivery. It does not guarantee that packets sent will be received, nor does it ensure that packets will arrive in the order they were sent.
Many reliable file transfer protocols have been built on top of multicast. However, since scalability was not a primary concern for most of these protocols, they are not useful for the midnight madness problem. The primary barrier to scalability is that most of these protocols require feedback from the receivers in the form of acknowledgements (ACKs) or negative acknowledgements (NACKs). If many receivers generate feedback, they may overload the source or intermediate data links with these acknowledgements.
A so-called data carousel protocol can be used to provide scalable file distribution using multicast protocols. A data carousel is a simple protocol that avoids feedback from receivers. Using this protocol, a data server repeatedly sends the same data file using IP multicast. If a receiver does not correctly receive part of the file, the receiver simply waits for that portion of the file to be transmitted again.
Although a data carousel is workable, it often imposes a significant delay as the receiver waits for the next iteration of the file transmission. Forward Error Correction (FEC) can be utilized in conjunction with a data carousel to reduce the re-transmission wait time. Using FEC, error correction packets are included in the data stream. The error correction packets allow reconstruction of lost packets without requiring a wait for the next file transmission.
Using IP multicast, corrupted packets are automatically detected (using checksums) and discarded by the IP protocol. Accordingly, it is only necessary to replace lost packets. Therefore, the FEC protocol described herein deals only with erasure correction rather than with error correction, even though the broader terms “error correction” and “FEC” are used throughout the description.
Using forward error correction, a data object is broken into data blocks for transmission in respective IP packets. Assuming that there are k source blocks, these source blocks are encoded into n erasure-encoded blocks of the same size, wherein n>k, in a way that allows the original k source blocks to be reconstructed from any k of the erasure-encoded blocks. This is referred to as (n,k) encoding. Many (n,k) encoding techniques are based on Reed-Solomon codes and are efficient enough to be used by personal computers. See Rizzo, L., and Vicisano, L., “Effective Erasure Codes for Reliable Computer Communication Protocols”,
ACM SIGCOMM Computer Communication Review
, Vol. 27, No. 2, pp. 24-36, April 1997, and Rizzo, L., and Vicisano, L., “Reliable Multicast Data Distribution Protocol-Based on Software FEC Techniques”,
Proceedings of the Fourth IEEES Workshop on the Architecture and Implementation of High Performance Communication Systems
, HPCS'97, Chalkidiki, Greece, June 1997, for examples of an (n,k) encoding method. So-called Tornado codes are viable alternatives to Reed-Solomon codes.
It is desirable in many situations to utilize systematic (n,k) encoding, in which the first k of the n encoded blocks are the original data blocks themselves. If no blocks are lost during transmission, a receiver does not incur any processing overhead when decoding the k blocks of a systematic code. The methods described herein work with, but do not require, systematic encoding.
FIG. 2
shows how this scheme works. A data file in this example contains k blocks, indicated by reference numeral
20
. These k blocks are encoded in a step
21
using a Reed-Solomon encoding algorithm, resulting in n erasure-encoded blocks
22
, which are sent repeatedly in a step
23
using IP multicast. Each of the n erasure-encoded blocks is the same size as one of the original k blocks. The receiver waits until it has received any k of the erasure-encoded blocks (indicated by reference numeral
24
), and then decodes them in a step
25
to recreate the original k source blocks
26
.
In practice, k and n are limited when using Reed-Solomon-based codes, because encoding and decodin
Etienne Ario
Flynn Kimberly
Lee & Hayes PLLC
Microsoft Corporation
LandOfFree
Data carousel receiving and caching does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Data carousel receiving and caching, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Data carousel receiving and caching will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3294255