Multiplex communications – Pathfinding or routing – Switching a message which includes an address header
Reexamination Certificate
1999-11-29
2003-09-23
Yao, Kwang Bin (Department: 2664)
C370S469000
active
06625149
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to computer systems, and more particularly to techniques for implementing communication protocols in operating systems.
BACKGROUND OF THE INVENTION
FIG. 1 illustrates how a FreeBSD operating system processes Internet protocol (IP) packets received from an Ethernet network. This protocol processing organization is referred to as the “BSD approach” because it is derived from that used in the Berkeley Software Distribution (BSD) Unix operating system. With minor variations, the organization shown in FIG. 1 is found in many other operating systems, including cases of protocol families other than IP and networks other than Ethernet.
In the BSD approach, incoming packets are processed at software interrupt level, at a priority higher than that of any application. Input protocol processing is not scheduled and is charged to the interrupted application, even if that application is unrelated to the received packets. This leads to two undesirable consequences. First, high receive loads, e.g., due to a server “hot spot” or a denial-of-service attack, can make the system unable to process any application. This is the so-called “receive livelock” problem as described in, e.g., J. Mogul and K. K. Ramakrishnan, “Eliminating receive livelock in an interrupt-driven kernel,” Proceedings of Annual Tech. Conf., USENIX, 1996. Second, because protocol processing of received packets is unscheduled, the system cannot enforce CPU allocations and thus cannot provide quality of service (QoS) guarantees to applications.
As shown in FIG. 1, in FreeBSD, arrival of an IP packet causes a hardware interrupt that transfers central processing unit (CPU) control to a network interface driver 10. The driver 10 retrieves the packet from the corresponding network interface hardware, prepares the hardware for receiving a future packet, and passes the received packet to an ether_input routine 12. The ether_input routine 12 places the packet in an IP input queue 14 without demultiplexing, i.e., all IP packets go into the same input queue 14. The ether_input routine 12 then issues a network software interrupt. This software interrupt has a priority higher than that of any application, but lower than that of the hardware interrupt.
FreeBSD handles the network software interrupt by dequeuing each packet from the IP input queue 14 and calling an ip_input routine 15. The ip_input routine 15 performs a checksum on the packet's IP header and submits the packet to preliminary processing operations such as, e.g., firewalling 16 and/or network address translation (NAT) 18, if configured in the system, and IP options 20, if present in the packet header. This preliminary processing may drop, modify, or forward the packet.
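The queueing and software-interrupt steps described above can be sketched as follows. This is only an illustrative model, not FreeBSD code: the class name `BsdInputPath`, the `softintr_pending` flag, and the dictionary packet representation are assumptions introduced here, and the `ip_input` routine is passed in as a plain callback.

```python
from collections import deque


class BsdInputPath:
    """Toy model of the BSD receive path (names are hypothetical)."""

    def __init__(self):
        # One shared queue for all IP packets: no demultiplexing at this stage.
        self.ip_input_queue = deque()
        # Stand-in for the pending network software interrupt.
        self.softintr_pending = False

    def ether_input(self, packet):
        """Enqueue the packet and request the network software interrupt."""
        self.ip_input_queue.append(packet)
        self.softintr_pending = True

    def network_softintr(self, ip_input):
        """Dequeue every packet and hand it to ip_input; return the count."""
        handled = 0
        while self.ip_input_queue:
            ip_input(self.ip_input_queue.popleft())
            handled += 1
        self.softintr_pending = False
        return handled
```

Because every packet shares one queue, the work done in `network_softintr` cannot be attributed to a particular receiving process at this point, which is the property the BSD approach is criticized for.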
The ip_input routine 15 then checks the packet's destination IP address. If that address is the same as one of the host's addresses, the ip_input routine 15 jumps to its ip_input_ours label 21, reassembles the packet, and passes the packet to the input routine of the higher-layer protocol selected in the packet header, e.g., transmission control protocol (TCP) input routine 22-1, user datagram protocol (UDP) input routine 22-2, IP in IP tunneling (IPIP) input routine 22-4, resource reservation protocol (RSVP) input routine 22-5, Internet group management protocol (IGMP) input routine 22-6, Internet control message protocol (ICMP) input routine 22-7, or, for other protocols implemented by a user-level application, raw IP (RIP) input routine 22-3. Otherwise, if the destination is a multicast address, the ip_input routine 15 submits the packet to a higher-layer protocol, for local delivery, and to the ip_mforward routine 24, if the system is configured as a multicast router. Finally, if the destination IP address matches neither one of the host's addresses nor a multicast address, and the system is configured as a gateway, the ip_input routine 15 submits the packet to the ip_forward routine 26; otherwise, the ip_input routine 15 drops the packet. The ip_mforward routine 24, ip_forward routine 26, and one or more of the routines 22-1 may make use of the ip_output routine 27.
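The destination check described above amounts to a three-way decision. The sketch below is a minimal model of that decision only; the function name `classify_destination`, its parameters, and the string action labels are hypothetical, and reassembly and the per-protocol input routines are omitted.

```python
import ipaddress


def classify_destination(dst, host_addrs, is_multicast_router=False,
                         is_gateway=False):
    """Return the actions taken for a packet addressed to dst.

    dst is a dotted-quad string and host_addrs is a set of the host's own
    addresses (both are assumptions of this sketch).
    """
    if dst in host_addrs:
        # Local delivery path: reassemble, then call the protocol input routine.
        return ["ip_input_ours"]
    if ipaddress.ip_address(dst).is_multicast:
        actions = ["local_delivery"]
        if is_multicast_router:
            actions.append("ip_mforward")
        return actions
    if is_gateway:
        return ["ip_forward"]
    return ["drop"]
```

For example, `classify_destination("224.0.0.5", {"192.0.2.1"}, is_multicast_router=True)` yields both local delivery and multicast forwarding, mirroring the multicast case in the text.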
The TCP and UDP input routines 22-1 and 22-2, respectively, checksum the packet and then demultiplex it. These routines find the protocol control block (PCB) that corresponds to the destination port selected in the packet header, append the packet to the respective socket receive queue 28, and wake up receiving processes 29 that are waiting for that queue to be non-empty. However, if the socket receive queue 28 is full, FreeBSD drops the packet.
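A minimal sketch of this late demultiplexing step is shown below. It is not the FreeBSD implementation: the `Socket` class, the `pcb_table` dictionary keyed by destination port, the packet-count queue limit, and the `"no_pcb"` result for an unmatched port are all assumptions made for illustration, and the checksum step is omitted.

```python
from collections import deque


class Socket:
    """Hypothetical socket with a bounded receive queue."""

    def __init__(self, limit):
        self.limit = limit            # assumed limit, counted in packets
        self.receive_queue = deque()
        self.wakeups = 0              # stand-in for waking receiving processes


def udp_input(pcb_table, packet):
    """Demultiplex by destination port; drop if the socket queue is full."""
    sock = pcb_table.get(packet["dst_port"])
    if sock is None:
        return "no_pcb"               # behavior here is an assumption
    if len(sock.receive_queue) >= sock.limit:
        return "dropped"              # dropped only after protocol processing
    sock.receive_queue.append(packet["payload"])
    sock.wakeups += 1
    return "queued"
```

Note that the drop decision is reached only after the packet has already passed through the shared queue and `ip_input`, so the CPU time spent on a dropped packet is wasted.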
Protocol processing of a received packet in FreeBSD is asynchronous relative to the corresponding receiving processes 29. On a receive call, a receiving process 29 checks the socket receive queue 28. If the queue is empty, the receiving process sleeps; otherwise, the receiving process dequeues the data and copies it out to application buffers.
The BSD approach to protocol processing of received packets has two main disadvantages. First, it is prone to the above-mentioned problem of receive livelock. Because demultiplexing occurs so late, packets destined to the host are dropped only after protocol processing has already occurred. Applications only get a chance to run if the receive load is not so high that all CPU time is spent processing network hardware or software interrupts. Second, even at moderate receive loads, process scheduling may be affected by the fact that the CPU time spent processing network interrupts is charged to whatever process was interrupted, even if that process is unrelated to the received packets. Such incorrect accounting of CPU usage may prevent the operating system from enforcing CPU allocations, thus causing scheduling anomalies.
An alternative protocol processing organization, lazy receiver processing (LRP), is illustrated in FIGS. 2A and 2B. LRP is described in detail in P. Druschel and G. Banga, “Lazy receiver processing (LRP): a network subsystem architecture for server systems,” Proceedings of OSDI'96, USENIX, 1996. Instead of the single IP input queue 14 of the above-described BSD approach, LRP uses separate packet queues referred to as channels, with one channel 30-i associated with each socket i. LRP employs early demultiplexing, that is, the network interface hardware, or the network interface driver 10 and the ether_input routine 12, examine the header of each packet and enqueue the packet directly in the channel that corresponds to the header, e.g., channel 30-1 in FIG. 2A or channel 30-2 in FIG. 2B. Following a hardware interrupt, LRP wakes up the processes that are waiting for the channel to be non-empty. However, if the given channel is full, the network interface drops the packet immediately, before further protocol processing.
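Early demultiplexing into per-socket channels can be sketched as below. This is an illustrative model rather than the LRP implementation: the `Channel` class, the `(proto, dst_port)` lookup key, the packet-count limit, and the handling of packets with no matching channel are assumptions of this sketch.

```python
from collections import deque


class Channel:
    """Hypothetical per-socket packet queue (one channel per socket)."""

    def __init__(self, limit):
        self.limit = limit       # assumed limit, counted in packets
        self.packets = deque()
        self.dropped = 0
        self.wakeups = 0         # stand-in for waking waiting processes


def early_demux(channels, packet):
    """Enqueue the raw packet in its channel; drop at once if it is full."""
    key = (packet["proto"], packet["dst_port"])
    channel = channels.get(key)
    if channel is None:
        return False             # no matching socket: an assumption here
    if len(channel.packets) >= channel.limit:
        channel.dropped += 1     # dropped before any protocol processing
        return False
    channel.packets.append(packet)
    channel.wakeups += 1
    return True
```

In contrast to the BSD path, the drop here happens before any IP or transport processing, so an overloaded socket costs the system only the header inspection.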
The LRP approach handles TCP and UDP packets differently. In the UDP case, illustrated in FIG. 2B, the receiving process 32-2 performs the following loop while there is not enough data in the socket receive queue 34-2: While the corresponding channel 30-2 is empty, sleep; then dequeue each packet from the channel 30-2 and submit the packet to the ip_input routine 15, which calls the udp_input routine 22-2, which finally enqueues the packet in the socket receive queue 34-2. The receiving process 32-2 then dequeues the data from the socket receive queue 34-2 and copies it out to application buffers. Therefore, for UDP packets, LRP is synchronous relative to the receiving process's receive calls.
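The UDP receive loop described above can be modeled as follows. This is a simplified sketch under stated assumptions: "enough data" is reduced to "the socket receive queue is non-empty", sleeping is replaced by a caller-supplied `wait_for_packet` callback, and `protocol_input` stands in for the ip_input and udp_input routines. All of these names are hypothetical.

```python
from collections import deque


def lrp_udp_receive(channel, socket_queue, protocol_input, wait_for_packet):
    """Run protocol processing lazily, in the context of the receive call."""
    # Loop while the socket receive queue lacks data (here: is empty).
    while not socket_queue:
        # While the channel is empty, "sleep" (modeled by the callback).
        while not channel:
            wait_for_packet()
        # Dequeue each packet and process it; processing fills socket_queue.
        while channel:
            protocol_input(channel.popleft(), socket_queue)
    # Dequeue the data that would be copied out to application buffers.
    return socket_queue.popleft()


def demo_protocol_input(packet, socket_queue):
    """Stand-in for ip_input followed by udp_input."""
    socket_queue.append(packet["payload"])
```

Since protocol processing runs inside the receive call, its CPU time is naturally charged to the receiving process, which is the sense in which LRP is synchronous for UDP.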
In the TCP case, illustrated in FIG. 2A, LRP is asynchronous relative to the receiving process 32-1. LRP cannot be synchronous relative to the receiving process 32-1 in the TCP case because (1) LRP was designed to be completely transparent to applications, and (2) in some applications, synchronous protocol processing could cause large or variable delays in TCP acknowledgements, adverse
Brustoloni Jose Carlos
Gabber Eran
Silberschatz Abraham
Lucent Technologies - Inc.
Yao Kwang Bin