Efficient scheduling of reading data from multiple storage...

Electrical computers and digital processing systems: memory – Storage accessing and control – Specific memory composition

Reexamination Certificate

Details

Class: C711S158000
Type: Reexamination Certificate
Status: active
Patent number: 06282607

ABSTRACT:

FIELD OF INVENTION
The present invention relates generally to the field of data storage systems. More specifically, the invention relates to tape-based tertiary storage systems.
BACKGROUND OF THE INVENTION
A growing number of database applications require on-line access to large volumes of data. For example, telecommunication service providers store large amounts of phone call data to satisfy the requirements of billing, call statistics, and fraud detection. This information is accessed regularly by telecommunication providers. Other examples of users of on-line data access include digital libraries and video-on-demand servers.
Despite the rapid decline in magnetic disk prices, tape storage systems remain far more economical than disk storage systems (disk arrays) for storing large amounts of data. A tape-based tertiary storage system is the usual solution for a very large-scale storage system, which would otherwise require an impractically large number of disk drives to contain the entire database. Tape-based systems are also cost-efficient and are available in a wide range of transfer speeds.
Even though tape-based systems have many advantages, a major disadvantage of tape systems compared to disk arrays is the high access delay, known as latency, for random retrievals. A typical tertiary storage system consists of several tape drives accessing a large pool of tapes. In high-end mass storage systems this usually takes the form of large “tape silos” that contain several drives and thousands of tapes. Robotic arms locate requested tapes and insert and remove tapes from the drives as needed. This tape-switching operation consumes a significant amount of time: the switching and positioning time in a randomly accessed tape-based system is measured in tens of seconds or minutes.
In addition to the latency resulting from the tape switching operation, locating and reading the information stored on tapes also consumes time.
Various scheduling processes have been developed to handle incoming data-block requests efficiently. A conventional scheduling process combines a major rescheduler, which chooses a tape and forms a retrieval schedule (service list) for the selected tape, and an incremental scheduler, which handles requests arriving after the major rescheduler has formed the retrieval schedule. The incremental scheduler either schedules newly arriving requests at the instant they arrive or defers them to a “pending list” until the next invocation of the major rescheduler. In a typical scheduling process, all incoming requests that have not been scheduled are sent to the pending list. Next, a desired tape is selected and a retrieval schedule is formed by selecting requests from the pending list that are located on the selected tape. Conventional methods select tapes in a fixed order, starting with the tape after the currently mounted tape and then proceeding sequentially.
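As a rough illustration of this two-level arrangement, consider the following Python sketch. The class name ConventionalScheduler and the (tape_id, block_pos) request model are hypothetical, chosen only for illustration; the sketch shows a major rescheduler that picks the next tape in sequential (round-robin) order and builds a service list from the pending requests residing on that tape.

    class ConventionalScheduler:
        # Sketch of a conventional two-level tape scheduler (names are illustrative).

        def __init__(self, num_tapes):
            self.num_tapes = num_tapes
            self.pending = []        # unscheduled requests: (tape_id, block_pos)
            self.service_list = []   # retrieval schedule for the selected tape
            self.current_tape = 0

        def submit(self, tape_id, block_pos):
            # All incoming requests that have not been scheduled go to the pending list.
            self.pending.append((tape_id, block_pos))

        def major_reschedule(self):
            # Conventional tape selection: the tape after the currently mounted
            # one, moving in sequential order.
            self.current_tape = (self.current_tape + 1) % self.num_tapes
            on_tape = [r for r in self.pending if r[0] == self.current_tape]
            self.pending = [r for r in self.pending if r[0] != self.current_tape]
            # Service the selected tape's requests in position order (one sweep).
            self.service_list = sorted(on_tape, key=lambda r: r[1])
            return self.current_tape, self.service_list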
Three commonly used classes of conventional scheduling processes are null scheduling, static scheduling, and dynamic scheduling. A null scheduling process is based on the “first in, first out” principle, which simply services requests in their order of arrival. This process performs poorly for random requests because the system may have to switch tapes many times to satisfy all of the requests, wasting valuable time. These multiple switches may include switching from a first tape to a second tape and then back to the first tape, which is very inefficient.
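The cost of the null policy can be seen from a minimal sketch, assuming a request is simply a (tape_id, block_pos) pair: servicing in arrival order forces a tape switch whenever consecutive requests name different tapes.

    def null_schedule_switches(requests):
        # Count the tape switches incurred by pure FIFO servicing.
        switches = 0
        mounted = None
        for tape_id, _block_pos in requests:
            if tape_id != mounted:
                switches += 1
                mounted = tape_id
        return switches

    # Requests alternating between tapes 1 and 2 force a switch per request:
    # null_schedule_switches([(1, 10), (2, 5), (1, 20), (2, 7)]) == 4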
Static scheduling processes freeze the service list once the major rescheduler has selected a tape and formed the list. During execution of the service list, newly arriving requests are appended to the pending list. Once the service list for the currently selected tape has been executed, the requests remaining on the pending list are evaluated, a new tape is selected, and the process continues until all pending requests have been serviced. This operation is inefficient because, during a sweep of the tape head, every newly arriving request is deferred to the pending list, even if the request is for a block that the tape head will pass over during the sweep.
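The freezing behavior can be summarized in a few lines; the function name and arguments below are illustrative assumptions, not interfaces taken from the patent.

    def run_static_sweep(service_list, mid_sweep_arrivals, pending):
        # The service list, once formed by the major rescheduler, is frozen.
        # Every request that arrives during the sweep is deferred to the pending
        # list, even if the head would pass over the requested block anyway.
        pending.extend(mid_sweep_arrivals)
        served = list(service_list)   # the frozen schedule is executed as-is
        return served, pending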
Dynamic scheduling processes use a dynamic incremental scheduler that can insert a newly arriving request into the service list at the moment it arrives and that takes advantage of the forward (e.g., left-to-right) and reverse (e.g., right-to-left) phases of a locate operation. During the forward phase, any request for a data block on the currently loaded tape is executed. If the requested data block is at or beyond the position of the tape head (e.g., to the right of the tape head) when the request is received, it is placed into the forward locate phase of the service list and is read when the tape head reaches the data block. If the new request is for a data block in an area of the tape the forward-moving tape head has already passed (e.g., to the left of the tape head), it is placed into the reverse phase of the service list and is read during the reverse phase. However, the tape head will not switch back to the forward phase to read a newly requested data block that it has already passed during the reverse locate phase. Such a request is deferred to a pending list.
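The insertion rule can be sketched as follows. Representing the service list as separate forward and reverse sub-lists, and the names dynamic_insert, head_pos, and phase, are assumptions made for illustration rather than the patent's actual data structures.

    def dynamic_insert(forward, reverse, head_pos, phase, new_block):
        # forward: positions still ahead of the head (read left-to-right)
        # reverse: positions behind the head (read right-to-left on the way back)
        if phase == 'forward':
            if new_block >= head_pos:
                forward.append(new_block)        # read later in this forward sweep
                forward.sort()
            else:
                reverse.append(new_block)        # read during the reverse phase
                reverse.sort(reverse=True)
            return 'scheduled'
        else:  # reverse phase
            if new_block <= head_pos:
                reverse.append(new_block)
                reverse.sort(reverse=True)
                return 'scheduled'
            # The head never switches back to the forward phase, so a block it
            # has already passed in the reverse sweep is deferred to the pending list.
            return 'deferred'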
All tape systems expend a significant amount of time ejecting the old tape, putting it away, retrieving a new tape, and loading the new tape into the drive. For example, in a system using an Exabyte EXB-8505 XL drive and an Exabyte EXB-210 tape library, this process can take 81 seconds.
Thus, there exists a need for a system that decreases access latency by minimizing the number of tape switches, by globally monitoring all requests as they arrive and adjusting the schedule as needed, and by optimizing the order in which tapes are selected for reading.
SUMMARY OF INVENTION
The present invention is a dynamic process, taking a global view of scheduling and tape selection, for improving the performance of a storage system and, in particular, a tape-based storage system. The speed of operation is increased by analyzing all of the requested data blocks located on the multiple tapes within the storage system and, taking advantage of the existing replication of some of the requested data blocks on multiple tapes, forming a schedule that reads the requested data blocks efficiently from the multiple tapes.
In a preferred embodiment, the process first considers the set of requests for non-replicated data blocks and, in particular, the requested non-replicated data block that is farthest from the beginning of each tape. The collection of these data blocks across all tapes defines a “primary envelope”, a set of data-block locations that must be traversed in any case. This envelope may happen to contain some requested data blocks that are also replicated at one or more other locations on the tapes; retrieving these data blocks requires no additional portion of the tape to be traversed, since they lie within the primary envelope. Then, to satisfy requests for data blocks that are not inside the primary envelope, the primary envelope is combined with an “upper envelope” that encompasses those requests. The combined areas of the primary envelope and the upper envelope form an “extended envelope”.
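The envelope construction can be sketched roughly as follows. The data model (a dict mapping each requested block to its (tape_id, position) copies) and the greedy rule used to choose which tape to extend for out-of-envelope blocks are illustrative assumptions; the passage above does not fix those details.

    def build_envelopes(requests, replicas):
        # replicas: block id -> list of (tape_id, position) copies.
        # A block is non-replicated if it has exactly one copy.
        primary = {}                              # tape_id -> farthest position to sweep
        for block in requests:
            copies = replicas[block]
            if len(copies) == 1:
                tape, pos = copies[0]
                primary[tape] = max(primary.get(tape, 0), pos)

        def covered(block, envelope):
            # A request is satisfied for free if any copy lies inside the envelope.
            return any(pos <= envelope.get(tape, -1) for tape, pos in replicas[block])

        extended = dict(primary)                  # primary plus the upper envelope
        for block in requests:
            if not covered(block, extended):
                # Extend whichever tape requires the least extra travel (greedy choice).
                tape, pos = min(replicas[block],
                                key=lambda tp: tp[1] - extended.get(tp[0], 0))
                extended[tape] = max(extended.get(tape, 0), pos)
        return primary, extended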
The present invention also proposes three new tape selection methods. Further, the inventive system may utilize improved processes for data placement and data replication. The data placement and replication method capitalizes on skewed data access patterns: as data access skew increases (i.e., as the distinction between frequently accessed and less frequently accessed data grows), the performance improvement delivered by the present invention increases. By replicating “hot” (frequently requested) items onto multiple tapes, the number of tape switches may be reduced significantly, which results in increased performance.
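A toy sketch of skew-aware placement follows. The 10% hot-fraction threshold and the replicate-to-every-tape policy are illustrative assumptions, not the patented placement and replication method.

    def replicate_hot_blocks(access_counts, num_tapes, hot_fraction=0.1):
        # access_counts: block id -> observed access frequency.
        # Returns block id -> list of tape ids holding a copy.
        ranked = sorted(access_counts, key=access_counts.get, reverse=True)
        hot_cutoff = max(1, int(len(ranked) * hot_fraction))
        placement = {}
        for i, block in enumerate(ranked):
            if i < hot_cutoff:
                # Hot blocks are replicated onto every tape, so a request for them
                # can often be satisfied from whichever tape is already mounted.
                placement[block] = list(range(num_tapes))
            else:
                # Cold blocks get a single copy, spread round-robin across tapes.
                placement[block] = [i % num_tapes]
        return placement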


