Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-04-02
2004-03-09
Vu, Kim (Department: 2172)
C707S793000
Reexamination Certificate
active
06704721
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to automating data analysis tasks and, more particularly, to analysis tasks that require navigation between dynamic data that has dissimilar structures.
BACKGROUND OF THE INVENTION
The present invention provides systems and methods for automatically navigating between dynamic data that has dissimilar structures. The term “dynamic” as used herein refers to data that changes frequently, a characteristic that affects the efficiency of navigation techniques. The term “dissimilar structure” as used herein refers to a data structure containing information that is not present in another data structure; the first data structure is then said to be dissimilar with respect to the second. A problem in the management of distributed systems is described below to illustrate the prior art background. However, it is to be appreciated that the invention has broader applications.
Rapid improvements in both hardware and software have dramatically changed the cost structure of information systems. Today, hardware and software account for a small fraction of these costs, typically less than 20 percent (and declining). The remaining costs relate to the management of information systems, such as software distribution, providing help desk support, and managing quality of service (QoS).
Decision support is critical to the management of information systems. For example, in software distribution, we need to know: (i) which machines require software upgrades; (ii) what are the constraints on scheduling upgrades; and (iii) the progress of upgrades once installation has begun. In QoS management, decision support detects QoS degradations, identifies resource bottlenecks, and plans hardware and software acquisitions to meet future QoS requirements.
Accomplishing these tasks requires a variety of information, such as, for example, QoS measurements, resource measurements (e.g., network utilizations), inventory information, and topology specifications. Collectively, we refer to these information sources as data. Much of this data is dynamic. Indeed, measurement data changes with each collection interval. Further, in large networks, topology and inventory information change frequently due to device failures and changes made by network administrators.
We use the term “dataset” to describe a collection of data within the same structure. For example, a dataset might be organized as a relational table that is structured so that each row has the same columns. Here the data is structured into rows such that each row has a value for every column. A dataset contains multiple “data elements” (hereinafter, just elements), which are instances of data structured in the manner prescribed by the dataset (e.g., a row in a relational table). A group of elements within the dataset is called an “element collection.” An element collection is specified by a “collection descriptor” (e.g., SQL where-clause for a relational table or line numbers for a sequential file). A collection descriptor consists of zero or more “constraints” that describe an element collection. A constraint consists of an “attribute” (e.g., a column name in a relational table or a field in a variable-length record), a relational operator (e.g., =, <, >), and a value.
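The terminology above can be expressed as simple data structures. The following Python sketch is illustrative only; the names `Constraint` and `CollectionDescriptor` are assumptions for this example, not names taken from the invention:

```python
from dataclasses import dataclass

# A constraint is (attribute, relational operator, value), as defined above.
@dataclass(frozen=True)
class Constraint:
    attribute: str   # e.g., a column name in a relational table
    op: str          # a relational operator such as "=", "<", ">"
    value: object

# A collection descriptor is zero or more constraints that together
# describe an element collection within a dataset.
@dataclass
class CollectionDescriptor:
    constraints: list

    def to_where_clause(self) -> str:
        """Render the descriptor as an SQL where-clause for a relational table."""
        return " AND ".join(
            f"{c.attribute} {c.op} '{c.value}'" for c in self.constraints
        )

desc = CollectionDescriptor([Constraint("shift", "=", 1),
                             Constraint("subnet", "=", "9.2.15")])
```

For a relational dataset, `desc.to_where_clause()` yields the SQL where-clause `shift = '1' AND subnet = '9.2.15'`, which selects the corresponding element collection.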
Due to the diversity of software tools, administrative requirements, and other factors, data is typically grouped into multiple datasets. Thus, decision support often requires navigating from an element collection in one dataset to one or more element collections in other datasets. We refer to these as the “source element collection,” “source dataset,” “target element collections” and “target datasets,” respectively.
With this background, we state one of the problems addressed by the present invention. We are given a source element collection and multiple target datasets. The objective is to find the target element collection that “best matches” the source element collection. By best matches, it is meant that the structure and content of the source element collection is the most similar to that of the target element collection.
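One simple notion of “best matches” is attribute overlap: score each target dataset by how many of the source descriptor's constraint attributes appear in its schema, and pick the maximum. The sketch below assumes this criterion for illustration; the invention's actual matching criterion may weigh structure and content more elaborately:

```python
def best_matching_dataset(source_attrs, target_schemas):
    """Return the name of the target dataset whose columns overlap most
    with the source element collection's constraint attributes.

    source_attrs: set of attribute names from the source descriptor.
    target_schemas: dict mapping dataset name -> set of column names.
    """
    return max(target_schemas,
               key=lambda name: len(source_attrs & target_schemas[name]))

# Attributes constraining the source element collection (from the scenario below).
src = {"shift", "hour", "subnet", "division", "department", "user", "transaction"}
targets = {
    "os_measurements": {"hour", "minute", "shift", "subnet",
                        "division", "department", "efficiency"},
    "inventory": {"host", "vendor", "model"},
}
# os_measurements shares five attributes with the source; inventory shares none.
```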
To illustrate the problem addressed, we describe a scenario in QoS management. Considered is a situation in which end-users experience poor quality of service as quantified by long response times. The objective is to characterize the cause of long response times by: (i) when they occur; (ii) who is affected; (iii) which configuration elements are involved; and (iv) what components of the configuration element account for most of the delays.
The analyst starts with a dataset containing end-to-end response times. The dataset is structured into the following columns: shift, hour, subnet, host, user's division, user's department, user name, transaction issued, and response time. The analyst proceeds as follows:
Step 1. The analyst isolates the performance problem. This may be done in any conventional manner, such as, for example, is described in R. F. Berry and J. L. Hellerstein, “A Flexible and Scalable Approach to Navigating Measurement Data in Performance Management Applications,” Second IEEE Conference on Systems Management, Toronto, Canada, June, 1996. In the example, isolation determines that poor response times are localized to the element collection described by the constraints: shift=1, hour=8, subnet=9.2.15, division=25, department=MVXD, user=ABC, and transaction=_XX. At this point, the analyst has characterized when the problem occurs, who is affected, and which configuration elements are involved.
Step 2. To determine what components of the configuration element account for most of the delays, the analyst must examine one or more other datasets. After some deliberation and investigation by the analyst, the analyst selects a dataset of operating system (OS) measurements that are structured as follows: hour, minute, shift, subnet, division, department, efficiency, waiting time, CPU waits, I/O waits, page waits, and CPU execution times.
Step 3. The analyst selects the subset of the OS data that best corresponds to the response time data. Doing so requires dealing with two issues: (i) the source and target datasets are structured somewhat differently, in that the first has transaction information (which the second does not) and the second reports time in minutes (which the first does not); and (ii) the second dataset does not have records for user ABC, the user for whom the problem was isolated. To resolve the first issue, the analyst decides to use only the information common to both datasets, so transaction information and minutes are ignored when navigating from the response time data to the OS data. The second issue is resolved by assuming that users within the same department are doing similar kinds of work, so the constraint on user is relaxed. Thus, the target element collection is described by the constraints: shift=1, hour=8, subnet=9.2.15, division=25, and department=MVXD.
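The two resolutions in Step 3 can be sketched as a projection of the source descriptor onto the target dataset. The function name `navigate` and the `value_exists` callback are assumptions for this example; a real implementation might check for matching records by issuing a query against the target dataset:

```python
def navigate(source_constraints, target_columns, value_exists):
    """Project a source descriptor onto a target dataset:
      (i)  keep only constraints on attributes common to both datasets
           (e.g., the transaction constraint is dropped);
      (ii) relax (drop) equality constraints whose value never occurs in
           the target, falling back to coarser attributes such as
           department (e.g., user=ABC is relaxed away).
    `value_exists(attr, value)` reports whether the target dataset has
    any record with that value for the attribute.
    """
    kept = []
    for attr, op, value in source_constraints:
        if attr not in target_columns:
            continue                      # attribute absent from target
        if op == "=" and not value_exists(attr, value):
            continue                      # no matching records: relax
        kept.append((attr, op, value))
    return kept

# Source descriptor from Step 1 of the scenario.
source = [("shift", "=", 1), ("hour", "=", 8), ("subnet", "=", "9.2.15"),
          ("division", "=", 25), ("department", "=", "MVXD"),
          ("user", "=", "ABC"), ("transaction", "=", "_XX")]
os_cols = {"hour", "minute", "shift", "subnet", "division", "department"}
target = navigate(source, os_cols, lambda attr, value: True)
```

With these inputs, the user and transaction constraints are dropped and the remaining constraints describe the target element collection in the OS dataset.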
Step 4. The analyst uses the OS data to characterize the host component that contributes the most to response time problems. This characterization reveals that paging delays account for a large fraction of end-to-end response times.
Steps 1 and 4 employ similar problem isolation logic. Indeed, automation exists for these steps. Unfortunately, in the prior art, steps 2 and 3 are performed manually. As such, these steps impede the automation, accuracy and comprehensiveness of problem isolation. This, in turn, significantly increases management costs. The challenges raised by steps 2 and 3 above are modest if there are a small number of measurement sources. Unfortunately, the number of measurement sources is large and growing.
Dissimilarities in the structure of datasets typically arise because measurements are specific to the measured entity. Hence, heterogeneous equipment means heterogeneous measurement sources. Heterogeneity includes the different types of devices (e.g., routers versus file servers), different vendors, and differen
Ly Anh
Ryan & Mason & Lewis, LLP
Vu Kim
Zarick Gail H.