Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-09-03
2001-12-25
Amsbury, Wayne (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06334123
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
This invention relates to relational processors, and more particularly to processors which perform logical operations on bit streams.
BACKGROUND OF THE INVENTION
With the proliferation of computer-based data systems in all facets of business, techniques for efficiently handling the potentially large amounts of digital information are becoming increasingly important in a variety of communications and electronic data storage applications. For example, enhanced methods of converting, storing, and searching large databases to retrieve the information may be critical to using such a system. Typically, large databases are structured to reduce the search time associated with finding records in such databases. To expedite large database searches, keys arranged in ordered indices of B-trees may be provided which point to the physical location of each record. This method is much more efficient that a linear approach of searching the database from the beginning to the end when the desired record may happen to be stored near the end of the database.
Additionally, physical data compression techniques are used to reduce hardware costs, data transfer times, and system storage space. Compaction algorithms are especially attractive where large files such as scanned images are stored. Transmission of such large uncompressed files not only displaces available bandwidth, but also requires significantly more storage space. However, a compression/decompression algorithm which is cumbersome to implement may actually offset any gains obtained by compressing the information in the first place. Similarly, when studying the scanning device itself, large amounts of data and respective transmission speeds become important design problems. For example, a facsimile machine scans a document with electro-optical devices line-by-line to generate the electrical data for transmission. However, the amount of data generated from one page in a document can be very large. A sheet of paper the size of A4 may scan to approximately 2 million bits of data which are required to be transmitted and received. Therefore different methods of transmitting such large files of information have been sought for more efficient and faster transmission of facsimile information. Run-length compression is a popular data compression technique which provides significant data compression for repeating characters or patterns. It uses very simple compression and decompression algorithms. Most run-length compression schemes are usually based on Huffman entropy coding techniques. A Huffman code is a lossless data compression algorithm which uses a small number of bits to encode common characters. Huffman coding approximates the probability for each character as a power of ½ to avoid complications associated with using a nonintegral number of bits to encode characters using their actual probabilities. The Huffman algorithm converts characters into bit strings using a binary tree containing all possible characters. The Huffman code for a character may be obtained by traversing the tree, where if a left branch is chosen the bit is 0; if a right branch is taken the bit is 1. Huffman compression is a statistical data compression technique which gives a reduction in the average code length used to represent the symbols of a alphabet. A Huffman code can be made by (1) ranking all symbols in order of probability of occurrence, (2) successively combining the two symbols of the lowest probability to form a new composite symbol, eventually building a binary tree where each node is the probability of all nodes beneath it, and (3) tracing a path to each leaf, noticing the direction at each node.
It can be shown mathematically that Huffman coding will give an optimum compression factor based on the symbol frequency distribution (entropy). However, Huffman coding does suffer from a key drawback-two passes through the data file are required. The first pass through the data file collects the frequency of occurrence for each run length for both streams of ones or zeros. With the list of the occurrence frequencies, a variable-length code set is developed to “remap” the input file. The second pass applies the remap codes to the data file creating a new compressed file. The two-pass approach requires that a conversion key be stored with the compressed data. The required two passes through the input file represents a serious impediment to high throughput computing.
Furthermore, recursive operations on bit streams (e.g., database threads) are very advantageous in arriving at a final search result. However, recursive operations require that the intermediate results (also called an intermediate vector) of a partial Boolean operation be kept locally (e.g., stored in a memory buffer) for reuse in the generation of another partial or final Boolean operation. (The binary bit stream may be compressed or uncompressed.) The processing of a binary bit stream is serial in nature. Thus a first-in/first-out (FIFO) device is a logical choice for the memory buffer. A FIFO can be loosely described as a data “pipe” that flows in one direction from the input to the output, and can hold a specific amount of information bits.
A requirement of the FIFO for use in the recursion process is that it have two alternating memory (also called “ping-pong”) buffers. Ping-pong buffers alternate respective functions in the processing and retention of intermediate data stream results. For example, if buffer “A” is collecting the current processing results and buffer “B” is feeding its output as input to the Boolean processor from the last iteration, then once processing is complete for the current iteration, the buffers will reverse roles, where buffer “A” is the input to the Boolean processor and “B” is storing the results. It can be seen that the buffers will alternate or ping-pong.
A final requirement for the memory buffer is that it must be large enough to hold the binary streams associated with the threads from a large database. The semiconductor industry has developed numerous FIFO chip solutions. However, classical FIFOs are optimized for speed and not for memory size. This is primarily due to the popular use as elastic buffers for disk and high speed communications systems. The great advantage in using these “off-the-shelf” FIFOs is that all the elements for the FIFO are contained in one integrated circuit. The FIFO integrated circuits are also cascadeable so that larger buffer sizes can be created. Unfortunately, the largest size of a classical FIFO (e.g., 64 KB) is insufficient for use with the disclosed relational engine. The disclosed architecture requires at least 16 MB for the buffer. Therefore a hybrid solution is required.
SUMMARY OF THE INVENTION
The invention disclosed and claimed herein is a method of relational processing where a query of one or more databases of records returns the query result in the form of one or more threads of parallel words. A collection generator generates one or more intermediate vectors of serial bit streams from the one or more threads, and processes the one or more intermediate vectors with a relational processor. The relational processor is operable to perform one or more logical operations thereon in a single pass, and outputs a contiguous serial bit stream.
In another aspect of the disclosed embodiment, a method accessing information in a database of records is disclosed by querying the database of records using one or more key fields, and constructing a binary tree based upon the key fields, each binary tree having associated therewith one or more nodes representative of key search criteria of the key fields. Each node has associated therewith a thread having a list of indexes of the records matching the key search criteria. A relational processor processes the thread in a single pass with a relational engine to output a serial bit stream of a collection whose bit positions represent the records which meet the key search criteria.
In still another aspect of the disclosed embodiment, a relational processin
Proulx Merle Thomas
Ross Jay Bruce
Amsbury Wayne
Howison, Chauza, Thoma, Handley & Arnott, L.L.P.
Whamtech, Inc.
LandOfFree
Index relational processor does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Index relational processor, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Index relational processor will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2590129