Method and system for improved enumeration of tries

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Method and system for improved enumeration of tries Method and system for improved enumeration of tries

: 1998-11-23
: 2001-10-16
: Black, Thomas (Department: 2171)
: Data processing: database and file management or data structures
: Database design
: Data structure types

: C707S793000, C704S010000
: Reexamination Certificate
: active
: 06304878
: ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to computer systems, and more particularly to an improved method and system for arranging lexical data for searching.
BACKGROUND OF THE INVENTION
A trie is a data structure that is useful for compressing lexical data such as dictionary words. Tries are composed of states, with a top-level state representing, for example, each of the first letters (e.g., a-z) of all valid words in a given dictionary. Each state is comprised of nodes, wherein each node represents a valid letter in that state, along with some information about that letter, such as a pointer to a lower state (if any). Each state represents a transition from one character in a word to the next. For example, the letter “q ” in one state usually transitions to the letter “u” in a next lower state.
To use the trie, such as to find if a user-input word is a valid word in the dictionary, a search through the states is performed. For example, to find the word “the,” the top level state in the trie is searched until the “t” node is found, and then a next lower level state pointed to by the “t” node is searched to determine if there is an “h” node therein. If not, the word “the” would not be a valid word in that dictionary. However, if there is an “h” node in the state pointed to by the “t” node, the “h” node is examined to find a next state, if any. The state pointed to by the “h” node is then searched to find out whether there is an “e” node therein. If there is an “e” node, to be a valid word, the “e” node needs to be followed by some indication (e.g., a flag) indicating that a valid word exists at this time, regardless of whether the “e” node points to a further state. In a trie-structured dictionary that properly represents the English language, “the” would be a valid word, and thus the top-level state would have a “t” node, the next state pointed to by the “t” node would have an “h” node therein, and the state pointed to by that “h” node would have an “e” node therein with a valid flag set. If characters such as “thj” were searched, however, the “t” node would transition to the next state which would have an “h” node therein, but the next state pointed to by “h” node would not include a “j” node, and thus this word would not be a valid word.
In Western scripts such as English, the top level state is generally from fifty to eighty nodes in length, while states other than the top level state are usually two to ten nodes in length. To search the top-level state, the nodes initially have been arranged alphabetically, and linearly searched from left to right. To speed the search, it is known to reorder the nodes in a states based on lexical frequency, i.e., in the top-level state, the “s” node comes before a “k” node since more words begin with the “s” character than the “k” character. This speeds the search because on the average, less nodes need to be visited before a match is found.
However, in Eastern scripts such as Chinese or Japanese, the top level state can be over ten-thousand nodes in length. Regardless of how ordered, the average linear search through so many nodes takes too long to be used in ordinary applications on ordinary computers. Nevertheless, many existing algorithms that operate on tries expect the states to be linear. If the nodes in a state are arranged in some other manner, the existing linear-based algorithms fail. In short, there has heretofore not been an adequate way in which to enumerate a trie for faster searching while preserving the characteristics of linear states for existing linear search algorithms.
SUMMARY OF THE INVENTION
Briefly, the present invention provides an improved trie enumeration method, system and data structure, along with a method of searching same that operates with binary-like search speeds on states of nodes arranged linearly. The enumeration method for a given state selects a node within that state as a selected node, and converts the selected node to a skip node by moving it forward in the state relative to its original position therein. A pointer is set in the skip node to point to a further node in the state (i.e., a jump node), and a skip flag is set to identify the selected node as a skip node. Ordinarily, the initial skip node is selected from the middle of an alphabetically ordered state, placed at the beginning of the state, and provided with a pointer to its previous position, i.e., the jump node. The node directly before the jump node is set to be a soft terminal node.
As a result of the alphabetic ordering, the segment of nodes before the jump node are alphabetically before the skip node, while the segment of nodes after the jump node are alphabetically after the skip node. The segments of the state may be recursively split into further segments via further skip nodes, jump nodes and soft terminal nodes in each segment. A threshold number of nodes per segment may be used to determine whether to further split a segment. Once the segments are split as desired, (e.g., such that no segment has a number of nodes that exceeds the threshold), the nodes within the segment may be sorted, such as by lexical frequency.
To search the skip encoded state, the search starts at the beginning of the state. If a skip node is encountered that does not equal the search character, the search skips to the jump node if the search character is greater than the skip node, or to the next node (linearly to the right) if the search character is less than the skip node. The search of the nodes continues, essentially as a binary search when skip nodes are encountered, and a linear search otherwise, until a node is found that matches the search character or a soft or regular terminal node is reached.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:

REFERENCES:
patent: 4730348 (1988-03-01), MacCrisken
patent: 4814746 (1989-03-01), Miller et al.
patent: 4853696 (1989-08-01), Mukherjee
patent: 4959785 (1990-09-01), Yamamoto et al.
patent: 5283894 (1994-02-01), Deran
patent: 5475588 (1995-12-01), Schabes et al.
patent: 5488717 (1996-01-01), Gibson et al.
patent: 5488719 (1996-01-01), Kaplan et al.
patent: 5532694 (1996-07-01), Mayers et al.
patent: 5592667 (1997-01-01), Bugajski
patent: 5625554 (1997-04-01), Cutting et al.
patent: 5651099 (1997-07-01), Konsella
patent: 5721899 (1998-02-01), Namba
patent: 5768423 (1998-06-01), Aref et al.
patent: 5778371 (1998-07-01), Fujihara
patent: 5781772 (1998-07-01), Wilkinson, III et al.
patent: 5799299 (1998-08-01), Fujiwara
patent: 5832428 (1998-11-01), Chow et al.
patent: 5951623 (1999-09-01), Reynar et al.
patent: 5966709 (1999-10-01), Zhang et al.
patent: 6009392 (1999-12-01), Kanevsky et al.
patent: 6131102 (2000-10-01), Potter
patent: 6182039 (2001-01-01), Rigazio et al.
patent: 6236959 (2001-05-01), Weise
patent: 41-0301596 A (1998-11-01), None
Knuth, Donald Ervin, “Sorting and Searching,”The Art of Computer Programming 2ndEdition, vol. 3, Addison-Wesley, pp. 492-496, 500-502, 507-509, 512, 576, 722 (1998).

Affiliated with

Bennett John R.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Hullender Gregory N.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Karlov Donald D.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Black Thomas

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Michalik & Wylie PLLC

Law Firm

[ 0.00 ] – not rated yet Voters 0 Comments 0

Microsoft Corporation

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Wang Mary

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and system for improved enumeration of tries does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for improved enumeration of tries, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for improved enumeration of tries will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-2588219

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure