Data processing: database and file management or data structures – Database design – Data structure types
Patent
1995-09-29
1998-07-07
Black, Thomas G.
707/2, 707/3, G06F 17/30
active
057783616
ABSTRACT:
A method and system for fast indexing and searching of text in compound-word languages such as Japanese, Chinese, Hebrew, and Arabic. Computer codings of such compound-word languages often contain different character types; e.g., the Shift-JIS coding of Japanese represents kanji, katakana, hiragana, and roman characters with different codings in the same character set. These different character types are used to form index terms and search terms. In a preferred embodiment, a content-index search system is invoked in response to a query on a collection of objects. The collection of objects is indexed by the content-index and may, for example, be a corpus of documents indexed by the terms contained in the documents. The content-index search system uses the content-index to generate and store an initial search result in response to the query; a direct search system is used in certain situations. The content-index contains, for each of a plurality of terms, a reference to each object containing that term. The content-index is created by first creating a preliminary index term for each plurality of characters delimited by a word separator or a character type transition in a string of characters to be indexed. For each preliminary index term of a first type, e.g. katakana or roman, the preliminary index term is used as an index term. For each preliminary index term of a second type, e.g. kanji, the preliminary index term is step-indexed to create a plurality of index terms of a length less than a predetermined step size. The index terms are then added to the content-index in association with the object being indexed. A string of text entered into a search engine as a search term is processed into preliminary search terms and search terms in a similar manner.
REFERENCES:
patent: 4064983 (1977-12-01), Inose et al.
patent: 4602878 (1986-07-01), Merner et al.
patent: 4679951 (1987-07-01), King et al.
patent: 5109352 (1992-04-01), O'Dell
patent: 5148541 (1992-09-01), Lee et al.
patent: 5168533 (1992-12-01), Kato et al.
patent: 5187480 (1993-02-01), Thomas et al.
patent: 5276616 (1994-01-01), Kuga et al.
patent: 5329506 (1994-07-01), Kitta et al.
patent: 5331557 (1994-07-01), Liu
patent: 5337233 (1994-08-01), Hofert et al.
patent: 5384700 (1995-01-01), Lim et al.
patent: 5416898 (1995-05-01), Opstad et al.
patent: 5537431 (1996-07-01), Chen et al.
patent: 5542090 (1996-07-01), Henderson et al.
patent: 5544352 (1996-08-01), Egger
patent: 5586198 (1996-12-01), Lakritz
patent: 5590317 (1996-12-01), Iguchi et al.
patent: 5642520 (1997-06-01), Takeshita et al.
Makino, Beta: An Automatic Kana-Kanji Translation System, IEEE, pp. 46-52, Jan. 1985.
Morita, Japanese Text Input System, IEEE, pp. 29-35, May 1985.
Chu, Chinese/Kanji Text and Data Processing, IEEE, pp. 11-12, Jan. 1985.
Huang, The Input and Output of Chinese and Japanese Characters, IEEE, pp. 18-24, Jan. 1985.
Becker, Typing Chinese, Japanese and Korean, IEEE, pp. 27-34, Jan. 1985.
Matsuda, Processing Information in Japanese, IEEE, pp. 37-45, Jan. 1985.
Jones William
Nanjo Tsutomu
Black Thomas G.
Coby Frantz
Microsoft Corporation