System for providing cross-lingual information retrieval

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06381598

ABSTRACT:

FIELD OF THE INVENTION
The present invention is related to the searching of information repositories such as databases, and in particular to a facility for generating cross-lingual queries.
BACKGROUND OF THE INVENTION
Every day, more information becomes available electronically over networks. This growth is far from linear; it is driven by numerous factors, such as increasing access to more information media, the growing power of computers and networks, and the ever more data-intensive applications in use.
This gold mine of data, however, suffers from a lack of structure and consistency: the Web is unstructured and uncontrolled by nature, whereas structured databases use a widening variety of formats, either standardized or proprietary.
When accessing heterogeneous legacy databases on intranets or querying multiple information sources on the Internet, the end user wants only a simple and straightforward point of access.
With classical tools, finding the right information to suit each user's needs has become the problem for anything but the easiest of searches. The user must master different protocols, different database access methods, and different document formats, and then use the information from one search to manually drive another. Thus, there is a need for information retrieval systems and approaches that interface easily with multiple information sources.
An exemplary information retrieval architecture is described in the article entitled “System Components For Embedded Information Retrieval From Multiple Disparate Information Sources”, Ramana B. Rao, Daniel M. Russell, and Jock D. Mackinlay, Proceedings of 1993 ACM Symposium on User Interface Software and Technology, Atlanta, GA, Nov. 1993, ACM SIGGRAPH and SIGCHI. The architecture incorporates an intermediary server which mediates access requests between an information access client (i.e., the user) and various information sources. Thus, the user needs to interface only with the information access client in order to retrieve information from multiple information sources.
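The mediating role described above can be illustrated with a short sketch. It is not taken from the cited article: the class name `IntermediaryServer`, the helper `in_memory_source`, and the modeling of each information source as a simple function are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a source is modeled as a function from a query to matching titles.
Source = Callable[[str], List[str]]


def in_memory_source(titles: List[str]) -> Source:
    """Build a toy source that matches titles containing the query text."""
    def lookup(query: str) -> List[str]:
        return [t for t in titles if query.lower() in t.lower()]
    return lookup


class IntermediaryServer:
    """Hypothetical mediator that fans one query out to every registered source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def query(self, text: str) -> Dict[str, List[str]]:
        # The client makes a single call; source-specific access stays behind it.
        return {name: source(text) for name, source in self._sources.items()}


server = IntermediaryServer()
server.register("library", in_memory_source(["Trees of Europe", "River Birds"]))
server.register("web", in_memory_source(["Planting trees at home"]))
results = server.query("trees")
```

Here the client issues one query and receives results grouped by source, without knowing how each source is accessed.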
Another characteristic of information on the Web is that it can be in any language. Generally, a query only finds items that are in the same language as the query. When information in a different language is found, it is typically because that information contains a “word” that matches a search term. For example, a search for information on a famous person or event may return information or documents in multiple languages.
However, it would be desirable to obtain documents in different languages deliberately. Take, for example, a topic such as “trees”. It would be desirable to translate the search term “trees” into the various languages in which relevant documents may occur. A search may then retrieve information in those languages.
A dictionary-based method for cross-lingual information retrieval is described by Lisa Ballesteros and Bruce Croft, “Dictionary Methods for Cross-Lingual Information Retrieval”, Lecture Notes in Computer Science 1134, ISSN 0302-9743 (1996). The paper describes experiments which analyze the factors that affect dictionary-based methods for cross-lingual retrieval and presents methods that dramatically reduce the errors such an approach usually makes. The paper defines cross-lingual information retrieval as the ability to query in one language but perform retrieval across languages.
SUMMARY OF THE INVENTION
The invention relates to the searching of network-accessible distributed databases, such as those found on the Internet. This invention enables a user to generate a query using search terms and expressions in their native language and to specify that the search results may include documents in other languages. With the query, the user indicates the target language in which results will be accepted. The system then processes the query using computational linguistic techniques and verifies the accuracy of the results received with respect to their language and the linguistic structure of the initial search terms. In a multi-word expression, all combinations are verified automatically.
The method of the invention comprises the following steps:
1. split each multi-word search expression among the search terms into elementary words and suppress stopwords ("and", "the", etc.);
For each language in which documents will be retrieved:
2. determine for each resulting elementary word the stemmed translations:
2a. translate the elementary word into the target language; and
2b. stem the translated word;
3. search for documents containing one of the resulting combinations of stemmed translations; and
4. verify for each found document that the stemmed translations appear in the correct linguistic structure, so that inappropriate results can be eliminated.
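
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the toy dictionary, the stopword list, the suffix-stripping stemmer, and the proximity test used in `verify` are stand-ins for real linguistic resources, and every function name is hypothetical.

```python
import itertools
import re

# Assumed toy resources; a real system would use full bilingual dictionaries
# and a proper stemmer for each target language.
STOPWORDS = {"and", "the", "of", "a", "in"}
DICTIONARY = {"fr": {"red": ["rouge"], "wine": ["vin", "vins"]}}
SUFFIXES = {"fr": ["s"]}


def split_expression(expression):
    """Step 1: split into elementary words and suppress stopwords."""
    words = re.findall(r"\w+", expression.lower())
    return [w for w in words if w not in STOPWORDS]


def stem(word, lang):
    """Step 2b: crude suffix stripping, standing in for a real stemmer."""
    for suffix in SUFFIXES.get(lang, []):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stemmed_translations(word, lang):
    """Step 2: translate the word (2a), then stem each translation (2b)."""
    return sorted({stem(t, lang) for t in DICTIONARY[lang].get(word, [])})


def candidate_combinations(expression, lang):
    """All combinations of one stemmed translation per elementary word."""
    words = split_expression(expression)
    if not words:
        return []
    per_word = [stemmed_translations(w, lang) for w in words]
    return list(itertools.product(*per_word))


def search(documents, combinations, lang):
    """Step 3: keep documents containing every stem of some combination."""
    hits = []
    for doc in documents:
        stems = {stem(w, lang) for w in re.findall(r"\w+", doc.lower())}
        if any(set(combo) <= stems for combo in combinations):
            hits.append(doc)
    return hits


def verify(document, combinations, lang, window=3):
    """Step 4, simplified: the stems of a combination must occur close together."""
    stems = [stem(w, lang) for w in re.findall(r"\w+", document.lower())]
    for combo in combinations:
        positions = [[i for i, s in enumerate(stems) if s == c] for c in combo]
        for picks in itertools.product(*positions):
            if max(picks) - min(picks) < window:
                return True
    return False


documents = [
    "Les vins rouges de Bordeaux",
    "Un tapis rouge devant la grande cave de vin",
    "Une bouteille de biere",
]
combos = candidate_combinations("the red wine", "fr")
hits = search(documents, combos, "fr")
verified = [d for d in hits if verify(d, combos, "fr")]
```

In this toy run, the first two documents both contain the stems "rouge" and "vin" and are returned by the search step, but only the first survives verification, because in the second the two words are unrelated and far apart. A faithful verification step would check the actual linguistic structure (for example, noun followed by adjective in French) rather than simple proximity.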


REFERENCES:
patent: 5426583 (1995-06-01), Uribe-Echebarria Diaz De Mendibil
patent: 5450598 (1995-09-01), Kaplan et al.
patent: 5564058 (1996-10-01), Kaplan et al.
patent: 5581780 (1996-12-01), Kaplan et al.
patent: 5594641 (1997-01-01), Kaplan et al.
patent: 5613145 (1997-03-01), Kaplan et al.
patent: 5642522 (1997-06-01), Zaenen et al.
patent: 5805832 (1998-09-01), Brown et al.
patent: 5953726 (1999-09-01), Carter et al.
patent: 6092036 (2000-07-01), Hamann
patent: 0 838 765 (1998-04-01), None
patent: WO 98/48359 (1998-10-01), None
patent: WO 98/48361 (1998-10-01), None
Andreoli, J.M. et al., The Constraint-Based Knowledge Broker Model: Semantics, Implementation and Analysis, J. Symbolic Computation, (1996) 21, pp. 635-667.
Ballesteros, L. et al., Dictionary Methods for Cross-Lingual Information Retrieval, Lecture Notes in Computer Science, 1134, ISSN 0302-9743, pp. 791-801, 1996.
Rao, R. et al., System Components for Embedded Information Retrieval from Multiple Disparate Information Sources, Proceedings of 1993 ACM Symposium on User Interface Software and Technology, Atlanta, GA, Nov. 1993, ACM SIGGRAPH and SIGCHI.
European Search Report and Annex, Application No. EP 99 31 0220.
Hull, D. A., et al.: “Querying Across Languages: A Dictionary-Based Approach to Multilingual Information Retrieval”, Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, US, New York, NY: ACM, vol. CONF. 19, 1996, pp. 49-57, XP000788309 ISBN: 0-89791-792-8 * p. 50, left-hand column, line 26-p. 52, left-hand column, line 6*.
Ballesteros, L., et al. “Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval”, Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, US, New York, NY: ACM, 1997, pp. 84-91, XP000782005 ISBN: 0-89791-836-3 * p. 84, right-hand column, line 22-p. 85, right-hand column, line 48*.
