Related sentence retrieval system having a plurality of...

Reexamination Certificate

: 1999-12-07
: 2001-11-20
: Thomas, Joseph (Department: 2644)
: Data processing: speech signal processing, linguistics, language
: Linguistics
: Multilingual or national language support

: C704S007000, C707S793000, C707S793000, C707S793000
: Reexamination Certificate
: active
: 06321191
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system for retrieving related sentences between sentences written in one language and sentences written in another language, and more particularly a system using sentences written in still another language as intermediate language sentences.
2. Description of the Related Art
With the improvement of the performance of computers, development of electronic dictionaries and progress of technology in natural language processing among other things, many machine translation techniques have been proposed so far.
However, it is still difficult to affirm that a machine translation system with a translation capability of sufficient accuracy has been implemented.
The related art includes a technique in which a large number of sentence pairs, each consisting of a sentence in an original language (first language) and its translation in a translated language (second language), are prepared as pair data; first language sentences similar to a first language input sentence are retrieved from the pair data; the second language sentences paired with the retrieved first language sentences are then output; and these second language sentences are presented to the user for reference, thereby improving the quality of translation of the first language input sentence.
As methods for obtaining sentences similar to the first language input sentence from the set of first language sentences in the pair data, there have been proposed a method to determine a sentence of high similarity based on the number of words commonly included in the input sentence and sentences to be retrieved; and, as disclosed by Japanese Published Unexamined Patent Application No. 9-50435 (1997), a method to determine a first language sentence having a vector close to the vector corresponding to the first language input sentence as the sentence of high similarity based on the vector space model, one of the similar document retrieving methods.
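The two similarity measures described above can be sketched as follows. This is a minimal illustration only: the whitespace tokenizer, the raw term-frequency weighting, and all function names are assumptions made for the example and are not taken from the cited application.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def common_word_count(a, b):
    """Similarity as the number of distinct words the two sentences share."""
    return len(set(tokenize(a)) & set(tokenize(b)))


def cosine_similarity(a, b):
    """Vector space model: cosine of the angle between term-frequency vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, score=cosine_similarity):
    """Return the candidate sentence that scores highest against the query."""
    return max(candidates, key=lambda s: score(query, s))
```

Both measures depend entirely on shared surface words, so a candidate that expresses the same meaning with different words scores zero under either one.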
To obtain the same effect, a method to improve translation quality is also under study, by which each word in a query written in a first language is mechanically converted into a word or phrase of a second language by using a dictionary, corresponding sentences are then retrieved from a set of second language sentences by using the set of converted second language words and/or phrases, and the retrieved second language sentences are presented to the user for reference.
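A rough sketch of this dictionary-based approach is given below. The toy Spanish-to-English dictionary, the sample sentences, and the function names are hypothetical, and ranking by simple word overlap is an assumption; the source does not specify how the converted words are matched against the second language sentences.

```python
# Hypothetical toy dictionary: each first language word maps to several
# second language candidates, since the right choice depends on context.
DICTIONARY = {
    "papel": ["paper", "role"],
    "atascado": ["jammed", "stuck"],
}

SENTENCES = [
    "The paper is jammed",
    "She played the leading role",
    "Turn off the power",
]


def convert_query(query, dictionary):
    """Mechanically replace each query word with all of its dictionary entries."""
    converted = set()
    for word in query.lower().split():
        for phrase in dictionary.get(word, []):
            converted.update(phrase.lower().split())
    return converted


def retrieve(converted_words, second_language_sentences, top_n=1):
    """Rank second language sentences by overlap with the converted words."""
    def overlap(sentence):
        return len(converted_words & set(sentence.lower().split()))

    ranked = sorted(second_language_sentences, key=overlap, reverse=True)
    return [s for s in ranked[:top_n] if overlap(s) > 0]


if __name__ == "__main__":
    words = convert_query("papel atascado", DICTIONARY)
    print(retrieve(words, SENTENCES, top_n=3))
```

With `top_n=3` the unrelated sentence about a "role" is also returned, because the dictionary cannot know which sense of the word was intended.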
However, the methods according to the related art, by which similar first language sentences are obtained merely on the basis of the words contained in the first language input sentence, have the disadvantage that a second language sentence adequate as a translation of the first language input sentence, even if present in the pair data, cannot be retrieved if the corresponding first language sentence in the pair is worded differently from the input sentence. Thus the methods are effective only if the pair data contains a sentence composed of substantially the same words as the first language input sentence.
This disadvantage becomes more noticeable as the number of words in the input sentence decreases. When a document including a large number of sentences is input, the number of non-zero elements in the corresponding document vector increases significantly (the effective dimensionality of the vector rises), and the reliability of the retrieval result improves accordingly. In practice, however, pair data mostly consists of short sentences, so it is difficult to obtain adequate related sentences (second language sentences) for reference.
Furthermore, in the related art that obtains second language sentences for reference by replacing the individual words of the first language input sentence with second language words and/or phrases by using a dictionary, the second language words and phrases available for expressing a given first language word are extremely diverse. Moreover, the adequate choice among those many alternatives depends on the meaning of the first language input sentence, making it virtually impossible to determine the choice in advance. Therefore, it is difficult to express the correspondence between first language words and second language words comprehensively in dictionary form in advance, and it is difficult to obtain an adequate related sentence for reference.
In view of this problem, the present applicant has already proposed a cross-lingual retrieval system capable of retrieving, on the basis of a query in a first language, a second language sentence(s) which is a more adequate related sentence(s) by using pair data (Japanese Unexamined Patent Application No. Hei 10-202788 [1998]).
This cross-lingual retrieval system stores, in a paired sentence storing unit, plural pairs each consisting of a sentence in a first language and a corresponding translated sentence in a second language. When a query written in the first language is received from a query receiving unit, a first retrieval unit performs retrieval processing, according to the query, on the set of first language sentences stored in the paired sentence storing unit. A second retrieval unit then performs retrieval processing on the set of second language translated sentences stored in the paired sentence storing unit to find sentences similar to the translated sentences that correspond to the first language sentences retrieved by the first retrieval unit.
In other words, retrieval based on the first language sentence is performed on the pair data and, using the second language sentences corresponding to the result of this retrieval, the retrieval of similar second language sentences is performed on the pair data. The successive double retrieval in the first and second languages using the pair data as a bridge makes it possible to retrieve second language sentences which are adequate translations of the query written in the first language without being greatly affected by differences in expression or in the number of words or phrases contained, even if the input sentence in the first language is relatively short.
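The successive double retrieval can be sketched as below. This is an illustrative approximation, not the implementation of the cited application: the cosine scoring, the cut-off parameters `n_first` and `n_second`, the sample Spanish-English pairs, and all names are assumptions for the example.

```python
import math
from collections import Counter

# Hypothetical pair data: (first language sentence, second language translation).
PAIRS = [
    ("el papel está atascado", "the paper is jammed"),
    ("hay un atasco en la bandeja", "paper is jammed in the tray"),
    ("apague la impresora", "turn off the printer"),
]


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def top_matches(query, sentences, n):
    """Return up to n sentences with a non-zero score, best first."""
    scored = [(cosine(query, s), s) for s in sentences]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [s for _, s in scored[:n]]


def double_retrieval(query, pairs, n_first=2, n_second=2):
    first_side = [f for f, _ in pairs]
    second_side = [s for _, s in pairs]
    # First retrieval: first language sentences similar to the query.
    hits = top_matches(query, first_side, n_first)
    # Bridge: the translations paired with the retrieved sentences.
    bridges = [s for f, s in pairs if f in hits]
    # Second retrieval: second language sentences similar to any bridge.
    scores = Counter()
    for bridge in bridges:
        for sentence in second_side:
            scores[sentence] = max(scores[sentence], cosine(bridge, sentence))
    ranked = sorted(second_side, key=lambda s: scores[s], reverse=True)
    return [s for s in ranked[:n_second] if scores[s] > 0]


if __name__ == "__main__":
    print(double_retrieval("papel atascado", PAIRS, n_first=1, n_second=2))
```

In this toy data the second pair shares no first language words with the query, yet its translation is still reached through the second retrieval, which is the effect the bridge is meant to provide.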
SUMMARY OF THE INVENTION
Although the cross-lingual retrieval system referred to above can achieve its intended effect, implementation of a similar effect is called for between sentences in more diverse languages in today's society where accelerated internationalization is resulting in everyday use of a wide variety of languages.
The present invention applies the aforementioned cross-lingual retrieval system to implement a related sentence retrieval system capable of providing users with adequate related sentences for reference between sentences of many different languages.
A related sentence retrieval system pertaining to the present invention is provided with a former stage cross-lingual retrieval system for retrieving, on the basis of a query written in a first language, related sentences written in a second language, and a latter stage cross-lingual retrieval system for retrieving, on the basis of the related sentences in the second language outputted from the former stage cross-lingual retrieval system, related sentences written in a third language. The system thereby retrieves, via the second language, sentences written in the third language that are related to the query written in the first language.
Thus, the former stage cross-lingual retrieval system for retrieving on the basis of a first language sentence a related second language sentence and the latter stage cross-lingual retrieval system for retrieving from the second language a related third language sentence are connected in series.
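The series connection of the two stages can be sketched as a simple function composition. Each stage below is a simplified stand-in that does a single word-overlap lookup over hypothetical pair data; it is not the double retrieval of the actual cross-lingual retrieval system, and all names and sample sentences are assumptions for illustration.

```python
def overlap(a, b):
    """Count the distinct words two sentences share (naive whitespace tokens)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def make_stage(pairs, n=1):
    """Build a stand-in stage from (source sentence, target sentence) pairs.

    The returned function maps a list of source language sentences to the
    target sentences paired with their best-matching source sentences.
    """
    def stage(queries):
        results = []
        for query in queries:
            ranked = sorted(pairs, key=lambda p: overlap(query, p[0]), reverse=True)
            for source, target in ranked[:n]:
                if overlap(query, source) > 0 and target not in results:
                    results.append(target)
        return results
    return stage


def retrieve_via_second_language(query, former_stage, latter_stage):
    """Feed the former stage's second language output into the latter stage."""
    second_language_sentences = former_stage([query])
    return latter_stage(second_language_sentences)


# Hypothetical pair data: first to second language, then second to third.
FORMER = make_stage([
    ("el papel está atascado", "the paper is jammed"),
    ("apague la impresora", "turn off the printer"),
])
LATTER = make_stage([
    ("paper is jammed in the tray", "le papier est coincé dans le bac"),
    ("switch off the printer", "éteignez l'imprimante"),
])

if __name__ == "__main__":
    print(retrieve_via_second_language("papel atascado", FORMER, LATTER))
```

Because the stages only exchange sentences in the second language, no pair data linking the first language directly to the third is needed in this arrangement.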
More specifically, in the former stage cross-lingual retrieval

Affiliated with

Kurahashi Masayuki

Inventor

Also associated with

Fuji Xerox Co., Ltd.

Corporate Assignee

Oliff & Berridge PLC

Law Firm

Thomas Joseph

Examiner
