Natural language processing apparatus and method for...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

active

06188977

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a natural language processing apparatus and a method for analyzing a natural language sentence using a dictionary and grammar data.
2. Related Background Art
For an analysis of a document written in a natural language, the definitions and properties of the individual words in a sentence are specified by referring to a dictionary. However, in a natural language sentence, various words appear for which a dictionary can provide no specific descriptions. For example, since new models of a product are produced sequentially, the names of all those models cannot be specifically registered in a dictionary.
Further, with reference to onomatopoeic or mimetic words, the “z” in “zzzzz”, which is used in English to represent the breathing sound produced by a sleeper, can be repeated an arbitrary number of times, so that in addition to “zzzzz” we could also have “zzzzzzzzzz”. Since an unlimited number of notational variations can be produced for such an expression, the registration in a dictionary of all the notations for the expression is not feasible.
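As an illustration only, the repetition described above can be captured by a single pattern instead of an enumeration of dictionary entries. The following is a minimal sketch in Python; the name `is_sleeping_sound` and the minimum of two repetitions are assumptions made for the example and are not taken from the invention.

```python
import re

# Hypothetical rule: the letter "z" repeated two or more times.
# The lower bound of two is an assumption made for this sketch.
SLEEPING_SOUND = re.compile(r"z{2,}")

def is_sleeping_sound(token: str) -> bool:
    """Return True if the whole token consists only of repeated "z"."""
    return SLEEPING_SOUND.fullmatch(token) is not None
```

One pattern thus accepts “zzzzz”, “zzzzzzzzzz” and every other length, while rejecting ordinary words that merely begin with “z”.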
Furthermore, in a natural language sentence numerical expressions such as the following may appear: “1093”, “5,000,000”, “7.5”, “½”, “10-20”, “2, 3”, “5−3=2”, “1997. 06. 25”, “Jun. 25, 1997”, “10:31”, “03-3123-4567”, and “2:3”.
Depending on the form used for the expression, these numerals represent the following: “integers”, a “decimal fraction”, a “fraction”, “round numbers”, an “equation”, “dates”, a “time”, a “number”, and a “ratio”.
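The correspondence between form and meaning can be sketched as an ordered table of patterns. The example below is an assumption-laden illustration covering only five of the categories named above; the patterns, the name `classify`, and the reading of the “number” category as a hyphen-delimited telephone-style number are all hypothetical.

```python
import re

# Ordered (category, pattern) pairs; the first pattern that matches the
# whole expression decides the category. All patterns are illustrative.
RULES = [
    ("integer", re.compile(r"[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+")),
    ("decimal fraction", re.compile(r"[0-9]+\.[0-9]+")),
    # A time needs a valid hour and a two-digit minute, so "2:3" falls through.
    ("time", re.compile(r"(?:[01]?[0-9]|2[0-3]):[0-5][0-9]")),
    ("ratio", re.compile(r"[0-9]+:[0-9]+")),
    # Assumed interpretation of "number": a telephone-style number.
    ("number", re.compile(r"[0-9]{2,4}-[0-9]{2,4}-[0-9]{4}")),
]

def classify(expression: str):
    """Return the first category whose pattern matches, or None."""
    for category, pattern in RULES:
        if pattern.fullmatch(expression):
            return category
    return None
```

Because the rules are tried in order, “10:31” is reported as a time before the more general ratio pattern is consulted, whereas “2:3” is reported as a ratio.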
For example, “5,000,000” represents an integer while “Jun. 26, 1997” represents a date, and when analyzing a natural language, numerical expressions such as these must be extracted from sentences and the meanings ascribed to them must be adequately identified.
Assume, for instance, that the expression “6/25/97” appears in a sentence which is to be analyzed for speech synthesis. If this entry were treated merely as numerals and symbols to be read sequentially, the resultant pronunciation would correspond to the string of words “six slash two five slash nine seven”. However, were this numerical expression to be identified as an entry that represents a “date”, it would correctly be read as “June twenty-fifth, ninety-seven”.
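The difference between the two readings can be illustrated with a short sketch. It assumes a slash-delimited entry in month/day/year order with a two-digit year, which is the form implied by the character-by-character reading quoted above; the function names and word tables are hypothetical and are not part of the invention.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
ORD_ONES = ["", "first", "second", "third", "fourth", "fifth", "sixth",
            "seventh", "eighth", "ninth", "tenth", "eleventh", "twelfth",
            "thirteenth", "fourteenth", "fifteenth", "sixteenth",
            "seventeenth", "eighteenth", "nineteenth"]
ORD_TENS = {20: "twentieth", 30: "thirtieth"}

def cardinal(n: int) -> str:
    """Spell out 0..99 as a cardinal number."""
    if n < 20:
        return ONES[n] or "zero"
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def ordinal(n: int) -> str:
    """Spell out 1..31 as an ordinal number."""
    if n < 20:
        return ORD_ONES[n]
    tens, ones = divmod(n, 10)
    return ORD_TENS[n] if ones == 0 else TENS[tens] + "-" + ORD_ONES[ones]

def read_literally(entry: str) -> str:
    """Read digits and slashes one symbol at a time."""
    return " ".join("slash" if c == "/" else DIGITS[int(c)] for c in entry)

def read_as_date(entry: str):
    """Read an assumed month/day/two-digit-year entry as a date, else None."""
    m = re.fullmatch(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{2})", entry)
    if not m:
        return None
    month, day, year = (int(g) for g in m.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return f"{MONTHS[month - 1]} {ordinal(day)}, {cardinal(year)}"
```

Under these assumptions, `read_literally("6/25/97")` yields the symbol-by-symbol reading, and `read_as_date("6/25/97")` yields the reading appropriate to a date.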
Consider as another example the information extraction technique. According to this technique, elements describing who, when, what, where and how are extracted from a sentence and are expressed in the form of a table. The focus of this technique is the provision of a means by which a user can be protected from being inundated by the flood of information produced by recent computer networking developments. If, as part of the pre-processing provided for information extraction, numerals can be correctly identified during the analysis of a natural language sentence, a date, which is important as information that establishes the “when” of an occurrence, can be correctly extracted.
For many of the above words a specific rule is used for the construction of the expressions in which they are employed. Thus, assuming that the product models are NL550, NL560, . . . , it can be ascertained that the models of the products in this series are named using the pattern “NL<number>”.
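The naming pattern “NL<number>” can be written down directly as a rule. The following sketch is illustrative only; the name `find_models` and the choice of a regular expression as the rule notation are assumptions made for the example.

```python
import re

# Hypothetical rule for the series: the letters "NL" followed by digits.
MODEL = re.compile(r"\bNL[0-9]+\b")

def find_models(sentence: str) -> list:
    """Return every substring of the sentence that fits the naming pattern."""
    return [m.group(0) for m in MODEL.finditer(sentence)]
```

A model such as NL570 that has never been registered anywhere is still recognized, because the rule describes the construction of the name rather than listing the names.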
Further, numerical expressions are not formed merely by arranging numerals and symbols; there are rules that govern the interpretation of the contents of expressions. In a fraction, for example, two numbers are normally juxtaposed with an intervening “/” symbol, ordinarily no more than two strings of numerals are used with an intervening “/”, and a number before or after a “/” usually does not begin with a “0”. However, in a date expression that employs the same “/” symbol, three numbers may be included, as in “11/05/97”, and a number that is set off by a “/” may begin with a “0”.
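The two rules for the “/” symbol can be sketched as follows. This is a simplified illustration, not the grammar of the invention: the name `classify_slash` is hypothetical, the sample entry “11/05/97” used in the test is assumed, and the word “usually” in the rule is treated here as a strict condition, so a two-part expression with a leading “0” is simply left unclassified.

```python
import re

def classify_slash(expression: str):
    """Classify a "/"-delimited expression as a fraction or a date."""
    parts = expression.split("/")
    # Every part must be a non-empty string of ASCII digits.
    if not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        return None
    # Fraction: exactly two numbers, neither beginning with "0".
    if len(parts) == 2 and not any(p.startswith("0") for p in parts):
        return "fraction"
    # Date: three numbers; a leading "0" is permitted.
    if len(parts) == 3:
        return "date"
    return None
```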
Furthermore, the rules governing numerical expressions depend not only on the order of the numbers and symbols, but also on the relationships of the quantities represented by the numbers. For example, when expressing round numbers, such as “2, 3”, the quantity that is represented by the numeral preceding the “,” must be smaller by “1” than the quantity that is represented by the succeeding numeral.
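A rule of this kind must compare the quantities themselves, which a pattern over characters alone cannot do. The sketch below, with the hypothetical name `is_round_number`, first checks the form and then the relationship between the two values.

```python
import re

def is_round_number(expression: str) -> bool:
    """Accept "a, b" only when b is exactly one greater than a."""
    m = re.fullmatch(r"([0-9]+),\s*([0-9]+)", expression)
    if not m:
        return False
    first, second = int(m.group(1)), int(m.group(2))
    return second - first == 1
```

Thus “2, 3” is accepted as a round-number expression, while “2, 5” has the same form but fails the quantity check.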
In order to correctly analyze words for which, in consonance with specific rules, an infinite number of descriptive variations can be produced, ideally the rules that are used should themselves be adequately described; but since in actuality complete descriptions are not available for all such rules, a system is required that can provide for the flexible addition, deletion, or correction of rules.
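One way to picture such flexibility is to hold the rules as data rather than as program logic, so that a rule can be added, deleted, or corrected without altering the analysis procedure. The following is a minimal sketch under that assumption; the class `RuleSet` and its methods are hypothetical names introduced only for illustration.

```python
import re

class RuleSet:
    """A mutable collection of notation rules keyed by category name."""

    def __init__(self):
        self._rules = {}

    def add(self, category: str, pattern: str) -> None:
        """Add a rule, or correct an existing one by replacing its pattern."""
        self._rules[category] = re.compile(pattern)

    def remove(self, category: str) -> None:
        """Delete a rule; a missing category is ignored."""
        self._rules.pop(category, None)

    def classify(self, token: str):
        """Return the first category whose rule matches the whole token."""
        for category, pattern in self._rules.items():
            if pattern.fullmatch(token):
                return category
        return None
```

For example, a rule registered as `NL[0-9]+` can later be corrected to `N[LX][0-9]+` by calling `add` again with the same category, and `classify` immediately reflects the change.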
SUMMARY OF THE INVENTION
It is, therefore, one objective of the present invention to provide a natural language processing apparatus and method for flexibly describing rules governing the use of such expressions as numerals, onomatopoeic words or the names of models, and for, in consonance with such rules, extracting the referenced expressions from sentences and correctly identifying the meanings that are represented by the expressions.
It is another objective of the present invention to provide a natural language processing apparatus and method that can produce correct vocal reproductions of sentences which include such expressions as numerals, onomatopoeic words and serial numbers.
According to one aspect, the present invention, which achieves these objectives, is related to a natural language processing apparatus comprising:
grammar description data storage means for storing word notation grammar description data that describe construction rules for character strings composed of words belonging to a specific category; and
analysis means for, based on the word notation grammar description data, extracting as a word, from a natural language sentence that is input, a character string that satisfies the construction rules, and for analyzing the natural language sentence.
According to another aspect, the present invention, which achieves these objectives, relates to a natural language processing method comprising the steps of:
entering a natural language sentence;
extracting as a word from the natural language sentence, based on word notation grammar description data that describe character string construction rules for words that belong to a specific category, a character string that satisfies the construction rules, and analyzing the natural language sentence; and
outputting the result of the analysis.
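The three steps above can be sketched, in greatly simplified form, as follows. Everything in the example is an assumption made for illustration: the small dictionary, the two grammar rules, the whitespace-based splitting of the sentence, and the name `analyze` are hypothetical, and an actual analysis means would be considerably more elaborate.

```python
import re

# Hypothetical ordinary dictionary of registered words.
DICTIONARY = {"the", "costs", "dollars"}

# Hypothetical word notation grammar: construction rules per category.
GRAMMAR = {
    "model": re.compile(r"NL[0-9]+"),
    "integer": re.compile(r"[0-9]{1,3}(?:,[0-9]{3})+|[0-9]+"),
}

def analyze(sentence: str) -> list:
    """Label each token by dictionary lookup first, then by grammar rule."""
    result = []
    for token in sentence.split():
        word = token.strip(".,;!?")  # drop punctuation at the ends only
        if not word:
            continue
        if word.lower() in DICTIONARY:
            result.append((word, "dictionary"))
            continue
        for category, pattern in GRAMMAR.items():
            if pattern.fullmatch(word):
                result.append((word, category))
                break
        else:
            result.append((word, "unknown"))
    return result

if __name__ == "__main__":
    # Entering a sentence, analyzing it, and outputting the result.
    for word, label in analyze("The NL550 costs 5,000 dollars."):
        print(word, label)
```

In this sketch “NL550” and “5,000” are extracted as words through the construction rules even though neither appears in the dictionary.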
According to still another aspect, the present invention that achieves these objectives relates to a computer-readable storage medium on which is stored a program for controlling a computer that performs natural language processing, the program comprising codes for causing the computer to perform the steps of:
entering a natural language sentence;
extracting as a word from the natural language sentence, based on word notation grammar description data that describe construction rules for words that belong to a specific category, a character string that satisfies the construction rules, and analyzing the natural language sentence; and
outputting the result of the analysis.
During the course of the following description of a preferred embodiment of the invention, other objectives and advantages, in addition to those discussed above, will become apparent to those skilled in the art. In the description, reference is made to accompanying drawings that form a part of the description and that illustrate an example of the invention.


REFERENCES:
patent: 4829423 (1989-05-01), Tennant et al.
patent: 4914704 (1990-04-01), Cole et al.
patent: 5225981 (1993-07-01), Yokogawa
patent: 5371807 (1994-12-01), Register et al.
patent: 5781884 (1998-07-01), Pereira et al.
patent: 5848389 (1999-10-01), Asano et al.
patent: 5890117 (1999-03-01), Silverman
patent: 5970449 (1999-10-01), Alleva et al.
patent: 07021174 (1995-01-01), None
