Multi-layered semiotic mechanism for answering natural...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000

active

06584470

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to computer software and, more specifically, to a system and method for information extraction.
BACKGROUND
Personal computers or workstations may be linked in a computer network to facilitate the sharing of data, applications, files, and other resources. One common type of computer network is a client/server network, where some computers act as servers and others as clients. In a client/server network, the sharing of resources is accomplished through the use of one or more servers. Each server includes a processing unit that is dedicated to managing centralized resources and to sharing these resources with other servers and/or various personal computers and workstations, which are known as the “clients” of the server.
Computers often need to retrieve information requested by a user. The information may be available locally or may be available on another computer, such as a server, through a network. Retrieving information is relatively simple when the user wishes to retrieve specific information which the user knows to exist and when the user knows relevant parameters about the information to be retrieved such as a document name, an author, or a directory name. However, when the user wishes to retrieve information and has no knowledge of where it might be located or in what document it might be contained, more sophisticated information retrieval (“IR”) techniques are necessary.
IR systems use a search query, input by a user, to locate information which satisfies the query and then return the information to the user. Simple IR systems may use the original query, while more advanced systems may modify the query by adding parameters or changing its format. IR systems may be limited to searching a specific database accessible to the system or they may be enabled to search any available information, such as that located on the Internet. Successfully searching unstructured information such as that available on the Internet generally demands a more flexible IR system, since users have no knowledge of how the information for which they are looking might be indexed and stored.
However, flexible IR systems are difficult to develop. Part of this difficulty stems from the inherent complexity of natural languages, which operate on several different levels of meaning simultaneously. Five of the levels of meaning are the morphological, syntactic, semantic, discourse, and pragmatic levels.
The morphological level focuses on a morpheme, which is the smallest piece of a word that has meaning. Morphemes include word stems, prefixes and suffixes. For example, “child” is the word stem for “childish” and “childlike.”
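As a minimal sketch of working at the morphological level, the suffix stripper below reduces both example words to their shared stem. The suffix list and the names `SUFFIXES` and `strip_suffix` are hypothetical and chosen only for illustration. Practical stemmers such as the Porter stemmer use much richer rule sets.

```python
# Hypothetical suffix list for illustration only. Longer suffixes come
# first so that "ness" is tried before "s".
SUFFIXES = ("like", "ness", "ish", "ing", "ed", "s")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Return the lowercased word with the first matching suffix removed.

    A suffix is removed only if at least `min_stem` characters remain,
    so very short words such as "is" are left unchanged.
    """
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[: -len(suffix)]
    return lower


print(strip_suffix("childish"))   # child
print(strip_suffix("childlike"))  # child
```

Mapping both forms to "child" lets a query for one form match documents that use the other.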
The syntactic level focuses on the structure of a sentence and the role each word plays in that structure. This level includes the relationship of each word to the other words in the sentence. For example, the position of a word in a sentence can indicate whether the word is the subject of the sentence or the verb describing an action.
The semantic level focuses not only on the dictionary meaning of each individual word, but also on the more subtle meaning that is derived from the context of the sentence. For instance, the meaning of the word “draw” can change depending on the context in which it is used. To “draw a picture” and to “draw a sword” both use the action “draw,” but in very different ways which are made clear by examining the context provided by the related words.
The discourse level examines a document's structure as a whole and derives further meaning from that structure. For example, technical documents usually begin with an abstract, while newspaper articles generally contain important “who, what, where, when” information in the first paragraph. This structure helps identify the type of document being examined, which in turn aids in determining where certain information in the document might be located and how the information might be organized.
The pragmatic level focuses on a body of knowledge that exists outside the document itself and is not explicitly stated in it. For instance, determining the current status of the European Currency Unit in different countries assumes knowledge of which European countries are taking part in the implementation process, even if those countries are not named in a document.
The levels of meaning operate simultaneously to provide the natural language environment in which communication occurs. Attempts at implementing the different levels of meaning for IR purposes have resulted in three basic types of systems, which may be generally categorized as boolean, statistical/probabilistic, and natural language processing (“NLP”). Many IR systems use a combination of these three basic types.
Boolean systems use basic boolean operators such as “AND” and “OR,” which are implemented mathematically to obtain search results. An example of this is a boolean search for “information AND retrieval,” which will return documents which contain both “information” and “retrieval.” Documents which do not contain both words are ignored by the system. In contrast, a search for “information OR retrieval” will return documents which contain either or both of the words “information” and “retrieval,” and so is a less restrictive search than one utilizing the “AND” operator.
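The two operators can be sketched as set operations over an inverted index, which maps each word to the documents that contain it. This is a minimal illustration that assumes whitespace tokenization and hypothetical function names (`build_index`, `search_and`, `search_or`). AND corresponds to set intersection and OR to set union.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing every term (set intersection)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing at least one term (set union)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


docs = {
    "d1": "information retrieval systems",
    "d2": "information theory",
    "d3": "data retrieval",
}
index = build_index(docs)
print(sorted(search_and(index, "information", "retrieval")))  # ['d1']
print(sorted(search_or(index, "information", "retrieval")))   # ['d1', 'd2', 'd3']
```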
Statistical/probabilistic systems use statistical and probabilistic analysis to help a user search by returning first the results that appear to be a better answer to the query. “Better” may mean that the words in the query occur more frequently, are closer together, or match some other criterion that the system classifies as superior.
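A toy version of such ranking might combine the two criteria named above: how often the query terms occur, and how close together they appear. The scoring formula below (term count plus the reciprocal of the smallest gap between two different query terms) is a hypothetical choice made for illustration, not a standard measure.

```python
def score(text: str, terms: list[str]) -> float:
    """Toy relevance score: total query-term occurrences plus a bonus that
    grows as the nearest pair of different query terms gets closer."""
    words = text.lower().split()
    terms = [t.lower() for t in terms]
    frequency = sum(words.count(t) for t in terms)
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    # Word distance between occurrences of two different query terms.
    gaps = [abs(i - j)
            for a in terms for b in terms if a < b
            for i in positions[a] for j in positions[b]]
    proximity = 1.0 / min(gaps) if gaps else 0.0
    return frequency + proximity


def rank(docs: dict[str, str], terms: list[str]) -> list[str]:
    """Document IDs ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(docs[d], terms), reverse=True)


docs = {
    "near": "information retrieval is hard",
    "far": "information is one thing and retrieval another",
    "none": "nothing here",
}
print(rank(docs, ["information", "retrieval"]))  # ['near', 'far', 'none']
```

Both "near" and "far" contain each term once, so the proximity bonus alone decides their order.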
NLP systems attempt to treat a natural language query as a complete question and use the words, sentence structure, etc., to locate and retrieve suitable documents. However, the different levels of meaning in natural languages discussed previously make NLP systems extremely difficult to design and implement.
Current IR systems, which are generally a combination of the three systems described above, have yet to overcome many of the obstacles presented by natural language queries. For instance, natural language information retrieval should deal with synonyms not only within a single language but also across regions and countries: a “truck” in the United States is often a “lorry” elsewhere. An additional problem is posed by words having multiple meanings, which often require interpretation through context. For example, the word “charge” may refer to a military charge, an electrical charge, a credit card debit, or many other things, each of which should be known to the IR system.
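One common way to handle regional synonyms is query expansion. Each query term is replaced by a group of interchangeable words, and a document matches if it contains at least one word from every group. The sketch below assumes a hypothetical hand-written `SYNONYMS` table. A real system would draw on a thesaurus or lexical database, and this approach does nothing to resolve multiple meanings such as “charge.”

```python
# Hypothetical synonym table for illustration only.
SYNONYMS: dict[str, set[str]] = {
    "truck": {"lorry"},
    "lorry": {"truck"},
}


def expand_query(terms: list[str]) -> list[list[str]]:
    """Replace each term with the sorted group of itself plus its synonyms.

    Words inside a group are alternatives (OR); all groups are required (AND).
    """
    return [sorted({t.lower()} | SYNONYMS.get(t.lower(), set())) for t in terms]


def matches(text: str, expanded: list[list[str]]) -> bool:
    """True if the text contains at least one word from every group."""
    words = set(text.lower().split())
    return all(words & set(group) for group in expanded)


query = expand_query(["truck", "rental"])
print(query)                                 # [['lorry', 'truck'], ['rental']]
print(matches("cheap lorry rental", query))  # True
```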
The inability to specify important but vague concepts presents a further problem to IR systems. For example, formulating a question to identify the likelihood of political instability in a country necessarily involves abstract ideas. False drops are yet another problem in current IR systems. False drops are documents which match the query but are actually irrelevant. An example of this is a simple query for “Japan AND currency,” which is intended to find articles on the topic of Japan's currency. However, a document which discusses Japan's housing problems in the first paragraph and the current currency situation in Canada in the third paragraph may be returned because it contains the requested terms.
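The Japan example can be reproduced directly. A plain boolean AND over the whole document accepts the false drop, while a stricter rule that requires all terms to appear in one paragraph rejects it. This is a minimal sketch that assumes paragraphs are separated by blank lines and uses a simple case-insensitive substring test. That test has false positives of its own (for example, “Japanese” contains “japan”), and the function names are hypothetical.

```python
def whole_document_match(document: str, terms: list[str]) -> bool:
    """Plain boolean AND: every term appears somewhere in the document."""
    text = document.lower()
    return all(t.lower() in text for t in terms)


def same_paragraph_match(document: str, terms: list[str]) -> bool:
    """Stricter AND: every term must appear within a single paragraph.

    Paragraphs are assumed to be separated by blank lines.
    """
    lowered = [t.lower() for t in terms]
    return any(all(t in p.lower() for t in lowered)
               for p in document.split("\n\n"))


false_drop = ("Japan's housing problems are severe.\n\n"
              "Other news.\n\n"
              "The currency situation in Canada is stable.")
print(whole_document_match(false_drop, ["Japan", "currency"]))  # True
print(same_paragraph_match(false_drop, ["Japan", "currency"]))  # False
```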
Indexing inconsistencies also present problems for IR systems. Unless documents are indexed using the same consistent standards, document categories and organization tend to become blurred. A further difficulty to be overcome by IR systems is presented by spelling variations and errors. As with synonyms, spelling variations often occur when dealing with an international audience. Common variations such as “grey”/“gray” or “theater”/“theatre” should be identified by an IR system. In addition, misspellings might cause an IR system to miss a highly relevant document because it fails to recognize misspelled words.
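Edit distance offers one way to tolerate spelling variations and misspellings. The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. Vocabulary words within a small distance of a query word can then be treated as candidate matches. The threshold of 2 and the function names are illustrative assumptions. Note that plain Levenshtein distance counts a swap such as “er”/“re” as two edits.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Vocabulary words within `max_distance` edits of `word`."""
    q = word.lower()
    return [w for w in vocabulary if edit_distance(q, w.lower()) <= max_distance]


print(edit_distance("grey", "gray"))                          # 1
print(edit_distance("theater", "theatre"))                    # 2
print(fuzzy_match("theatre", ["theater", "thesis", "theory"]))  # ['theater']
```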
Therefore, what is needed is an information e
