Creation of structured data from plain text

Reexamination Certificate

C707S793000, C707S793000, C704S009000

active

06714939

ABSTRACT:

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
REFERENCE TO A COMPUTER PROGRAM LISTING APPENDIX
A computer program listing appendix is included in the attached CD-R created on Dec. 12, 2000, labeled “Creation of Structured Data from Plain Text,” and including the following files: CommodityProperty.nml (13 KB), DefaultSeg14Result.xml (2 KB), ElectricalProperty.nml (16 KB), Example.txt, Grammar.txt, INML.xml (5 KB), MeasurementProperty.nml (22 KB), Output.txt (3 KB), PeriodProperty.nml (6 KB), PhysicalProperty.nml (36 KB), ReservedNameProperty.nml (6 KB), Seg14.nml (30 KB), Seg14Phrasing.nml (71 KB), UsageProperty.nml (7 KB), and Utility.nml (6 KB). These files are incorporated by reference herein.
BACKGROUND
A. Technical Field
The present invention relates to creation of structured data from plain text, and more particularly, to creation of structured data from plain text based on attributes or parameters of a web-site's content or products.
B. Background of the Invention
In recent years, the Internet has grown at an explosive pace. More and more information, goods, and services are being offered over the Internet. This increase in the data available over the Internet has made it increasingly important that users be able to search through vast amounts of material to find information that is relevant to their interests and queries.
The search problem can be described at at least two levels: searching across multiple web-sites, and searching within a given site. The first level of search is often addressed by “search engines” such as Google™ or Alta Vista™, or by directories such as Yahoo™. The second level, which is specific to the content of a site, is typically handled by combinations of search engines and databases. This approach has not been entirely successful in providing users with efficient access to a site's content.
The problem in searching a website or other information-technology based service is composed of two subproblems: first, indexing or categorizing the corpora (body of material) to be searched (i.e., content synthesis), and second, interpreting a search request and executing it over the corpora (i.e., content retrieval). The corpora to be searched typically consist of unstructured information (text descriptions) of items. For e-commerce web-sites, the corpora may be the catalog of the items available through that web-site. For example, the catalog entry for an item might well be the sentence “aqua cashmere v-neck, available in small, medium, large, and extra large.” Such an entry cannot be retrieved by item type or attribute, since the facts that v-neck is a style of sweater, cashmere a form of wool, and aqua a shade of blue are unknown to current catalogs or search engines. For this item to be retrievable by item type and/or attribute, its description must be converted into an attributed, categorized description. In this example, such a description would categorize the item as a sweater, extract its various attributes, and tag their values. An example of such a description is illustrated in Table 1.
TABLE 1
Item    | Style  | Color | Material | Sizes
Sweater | v-neck | Aqua  | Cashmere | S, M, L, XL
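As a rough illustration of the conversion that Table 1 describes, the following Python sketch maps the example catalog sentence to an attributed record. It assumes a small hand-built lexicon (LEXICON, STYLE_TO_ITEM, SIZE_PHRASES, and the helper names are hypothetical) and simple whole-word matching; it is a lexicon-lookup stand-in, not the method claimed in this patent.

```python
import re

# Hypothetical lexicon: surface term -> (attribute, normalized value).
LEXICON = {
    "v-neck": ("Style", "v-neck"),
    "aqua": ("Color", "Aqua"),
    "cashmere": ("Material", "Cashmere"),
}

# Hypothetical category rule: a known style implies an item type.
STYLE_TO_ITEM = {"v-neck": "Sweater"}

# Size phrases; "extra large" is checked before "large" so it is not double-counted.
SIZE_PHRASES = [("extra large", "XL"), ("small", "S"), ("medium", "M"), ("large", "L")]
SIZE_ORDER = ["S", "M", "L", "XL"]


def find(phrase, text):
    """Return True if phrase occurs in text as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def extract_attributes(text):
    """Turn a plain-text catalog entry into an attributed, categorized record."""
    lowered = text.lower()
    record = {}

    # Tag each lexicon term found in the description with its attribute.
    for term, (attribute, value) in LEXICON.items():
        if find(term, lowered):
            record[attribute] = value

    # Categorize the item from its style, when the style is known.
    style = record.get("Style")
    if style in STYLE_TO_ITEM:
        record["Item"] = STYLE_TO_ITEM[style]

    # Collect sizes, removing each matched phrase so "large" inside
    # "extra large" is not counted a second time.
    sizes = []
    remaining = lowered
    for phrase, code in SIZE_PHRASES:
        if find(phrase, remaining):
            sizes.append(code)
            remaining = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", remaining)
    if sizes:
        record["Sizes"] = sorted(sizes, key=SIZE_ORDER.index)

    return record


entry = "aqua cashmere v-neck, available in small, medium, large, and extra large."
print(extract_attributes(entry))
```

Every term the sketch recognizes must already be in its lexicon; knowing that cashmere is a form of wool or that aqua is a shade of blue would require a taxonomy layered on top of this lookup.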
Current technology permits such representations in databases. Further, for many standard items, numeric codes are assigned to make search and representation easier. One such code is the UN Standard Products and Services Code (UN/SPSC), which assigns a standard 8-digit code to any product or service.
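UN/SPSC codes are hierarchical: the eight digits form four two-digit levels (segment, family, class, and commodity), so items that share a code prefix share a category. The sketch below, with hypothetical function names, splits a code into those levels and compares two codes at the family level; the sample codes are arbitrary digits, not real UN/SPSC entries.

```python
from typing import NamedTuple


class UnspscCode(NamedTuple):
    """The four two-digit levels of an 8-digit UN/SPSC code."""
    segment: str
    family: str
    class_: str  # trailing underscore because 'class' is a Python keyword
    commodity: str


def parse_unspsc(code: str) -> UnspscCode:
    """Split an 8-digit code string into its four hierarchical levels."""
    if len(code) != 8 or any(c not in "0123456789" for c in code):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return UnspscCode(code[0:2], code[2:4], code[4:6], code[6:8])


def same_family(a: str, b: str) -> bool:
    """Codes are in the same family when their segment and family digits match."""
    pa, pb = parse_unspsc(a), parse_unspsc(b)
    return (pa.segment, pa.family) == (pb.segment, pb.family)


# Arbitrary example digits, not real catalog entries.
print(parse_unspsc("12345678"))
print(same_family("12345678", "12349999"))  # same first four digits
print(same_family("12345678", "12355678"))  # family digits differ
```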
However, while the taxonomies and the technology to represent them may exist, conventional systems are unable to generate the taxonomic and attributed representation of an object from its textual description. This is the first of the two problems outlined above: the content synthesis problem, that is, the problem of converting plain text into structured objects suitable for automated search and other computational services.
The second problem is one of retrieving data successfully: once the data has been created and attributed, it must be accessible. E-commerce and parametric content sites face a unique challenge, since they must offer search solutions that expose only those products, contents, or services that exactly match a customer's specifications. Today, more than 50% of visitors use search as their preferred method for finding desired goods and services. However, as e-commerce web sites continue to offer their customers unmatched variety, category-based navigation of these sites (“virtual aisles”) has become increasingly complex and inadequate. In particular, many web-sites that offer a large catalog of products are unable to locate products matching precise or highly parameterized specifications, and instead require the user to review dozens of products that potentially match those specifications.
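Once descriptions are attributed as in Table 1, exact-match retrieval reduces to filtering records by attribute constraints. The following minimal sketch assumes a hypothetical in-memory catalog of dictionaries (only the first record mirrors Table 1) and hypothetical function names; a list-valued attribute such as Sizes matches when it contains the requested value.

```python
# Hypothetical attributed catalog; the first record mirrors Table 1.
CATALOG = [
    {"Item": "Sweater", "Style": "v-neck", "Color": "Aqua",
     "Material": "Cashmere", "Sizes": ["S", "M", "L", "XL"]},
    {"Item": "Sweater", "Style": "crew-neck", "Color": "Navy",
     "Material": "Wool", "Sizes": ["M", "L"]},
    {"Item": "Scarf", "Color": "Aqua", "Material": "Cashmere"},
]


def matches(record, spec):
    """True if the record satisfies every attribute constraint in spec."""
    for attribute, wanted in spec.items():
        value = record.get(attribute)
        if isinstance(value, list):
            # List-valued attributes (e.g. Sizes) match if they contain the value.
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


def parametric_search(catalog, spec):
    """Return only the records that exactly match the customer's specification."""
    return [record for record in catalog if matches(record, spec)]


query = {"Item": "Sweater", "Material": "Cashmere", "Sizes": "XL"}
for record in parametric_search(CATALOG, query):
    print(record)
```

Exact matching is only as good as the attribution behind it: a record whose material was never extracted simply fails every material constraint.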
A few statistics help to emphasize the importance of good search capability. An important metric for the conversion rate of e-commerce visitors into buyers is the book-to-look ratio. The industry average is that only 27 visitors in 1,000 make a purchase. The biggest contributor to this abysmal ratio is failed search: Forrester Research reports that 92% of all e-commerce searches fail, and major sites report that 80% of customers leave the site after a single failed search. Therefore, improving the search capability on a site directly increases revenue through increased customer acquisition, retention, and sales.
While all web-sites experience these search problems to some extent, the problem is particularly acute for web-sites with a deep and rich variety of content or products. Examples include electronic procurement networks, financial sites, sporting goods stores, grocery sites, clothing sites, and electronics, software, and computer sites, among many others. Another class of sites with a deep search problem consists of those carrying highly configurable products, such as travel and automotive sites. Ironically, as a rule of thumb, the more a web-site has to offer, the greater the risk that customers will leave the site because of a failed search.
When a customer physically enters a large department store, she can ask a clerk where she can find what she is looking for. The clerk's “search” is flexible in that he can understand the customer's question almost no matter how it is worded. Moreover, the clerk's “search” is generally accurate since the clerk can often specifically identify a product, or initial set of products, that the customer needs. Searches on web sites need to be equally flexible and accurate. In order for that to happen, a visitor's request must be understood not only in terms of the products, but also in terms of the request's parameters or characteristics. However, conventional information retrieval systems for web-site content have been unable to achieve this.
Some of the conventional methods used to find goods and services on web sites, and some problems with these methods, are outlined below:
1. Keyword-based search: In this method, users type a set of words or phrases describing what they want into a text box, typically on the main page of the site. A program on the site then takes each individual word entered (sometimes discarding “noise” words such as prepositions and conjunctions), and searches through all pages and product descriptions to find items containing any combination of the words. This method, when given an English sentence or phrase, returns either far too many results or too few. For example, if a customer requests, “show me men's blue wool sweaters,” the search could b
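The keyword method in item 1 can be sketched as follows, assuming a hypothetical noise-word list and set of product descriptions. With any-word matching, every description below is returned for the example query; with all-word matching, none is, both because "show" and "me" are not in the noise list and because "men's" and "sweaters" do not match "men" and "sweater".

```python
import re

# Hypothetical noise-word list (articles, prepositions, conjunctions).
NOISE_WORDS = {"a", "an", "the", "and", "or", "of", "in", "for", "with", "to"}

# Hypothetical product descriptions.
DESCRIPTIONS = [
    "blue denim jeans",
    "men's wool socks",
    "navy wool sweater for men",
    "women's blue cashmere sweater",
    "blue ceramic mug",
]


def tokenize(text):
    """Lower-case the text, split it into words, and drop noise words."""
    return [w for w in re.findall(r"[a-z0-9'-]+", text.lower()) if w not in NOISE_WORDS]


def keyword_search(query, documents, require_all=False):
    """Return documents containing any (or, if require_all, every) query word."""
    terms = set(tokenize(query))
    hits = []
    for doc in documents:
        words = set(tokenize(doc))
        found = terms <= words if require_all else bool(terms & words)
        if found:
            hits.append(doc)
    return hits


query = "show me men's blue wool sweaters"
print(keyword_search(query, DESCRIPTIONS))                    # any word
print(keyword_search(query, DESCRIPTIONS, require_all=True))  # every word
```

Neither mode knows that navy is a shade of blue or that "sweaters" and "sweater" name the same item type, which is the kind of knowledge an attributed description supplies.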
