Reduction of annotations to extract structured web data

Data processing: database and file management or data structures – Database and file access – Preparing data for information retrieval

Reexamination Certificate

Details

C707S748000, C715S231000

active

08046360

ABSTRACT:
Documents, such as the web pages of a domain, are annotated to facilitate extracting structured information from them. The documents are grouped into a plurality of clusters such that the documents within each cluster are similar to one another at least with respect to a first threshold, for example under a shingling metric where the first threshold is an 8/8 shingling match. At least one overlap cluster is formed; each overlap cluster includes at least one of the plurality of clusters, such that the documents of the included clusters are similar to one another at least with respect to a second threshold that is lower than the first threshold. A particular overlap cluster is designated, as is a particular cluster within that overlap cluster. An annotation obtained for the designated cluster is then transferred to the other clusters included in the designated overlap cluster.
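The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each document already has an 8-component shingle signature (hand-made tuples here), treats an "8/8 match" as agreement on all eight components, and uses a hypothetical second threshold of 6/8. The greedy seeding strategy, the function names, and the sample annotation `{"title": "//h1"}` are likewise assumptions for illustration. The greedy step only guarantees that each member cluster is similar to the seed cluster, which is a simplification of the pairwise wording in the abstract.

```python
from collections import defaultdict


def matching_positions(sig_a, sig_b):
    """Count the signature positions on which two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b))


def cluster_exact(signatures):
    """First threshold (assumed 8/8): group documents with identical signatures."""
    clusters = defaultdict(list)
    for doc_id, sig in signatures.items():
        clusters[sig].append(doc_id)
    return dict(clusters)


def build_overlap_clusters(clusters, second_threshold=6):
    """Second, lower threshold: greedily attach clusters to a seed cluster.

    Clusters are visited largest first; each unassigned cluster seeds a new
    overlap cluster and absorbs every unassigned cluster whose signature
    agrees with the seed in at least `second_threshold` positions.
    """
    remaining = sorted(clusters, key=lambda s: (-len(clusters[s]), s))
    overlaps = []
    while remaining:
        seed = remaining.pop(0)
        members = [s for s in remaining
                   if matching_positions(seed, s) >= second_threshold]
        remaining = [s for s in remaining if s not in members]
        overlaps.append([seed] + members)
    return overlaps


def transfer_annotation(overlap, clusters, annotation):
    """Apply the annotation obtained for the designated cluster (overlap[0])
    to the documents of every cluster in the same overlap cluster."""
    return {doc_id: annotation for sig in overlap for doc_id in clusters[sig]}


# Hypothetical signatures; real ones would come from a shingling step.
signatures = {
    "page1": (1, 2, 3, 4, 5, 6, 7, 8),
    "page2": (1, 2, 3, 4, 5, 6, 7, 8),  # 8/8 with page1: same cluster
    "page3": (1, 2, 3, 4, 5, 6, 0, 0),  # 6/8 with page1: same overlap cluster
    "page4": (9, 9, 9, 9, 9, 9, 9, 9),  # unrelated template
}
clusters = cluster_exact(signatures)
overlaps = build_overlap_clusters(clusters)
labels = transfer_annotation(overlaps[0], clusters, {"title": "//h1"})
```

In this sketch only one cluster per overlap cluster needs a manually obtained annotation, which is the reduction in annotation effort that the title refers to.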

REFERENCES:
patent: 2006/0112089 (2006-05-01), Broder et al.
patent: 2007/0150802 (2007-06-01), Wan et al.
Bar-Yossef, Z., Broder, A., Kumar, R., Tomkins, A. (2004), “Sic transit gloria telae: towards an understanding of the web's decay”, Proceedings of the 13th International Conference on World Wide Web (WWW), pp. 328-337.
Buttler, D., “A Short Survey of Document Structure Similarity Algorithms”, Mar. 5, 2004, The 5th International Conference on Internet Computing, Las Vegas, NV, United States, Jun. 21, 2004 through Jun. 24, 2004, 9 pages.
Broder et al., “Syntactic Clustering of the Web,” SRC Technical Note, 1997-015, Jul. 25, 1997, 13 pages.
U.S. Appl. No. 11/955,129, filed Dec. 12, 2007.
Black, Paul E., “greedy algorithm”, in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. Feb. 2, 2005. downloaded on Oct. 30, 2007, Available from: http://www.nist.gov/dads/HTML/greedyalgo.html.
Hochbaum et al., “A Best Possible Heuristic for the k-Center Problem”, Mathematics of Operations Research, vol. 10, No. 2 (May 1985), pp. 180-184.
Kushmerick, Nicholas, “Wrapper induction: Efficiency and expressiveness”, Artificial Intelligence 118 (2000) 15-68.
Kushmerick, Nicholas, “Wrapper induction: Efficiency and expressiveness”, Workshop on AI and Information Integration, AAAI-98, 1998. http://citeseer.ist.psu.edu/article/kushmerick98wrapper.html, downloaded on Nov. 29, 2007, 8 pages.
Tenier et al., “Knowledge extraction from webpages”, http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-185/semAnnot05-11.pdf, downloaded on Oct. 29, 2007, 4 pages.
