Pseudo-anchor text extraction

Data processing: database and file management or data structures – Database and file access – Query optimization

Reexamination Certificate


Details

Reexamination Certificate

active

08073838

ABSTRACT:
A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information.
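
The extraction and ranking steps described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the patented method: it substitutes a fixed heuristic (the sentence containing a citation marker) for the machine-learning-based selection of candidate anchor blocks, and the names `pseudo_url`, `extract_pseudo_anchors`, `score`, and `anchor_weight`, as well as the input document layout, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pseudo_url(title):
    """Hypothetical identifier for an object that has no real URL:
    the normalized title terms joined with hyphens."""
    return "-".join(tokenize(title))


def extract_pseudo_anchors(documents):
    """Collect, for each cited object, the sentences that cite it.

    Assumed input layout (not from the source): each document is a dict with
      "references": {marker: title}, e.g. {"[1]": "Topical Locality in the Web"}
      "body": free text containing citation markers such as "[1]"
    Heuristic stand-in for the learned extractor: the candidate anchor block
    is the whole sentence in which the marker appears.
    """
    anchors = defaultdict(list)
    for doc in documents:
        # Naive sentence split on whitespace that follows ., ! or ?
        sentences = re.split(r"(?<=[.!?])\s+", doc["body"])
        for sentence in sentences:
            for marker, title in doc["references"].items():
                if marker in sentence:
                    # Drop the marker and collapse the leftover whitespace.
                    text = " ".join(sentence.replace(marker, " ").split())
                    anchors[pseudo_url(title)].append(text)
    return anchors


def score(query, object_text, anchor_texts, anchor_weight=2.0):
    """Toy relevance score: query term frequency in the object's own text
    plus a weighted term frequency in its pseudo-anchor texts."""
    own_terms = tokenize(object_text)
    anchor_terms = tokenize(" ".join(anchor_texts))
    return sum(
        own_terms.count(term) + anchor_weight * anchor_terms.count(term)
        for term in tokenize(query)
    )
```

In this sketch the pseudo-anchor texts play the role that anchor text plays in web search: an object can match a query through the words other documents use to describe it, even when those words are absent from the object's own title or text.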

REFERENCES:
patent: 5920859 (1999-07-01), Li
patent: 6442696 (2002-08-01), Wray et al.
patent: 6636848 (2003-10-01), Aridor et al.
patent: 6925495 (2005-08-01), Hegde et al.
patent: 2002/0169770 (2002-11-01), Kim et al.
patent: 2005/0149576 (2005-07-01), Marmaros et al.
patent: 2005/0149851 (2005-07-01), Mittal
patent: 2005/0165781 (2005-07-01), Kraft et al.
patent: 2006/0026496 (2006-02-01), Joshi et al.
patent: 2006/0074871 (2006-04-01), Meyerzon et al.
patent: 2006/0074903 (2006-04-01), Meyerzon et al.
patent: 2006/0136098 (2006-06-01), Chitrapura et al.
patent: 2006/0143254 (2006-06-01), Chen et al.
patent: PCT/EP05/050321 (2005-01-01), None
Amitay, “Using Common Hypertext Links to Identify the Best Phrasal Description of Target Web Documents”, available at least as early as Jan. 24, 2007, at <<http://einat.webir.org/sigir—98.pdf>>, pp. 1-5.
Attardi, et al., “Theseus: Categorization by Context,” Proceedings of the 8th International World Wide Web Conference, 1999, pp. 1-2.
Bikel, et al., “Nymble: A High-Performance Learning Name-Finder,” Proceedings of ANLP, 1997, pp. 194-201.
Broder, et al., “Syntactic Clustering of the Web”, retrieved at <<http://www.research.digital.com/SRC>>, SRC Technical Note, Jul. 25, 1997, Digital Equipment Corporation, 1997, pp. 1-13.
Califf, et al., “Relational Learning of Pattern-Match Rules for Information Extraction,” CoNLL97: Computational Natural Language Learning, ACL, 1997, pp. 9-15.
Chakrabarti, et al., “Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text,” Proceedings of the 7th International World Wide Web Conference, 1998, pp. 13.
Chang, et al., “A Chinese-to-Chinese Statistical Machine Translation Model for Mining Synonymous Simplified-Traditional Chinese Terms”, National Chi-Nan University, 1994, pp. 242-247.
“CiteSeer.IST Scientific Literature Digital Library”, available as early as Feb. 26, 2007, retrieved on Apr. 4, 2007, at <<http://citeseer.ist.psu.edu>>, 1 pg.
Collins, et al., “Unsupervised Models for Named Entity Classification,” Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing, 1999, pp. 100-110.
Davison, “Topical Locality in the Web”, ACM, Proceedings of SIGIR, 2000, pp. 272-279.
Freitag, “Information Extraction from HTML: Application of a General Machine Learning Approach,” Proceedings of the 15th Conference on Artificial Intelligence, 1998, 7 pgs.
Giles, et al., “CiteSeer: An Automatic Citation Indexing System”, ACM, Proceedings of the 3rd ACM Conference on Digital Libraries (DL'98), 1998, pp. 89-98.
Haveliwala, et al., “Evaluating Strategies for Similarity Search on the Web”, ACM, WWW2002, May 7-11, 2002, pp. 1-10.
Lawrence, et al., “Digital Libraries and Autonomous Citation Indexing”, IEEE, 1999, pp. 67-71, vol. 32, No. 6.
Levenshtein, “Binary Codes Capable of Correcting Deletions, Insertions and Reversals,” Soviet Physics, Doklady, 1966, pp. 707-710.
Lu, et al., “A Transitive Model for Extracting Translation Equivalents of Web Queries through Anchor Text Mining”, available at least as early as Jan. 24, 2007, at <<http://delivery.acm.org/10.1145/1080000/1072236/p8-lu.pdf?key1=1072236&key2=0538359611&coll=GUIDE&dl=GUIDE&CFID=12180585&CFTOKEN=46372023>>, pp. 1-7.
Lu, et al., “Anchor Text Mining for Translation of Web Queries: A Transitive Translation Approach”, ACM Transactions on Information Systems, vol. 22, No. 2, Apr. 2004, pp. 242-269.
McBryan, “GENVL and WWWW: Tools for Taming the Web”, First International Conference on the World Wide Web, CERN, May 1994, pp. 1-12, Geneva, Switzerland.
Muslea, “Extraction Patterns for Information Extraction Tasks: A Survey”, American Association for Artificial Intelligence, 1999, 6 pgs.
Nie, et al., “Extracting Objects from the Web,” ICDE, 2006, pp. 1-3.
Nie, et al., “Object-Level Ranking: Bringing Order to Web Objects,” ACM, WWW2005, May 10-14, 2005, pp. 567-574.
Shi, et al., “Pseudo-Anchor Text Extraction for Vertical Search”, Microsoft Technical Report, MSR-TR-2006-122, Aug. 2006, 6 pgs.
Yin, et al., “Towards Understanding the Functions of Web Element”, Springer-Verlag, 2004 AIRS, 2005, pp. 313-324.
Yu, et al., “Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation,” ACM, WWW2003, May 20-24, 2003, 8 pgs, Budapest, Hungary.
Zhu, et al., “Simultaneous Record Detection and Attribute Labeling in Web Data Extraction,” ACM, KDD'06, Aug. 20-23, 2006, 10 pgs.


Profile ID: LFUS-PAI-O-4303093
