Reexamination Certificate
2007-04-06
2010-11-16
Mizrahi, Diane (Department: 2617)
Data processing: database and file management or data structures
Data warehouse, data mart, online analytical processing, ...
Data extraction, transformation, and loading
C707S776000
Reexamination Certificate
active
07836012
ABSTRACT:
Methods and systems for information extraction are disclosed. In one such method and system, a sample of related articles is obtained, and one article is selected as a seed article. The distances between sample articles are calculated to determine a set of one or more articles closest to the seed article. This set of closest articles is used to identify information fields containing variable data within the seed article. The identification may be performed by a variety of techniques, one of which uses dynamic programming alignment to compute alignments between articles. The information fields are labeled, and a template is generated using the labeled fields. The template is used to extract data from a source article by comparing the source article with the template and associating the variable data of the source article with the labeled fields.
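The steps in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes articles are whitespace-tokenized strings, uses a longest-common-subsequence alignment as one possible form of dynamic programming alignment, and defines distance as the number of unaligned tokens. All function names, the sample articles, and the field labels are hypothetical.

```python
def lcs_align(a, b):
    """Index pairs (i, j) of tokens matched by an LCS alignment of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:  # walk the table to recover one optimal alignment
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def distance(a, b):
    """Assumed distance: number of tokens left unaligned in either article."""
    return len(a) + len(b) - 2 * len(lcs_align(a, b))

def closest(seed, sample, k):
    """The k sample articles nearest to the seed (the seed itself excluded)."""
    others = [art for art in sample if art is not seed]
    return sorted(others, key=lambda art: distance(seed, art))[:k]

def build_template(seed, neighbors):
    """Seed tokens aligned in every neighbor stay constant (str);
    each run of unaligned tokens becomes one numbered field slot (int)."""
    constant = set(range(len(seed)))
    for other in neighbors:
        constant &= {i for i, _ in lcs_align(seed, other)}
    template, next_field = [], 0
    for i, tok in enumerate(seed):
        if i in constant:
            template.append(tok)
        elif not template or not isinstance(template[-1], int):
            template.append(next_field)
            next_field += 1
    return template

def extract(template, source):
    """Align the template's constant tokens with the source article and
    return the source text that falls into each field slot."""
    consts = [(k, t) for k, t in enumerate(template) if isinstance(t, str)]
    pairs = lcs_align([t for _, t in consts], source)
    pos = {consts[ci][0]: sj for ci, sj in pairs}  # template index -> source index
    fields = {}
    for k, item in enumerate(template):
        if isinstance(item, int):
            prev = max((pos[p] for p in pos if p < k), default=-1)
            nxt = min((pos[p] for p in pos if p > k), default=len(source))
            fields[item] = " ".join(source[prev + 1:nxt])
    return fields

# Hypothetical sample of related articles plus one unrelated page.
sample = [s.split() for s in (
    "Title : Red Widget Price : $ 10 In stock",
    "Title : Blue Gadget Price : $ 25 In stock",
    "Title : Green Gizmo Deluxe Price : $ 7 In stock",
    "Contact us by phone or email today",
)]
seed = sample[0]
template = build_template(seed, closest(seed, sample, k=2))
labels = {0: "name", 1: "price"}  # hypothetical labels, assigned by hand
source = "Title : Tiny Purple Thing Price : $ 99 In stock".split()
fields = extract(template, source)
print({labels[k]: v for k, v in fields.items()})
# {'name': 'Tiny Purple Thing', 'price': '99'}
```

In this sketch the unrelated page is never used, because its distance to the seed is the largest in the sample. Restricting the comparison to the closest articles is what keeps the template's constant tokens from being eroded by dissimilar pages.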
REFERENCES:
patent: 5982369 (1999-11-01), Sciammarella et al.
patent: 6009442 (1999-12-01), Chen et al.
patent: 6037939 (2000-03-01), Kashiwagi et al.
patent: 6058417 (2000-05-01), Hess et al.
patent: 6237011 (2001-05-01), Ferguson et al.
patent: 6271840 (2001-08-01), Finseth et al.
patent: 6289353 (2001-09-01), Hazlehurst et al.
patent: 6298174 (2001-10-01), Lantrip et al.
patent: 6332135 (2001-12-01), Conklin et al.
patent: 6369840 (2002-04-01), Barnett et al.
patent: 6567980 (2003-05-01), Jain et al.
patent: 6606625 (2003-08-01), Muslea et al.
patent: 6615184 (2003-09-01), Hicks
patent: 6647383 (2003-11-01), August et al.
patent: 6678681 (2004-01-01), Brin
patent: 6732161 (2004-05-01), Hess et al.
patent: 6785671 (2004-08-01), Bailey et al.
patent: 6853982 (2005-02-01), Smith et al.
patent: 6920609 (2005-07-01), Manber et al.
patent: 7058598 (2006-06-01), Chen et al.
patent: 7076443 (2006-07-01), Emens et al.
patent: 7080070 (2006-07-01), Gavarini
patent: 7092936 (2006-08-01), Alonso et al.
patent: 7103592 (2006-09-01), Huret
patent: 7124129 (2006-10-01), Bowman et al.
patent: 7127416 (2006-10-01), Tenorio
patent: 7149804 (2006-12-01), Chatani
patent: 7653641 (2010-01-01), Theissen et al.
patent: 2001/0056418 (2001-12-01), Youn
patent: 2002/0032612 (2002-03-01), Williams et al.
patent: 2002/0065722 (2002-05-01), Hubbard et al.
patent: 2002/0099622 (2002-07-01), Langhammer
patent: 2002/0161658 (2002-10-01), Sussman
patent: 2002/0174076 (2002-11-01), Bertani
patent: 2003/0028446 (2003-02-01), Akers et al.
patent: 2003/0050865 (2003-03-01), Dutta et al.
patent: 2003/0105680 (2003-06-01), Song et al.
patent: 2003/0167209 (2003-09-01), Hsieh
patent: 2004/0073625 (2004-04-01), Chatani
patent: 2004/0107142 (2004-06-01), Tomita et al.
patent: 2005/0021997 (2005-01-01), Beynon et al.
patent: 2005/0071255 (2005-03-01), Wang et al.
patent: 2005/0075940 (2005-04-01), DeAngelis
patent: 2005/0183041 (2005-08-01), Chiu et al.
patent: 2005/0251535 (2005-11-01), Theissen et al.
patent: 2006/0190252 (2006-08-01), Starkie
patent: 0964341 (1999-12-01), None
patent: WO 01/13273 (2001-02-01), None
patent: WO 01/46870 (2001-06-01), None
Archive of “mySimon: Compare products and prices from around the Web,” www.mysimon.com/index.jhtml, [online] [Archived by http://archive.org on Jun. 3, 2003; Retrieved on Jan. 10, 2007] Retrieved from the Internet <URL:http://web.archive.org/web/20030603175323/www.mysimon.com/index.jhtml>.
Archive of “mySimon: Frequently Asked Questions,” www.mysimon.com/corporate/index.jhtml?pgid=help, [online] [Archived by http://archive.org on Jun. 4, 2001; Retrieved on Jan. 10, 2007] Retrieved from the Internet <URL:http://web.archive.org/web/20010604082923/www.mysimon.com/corporate/index.jhtml?pgid=help>.
Archive of “mySimon: Make mySimon your homepage,” www.mysimon.com/Nikon_Coolpix_5700/4014-650..., [online] [Archived by http://archive.org on Dec. 7, 2003; Retrieved on Sep. 7, 2006] Retrieved from the Internet <URL:http://web.archive.org/web/20031207141726/www.mysimon.com/Nikon_Coolpix_5700/4014-650...>.
Archive of “mySimon: Merchant Info,” www.mysimon.com/corporate/index.jhtml?pgid=help, [online] [Archived by http://archive.org on Jun. 3, 2003; Retrieved on Jan. 10, 2007] Retrieved from the Internet <URL:http://web.archive.org/web/20030603173203/www.mysimon.com/corporate/index.jhtml?pgid=help>.
Archive of “mySimon: Shopping Guides,” www.mysimon.com/index.anml, [online] [Archived by http://archive.org on May 10, 2000; Retrieved on Jan. 10, 2007] Retrieved from the Internet <URL:http://web.archive.org/web/20000510222151/www.mysimon.com/index.anml>.
Archive of “mySimon: What is mySimon,” www.mysimon.com/about_mysimon/companymeet..., [online] [Archived by http://archive.org on May 10, 2000; Retrieved on Sep. 12, 2006] Retrieved from the Internet <URL:http://web.archive.org/web/20000510054852/www.mysimon.com/about_mysimon/company/meet...>.
BizRate.com web page, as provided by Internet Archive Wayback Machine at http://web.archive.org/web/20030101-20030922re_/http://bizrate.com/, as published between Jan. 1, 2003 and Sep. 22, 2003.
Brin, S. et al., “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” 1998, Computer Science Department, Stanford University, Stanford, CA.
Chang, C-H. et al., “IEPAD: Information Extraction Based on Pattern Discovery,” 2001, Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan.
Crescenzi, V. et al., “Road Runner: Towards Automatic Data Extraction from Large Web Sites,” Proceedings of the 27th VLDB Conference, 2001, Rome, Italy.
DealTime.com web page, as provided by Internet Archive Wayback Machine at http://web.archive.org/web/20030101-20030922re_/http://dealtime.com/, as published between Jan. 1, 2003 and Sep. 22, 2003.
Delort, J-Y. et al., “Enhanced Web Document Summarization Using Hyperlinks,” HT'03, Aug. 26-30, 2003, Nottingham, United Kingdom.
Freitag, D. et al., “Boosted Wrapper Induction,” 2000, American Association for Artificial Intelligence.
Hsu, C-N. et al., “Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web,” Information Systems, 1998, pp. 521-538, vol. 23, No. 8, Elsevier Science Ltd. Great Britain.
International Search Report and Written Opinion, PCT/US2004/038559, Mar. 16, 2005.
Kushmerick, N., “Adaptive Information Extraction: Core Technologies for Information Agents,” 2002, Computer Science Department, University College Dublin.
Kushmerick, N., “Finite-State Approaches to Web Information Extraction,” 2002, Computer Science Department, University College Dublin.
Kushmerick, N., “Wrapper Induction: Efficiency and Expressiveness,” Artificial Intelligence, 2000, pp. 15-68, vol. 118, Elsevier Science B.V.
Laender, A. et al., “A Brief Survey of Web Data Extraction Tools,” 2002, Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte MG Brazil.
Muslea, I. et al., “Hierarchical Wrapper Induction for Semistructured Information Sources,” 1999, pp. 1-27, Kluwer Academic Publishers, the Netherlands.
Sherman, C., “Yahoo! Launches New Product Search,” Sep. 23, 2003, SearchEngineWatch, [online] [Retrieved on Sep. 1, 2006] Retrieved from the Internet <URL:http://searchenginewatch.com/showPage.html?page=3081551>.
White, M. et al., “Multidocument Summarization via Information Extraction,” First International Conference on Human Language Technology Research (HLT), 2001.
Yahoo Shopping web page, as provided by Internet Archive Wayback Machine at http://web.archive.org/web/20030101-20030922re_/http://shopping.yahoo.com/, as published between Jan. 1, 2003 and Sep. 22, 2003.
Newegg.com, Information from Web Archive.org at http://Web.archive.org/web/20020925093014/http://newegg.com/, Sep. 25, 2002.
Nevill-Manning Craig
Witten Ian
Fenwick & West LLP
Google Inc.
Mizrahi Diane
Profile ID: LFUS-PAI-O-4215752