Computer method and apparatus for extracting data from web...

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

C704S009000, C704S010000, C707S793000

active

07065483

ABSTRACT:
A computer method and apparatus for extracting information from a Web page are disclosed. The apparatus comprises an extractor coupled to receive Web pages from a source. The extractor uses natural language processing to extract desired information from each Web page. A storage subsystem receives the extracted information from the extractor and stores it in a database. The method for extracting data from a Web page includes the computer-implemented steps of (i) finding possible formal names on a given Web page using natural language processing, (ii) searching the given Web page, using pattern matching, for formal names not found by the natural language processing, and (iii) refining the combined set of found formal names to produce a working set of people and organization names extracted from the given Web page. The refining includes determining aliases of the respective people and organization names, which reduces duplicate names.
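The three-step method in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation, and it rests on several assumptions: a capitalized-word heuristic stands in for the natural language processing of step (i); the honorific and company-suffix regexes of step (ii) are invented examples; the alias rule in step (iii) (a name whose tokens form the tail of a longer name's tokens is folded into that longer name) is an assumption; and an in-memory `sqlite3` database stands in for the storage subsystem. The function names (`nlp_candidates`, `pattern_candidates`, `refine`, `store`, `extract`) and the URL `http://example.com/news` are illustrative only, and the page is assumed to be plain text with its HTML already stripped.

```python
import re
import sqlite3

# Hypothetical title and company-suffix lists (not specified by the patent).
HONORIFICS = ("Mr", "Ms", "Mrs", "Dr")
ORG_SUFFIXES = ("Inc", "Corp", "Ltd")


def nlp_candidates(text):
    """Step (i) stand-in: runs of two or more capitalized words.

    A real extractor would call an NLP named-entity tagger here.
    """
    return set(re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", text))


def pattern_candidates(text):
    """Step (ii): regex patterns for name forms the heuristic above misses."""
    titled = r"\b(?:%s)\.\s+[A-Z][a-z]+" % "|".join(HONORIFICS)
    company = r"\b[A-Z][a-z]+\s+(?:%s)\." % "|".join(ORG_SUFFIXES)
    return set(re.findall(titled, text)) | set(re.findall(company, text))


def tokens(name):
    """Normalize a surface form: strip trailing periods, drop honorifics."""
    words = [w.rstrip(".") for w in name.split()]
    return tuple(w for w in words if w not in HONORIFICS)


def refine(names):
    """Step (iii): merge duplicates and fold aliases into canonical names.

    Assumed alias rule: a name whose tokens form the tail of a longer
    name's tokens (e.g. 'Dr. Smith' -> 'John Smith') is an alias of it.
    """
    by_tokens = {}
    for name in names:
        key = tokens(name)
        if key:
            by_tokens.setdefault(key, set()).add(name)
    canonical = {}
    # Visit longer names first (ties broken alphabetically for determinism).
    for key in sorted(by_tokens, key=lambda k: (-len(k), k)):
        target = next((c for c in canonical
                       if len(c) > len(key) and c[-len(key):] == key), None)
        if target is None:
            canonical[key] = set(by_tokens[key])
        else:
            canonical[target] |= by_tokens[key]
    return {" ".join(k): sorted(v) for k, v in canonical.items()}


def store(conn, page_url, working_set):
    """Storage subsystem stand-in: one row per canonical name."""
    conn.execute("CREATE TABLE IF NOT EXISTS names "
                 "(page TEXT, name TEXT, aliases TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?, ?)",
                     [(page_url, name, "; ".join(aliases))
                      for name, aliases in working_set.items()])
    conn.commit()


def extract(page_url, text, conn):
    """Run steps (i)-(iii) on one page's text and store the result."""
    found = nlp_candidates(text) | pattern_candidates(text)
    working_set = refine(found)
    store(conn, page_url, working_set)
    return working_set


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    page = ("John Smith joined Acme Inc. last year. "
            "Dr. Smith says Acme Inc. is growing. Mary Jones agreed.")
    print(extract("http://example.com/news", page, conn))
```

On the sample page, `Dr. Smith` (found only by the pattern step) is folded into `John Smith` as an alias, and `Acme Inc` and `Acme Inc.` collapse into a single entry, so three names are stored. The suffix-based alias rule is deliberately simple: it cannot tell apart two people who share a bare surname, which is where a fuller system would need more context from the page.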

REFERENCES:
patent: 4270182 (1981-05-01), Asija
patent: 5319777 (1994-06-01), Perez
patent: 5764906 (1998-06-01), Edelstein et al.
patent: 5813006 (1998-09-01), Polnerow et al.
patent: 5835905 (1998-11-01), Pirolli et al.
patent: 5895470 (1999-04-01), Pirolli et al.
patent: 5918236 (1999-06-01), Wical
patent: 5923850 (1999-07-01), Barroux
patent: 5924090 (1999-07-01), Krellenstein
patent: 6052693 (2000-04-01), Smith et al.
patent: 6065016 (2000-05-01), Stuntebeck et al.
patent: 6076088 (2000-06-01), Paik et al.
patent: 6094653 (2000-07-01), Li et al.
patent: 6112203 (2000-08-01), Bharat et al.
patent: 6122647 (2000-09-01), Horowitz et al.
patent: 6128613 (2000-10-01), Wong et al.
patent: 6212552 (2001-04-01), Biliris et al.
patent: 6253198 (2001-06-01), Perkins
patent: 6260033 (2001-07-01), Tatsuoka
patent: 6266664 (2001-07-01), Russell-Falla et al.
patent: 6269369 (2001-07-01), Robertson
patent: 6301614 (2001-10-01), Najork et al.
patent: 6314409 (2001-11-01), Schneck et al.
patent: 6336108 (2002-01-01), Thiesson et al.
patent: 6336139 (2002-01-01), Feridun et al.
patent: 6349309 (2002-02-01), Aggarwal et al.
patent: 6377936 (2002-04-01), Henrick et al.
patent: 6389436 (2002-05-01), Chakrabarti et al.
patent: 6418432 (2002-07-01), Cohen et al.
patent: 6463430 (2002-10-01), Brady et al.
patent: 6466940 (2002-10-01), Mills
patent: 6493703 (2002-12-01), Knight et al.
patent: 6529891 (2003-03-01), Heckerman
patent: 6553364 (2003-04-01), Wu
patent: 6556964 (2003-04-01), Haug et al.
patent: 6601026 (2003-07-01), Appelt et al.
patent: 6618717 (2003-09-01), Karadimitriou et al.
patent: 6640224 (2003-10-01), Chakrabarti
patent: 6654768 (2003-11-01), Celik
patent: 6668256 (2003-12-01), Lynch
patent: 6675162 (2004-01-01), Russell-Falla et al.
patent: 6697793 (2004-02-01), McGreevy
patent: 6745161 (2004-06-01), Arnold et al.
patent: 6859797 (2005-02-01), Skopicki
patent: 2001/0009017 (2001-07-01), Biliris et al.
patent: 2003/0221163 (2003-11-01), Glover et al.
patent: 2003/0225763 (2003-12-01), Guilak et al.
patent: A-53031/98 (1998-08-01), None
patent: A-53031-98 (1998-08-01), None
patent: A 53031/98 (1998-08-01), None
patent: 10-320315 (1998-12-01), None
patent: WO 99/67728 (1999-12-01), None
patent: WO 00/33216 (2000-06-01), None
ABCNEWS.com, Apr. 28, 1999. http://web.archive.org/web/19990428185649/abcnews.go.com/.
COMPAQ, Apr. 22, 1999. http://web.archive.org/web/19990422222242/www.compaq.com/.
Dwi H. Widyantoro, Thomas R. Ioerger, John Yen. “An Adaptive Algorithm for Learning Changes in User Interests”. Nov. 1999. ACM. pp. 405-412.
Soumen Chakrabarti, Byron Dom, Piotr Indyk. “Enhanced hypertext categorization using hyperlinks”. 1998 ACM. pp. 307-318.
Sahami, M. et al., “SONIA: A Service for Organizing Networked Information Autonomously,” 3rd ACM Conference on Digital Libraries (Digital Libraries 98), Jun. 23-26, 1998, pp. 200-209.
Nir Friedman, Moises Goldszmidt, “Building Classifiers using Bayesian Networks”. From Proceedings of the National Conference on Artificial Intelligence (AAAI96). pp. 1277-1284.
Lorrie Faith Cranor and Brian A. LaMacchia, “Spam!” Communications of the ACM, Aug. 1998. vol. 41, No. 8, pp. 74-83.
PCT International Search Report PCT/US01/22425.
PCT International Search Report PCT/US01/23343.
A.K. Jain et al. “Data Clustering: A Review.” ACM Computing Surveys, vol. 31, No. 3, Sep. 1999, pp. 264-323.
Hall, Robert J. “How to Avoid Unwanted Email.” Communications of the ACM, Mar. 1998. vol. 41, No. 3, pp. 88-95.
Pazzani, M. et al., “Learning from hotlists and coldlists: Towards a WWW information filtering and seeking agent,” Proc. International Conference on Tools with Artificial Intelligence, Los Alamitos, CA, 1994, pp. 492-495.
Lam, W. and K. Low, “Automatic Document Classification Based on Probabilistic Reasoning: Model and Performance Analysis,” 1996 IEEE Conference on Computational Cybernetics and Simulation, Orlando, FL, 1997, pp. 2719-2723.
PCT International Search Report PCT/US01/22385, Dec. 18, 2002, 4 pp.
PCT International Search Report PCT/US01/22430, Jan. 17, 2003, 4 pp.
PCT International Search Report PCT/US01/22381, Feb. 12, 2003, 3 pp.
PCT International Search Report PCT/US01/24162, Feb. 13, 2003, 4 pp.
Ball, T. and F. Douglis, “An Internet Difference Engine and its Applications,” Proceedings of COMPCON '96, IEEE Comp. Soc. Press, Feb. 25, 1996, pp. 71-76.
Freitag, D., “Machine Learning for Information Extraction in Informal Domains,” Machine Learning 39:2/3, May/Jun. 2000, pp. 169-202.
Kjell, B., “Authorship Attribution of Text Samples Using Neural Networks and Bayesian Classifiers,” IEEE Int. Conf. on Systems, Man, and Cybernetics, vol. 2, Oct. 5, 1994, pp. 1660-1664.
Singhal, M., “Update Transport: A New Technique for Update Synchronization in Replicated Database Systems,” IEEE Transactions on Software Engineering 16:12(1325-1336), Dec. 1, 1990.
PCT International Search Report PCT/US01/41515, Feb. 28, 2003, 4 pp.
Langer, A. and J.S. Rosenschein, “Using Distributed Problem Solving to Search the Web,” Proc. 4th Int. Conf. on Autonomous Agents, ACM, USA, Jun. 3-7, 2000, pp. 197-198.
International Search Report PCT/US01/22426, Mar. 17, 2003, 4 pp.
Guan, T. and K-F Wong, “KPS: a Web information mining algorithm,” Computer Networks 31:11-16(1495-1507), May 17, 1999, Elsevier Science Publishers B.V., Amsterdam.
Miller, R.C. and K. Bharat, “SPHINX: a framework for creating personal, site specific Web crawlers,” Computer Networks and ISDN Systems, 30:1-7(119-130), Apr. 1, 1998, North Holland Publishing, Amsterdam.
Powell, T.A. et al., HTML Programmer's Reference, (Appendices A and B), Osborne/McGraw-Hill, 1998 (pp. 355-377).
Miller, M., “The Complete Idiot's Guide to Online Search Secrets,”Que, 2000, pp. 172-179.
