System and method for automatic preparation of data...

Image analysis – Applications – Personnel identification

Reexamination Certificate

active

06810136

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a system and a method for the automatic preparation and searching of microfilm-type materials, particularly for newspapers and magazines stored on microfilm or microfiche, the conversion of those documents to a digital format and storage of the information contained therein in searchable repositories.
BACKGROUND OF THE INVENTION
As the Internet grows, many different types of Web sites are becoming connected and therefore are available to users. These Web sites may contain information which is of interest to users, such as news for example. Indeed, many Internet users today obtain at least a portion of their news information from Web sites which publish such information.
Traditional newspapers and other sources of news have therefore been forced to embrace the new media which is represented by Web pages. Currently, many traditional (print) newspapers have Web sites which contain at least a portion of the news and information which is available through the print version of the newspaper. However, archived newspaper and magazine material, which is currently stored in microfilm, is not so readily accessible for publication through the Internet or any other type of network. Newspaper publishers, libraries and other repositories have huge amounts of information which is stored on microfilm. Such microfilm documents represent a huge asset, which cannot currently be properly used. The advantage of microfilm is that it preserves the appearance of the newspaper, magazine or other paper document, as well as the data contained therein. The disadvantage, of course, is that searching through microfilm archives for the information of interest is tedious and difficult. Furthermore, microfilm can only be read at one physical location, since the data cannot be transmitted over a network, for example. Thus, microfilm has a number of significant problems.
Attempts to provide a solution unfortunately have a number of drawbacks. For example, scanning the microfilm documents in order to be able to provide the data through a computer results in a number of errors during the process of OCR (optical character recognition). This process is required for the textual data to be electronically searchable; however, the resultant errors cause the final text to be difficult to search accurately. Correcting these errors manually is a tedious and expensive process, yet currently if these errors are not corrected, the resultant text may not be searchable.
A further attempt to provide searches for text with errors is the “fuzzy search” process, in which a requested keyword and variations on that keyword are all searched simultaneously. Unfortunately, this search method is ineffective for large databases, since too many irrelevant hits are retrieved.
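The fuzzy search idea described above can be illustrated with a minimal Python sketch. It is not taken from the source: the confusion table `CONFUSIONS` and the function names `variations` and `fuzzy_search` are hypothetical, and the sketch assumes that variants are produced by single OCR-style character substitutions.

```python
# Hypothetical OCR confusion pairs (correct text, typical misreading).
# A real table would be derived from observed recognition errors.
CONFUSIONS = [("m", "rn"), ("l", "1"), ("o", "0"), ("e", "c")]


def variations(keyword):
    """Return the keyword plus every single-substitution variant."""
    variants = {keyword}
    for correct, misread in CONFUSIONS:
        start = keyword.find(correct)
        while start != -1:
            end = start + len(correct)
            variants.add(keyword[:start] + misread + keyword[end:])
            start = keyword.find(correct, start + 1)
    return variants


def fuzzy_search(keyword, documents):
    """Return indices of documents containing the keyword or any variant."""
    terms = variations(keyword.lower())
    hits = []
    for index, text in enumerate(documents):
        words = set(text.lower().split())
        if words & terms:  # any variant present as a whole word
            hits.append(index)
    return hits
```

In this sketch every additional confusion pair enlarges the set of searched terms, which is consistent with the drawback noted above: in a large database, more variants are likely to match more unrelated words.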
A more useful and efficient system for the automatic preparation and searching of scanned documents is disclosed in PCT Application No. IL01/00797m, by the present inventors and incorporated by reference as if fully set forth herein. In the disclosed system the probability of errors occurring during the preparation of the scanned documents is incorporated into the searching process.
An even more useful solution would provide a complete system for the automatic preparation of a repository of searchable files from archived material. Furthermore such a solution should also be cost effective, operate at least semi-automatically, and also permit access to archived material, and in particular microfilm documents, through an electronic interface. Unfortunately, such a solution is not currently available.
SUMMARY OF THE INVENTION
The background art does not teach or suggest a system or a method for automating the conversion of microfilm data to a digital format, and the creation of searchable data repositories from the converted digital data. The background art also does not teach or suggest a system and method for enabling users to access the data repositories through a network such as the Internet. The background art also does not teach or suggest a cost effective, at least semi-automatic method for converting microfilm data to a form which can be readily accessed through an electronic interface.
The present invention overcomes these deficiencies of the background art by providing a system and a method for automatically converting microfilm data into repositories of data in a digital format which may be easily accessed by a user across a network such as the Internet. First, preferably a planning phase is performed, in which the production parameters are set depending on a number of conditions such as the nature of the material and the requirements of the customer. Next, preferably data from scanned microfilm reels goes through a preparation phase in which the scanned reels are subdivided. For example, a microfilm reel of a newspaper would be subdivided into one or more issues, each of which would be saved in a separate data file. Once the files are extracted from the reel, a profile is preferably prepared and jobs are generated. Each file is preferably assigned its own job. The “Automatic Processing” phase executes the generated jobs. As a result, every file optionally and preferably undergoes the following automatic processing stages: combining files; analyzing image layout; segmentation; OCR; optional segmentation improvement; and output to XML. In the last stage, the data contained in the files is preferably extracted and then more preferably transmitted to the relevant repository unit.
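The job generation and automatic processing phases described above can be sketched in Python as follows. This is an illustrative outline only, not the disclosed implementation: the names `Job`, `generate_jobs`, `run_job` and the stage identifiers are hypothetical, the profile is assumed to be a simple dictionary, and each stage is a placeholder that merely records that it ran.

```python
from dataclasses import dataclass, field

# Stage order follows the list given in the text; identifiers are hypothetical.
STAGES = [
    "combine_files",
    "analyze_layout",
    "segmentation",
    "ocr",
    "segmentation_improvement",
    "output_xml",
]
# The text marks segmentation improvement as optional.
OPTIONAL_STAGES = {"segmentation_improvement"}


@dataclass
class Job:
    file_name: str
    profile: dict
    completed: list = field(default_factory=list)


def generate_jobs(file_names, profile):
    """Create one job per extracted file, each with its own profile copy."""
    return [Job(name, dict(profile)) for name in file_names]


def run_job(job):
    """Run the stages in order, skipping optional ones the profile disables."""
    for stage in STAGES:
        if stage in OPTIONAL_STAGES and not job.profile.get(stage, False):
            continue
        # Placeholder: a real stage would transform image or text data here.
        job.completed.append(stage)
    return job
```

A profile such as `{"segmentation_improvement": True}` would enable the optional stage for every job generated from it. Keeping one job per file means a failure in one issue need not block the others.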
According to more preferred embodiments of the present invention, the system is capable of managing more than one conversion project at any one time, with each project containing one or more publications. Each publication is preferably divided into one or more collections, and a search index is produced for each collection in order to make archived issues accessible through the use of such search indexes.
Hereinafter, the term “network” refers to a connection between any two or more computational devices which permits the transmission of data.
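As an illustration of the project, publication and collection hierarchy described above, the following Python sketch keeps one search index per collection. The class and method names are hypothetical, and a plain word-to-issue inverted index is assumed in place of whatever index structure an actual embodiment would use.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    # Assumed index form: word -> set of issue identifiers.
    index: dict = field(default_factory=lambda: defaultdict(set))

    def add_issue(self, issue_id, text):
        """Index every word of an issue's text under its identifier."""
        for word in text.lower().split():
            self.index[word].add(issue_id)

    def search(self, word):
        """Return the sorted identifiers of issues containing the word."""
        return sorted(self.index.get(word.lower(), set()))


@dataclass
class Publication:
    title: str
    collections: dict = field(default_factory=dict)

    def collection(self, name):
        """Return the named collection, creating it on first use."""
        return self.collections.setdefault(name, Collection(name))


@dataclass
class Project:
    name: str
    publications: dict = field(default_factory=dict)

    def publication(self, title):
        """Return the named publication, creating it on first use."""
        return self.publications.setdefault(title, Publication(title))
```

Because each collection owns its index, a search can be limited to one collection without touching the others.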
Hereinafter, the term “computational device” includes, but is not limited to, any type of computer operating according to any type of hardware and/or operating systems; or any device, including but not limited to: laptops, hand-held computers, PDA (personal data assistant) devices, cellular telephones, any type of WAP (wireless application protocol) enabled device, wearable computers of any sort, or any other device which has an operating system.
For the present invention, a software application could be written in substantially any suitable programming language, which could easily be selected by one of ordinary skill in the art. The programming language chosen should be compatible with the computational device on which the software application is executed. Examples of suitable programming languages include, but are not limited to, C, C++ and Java.
In addition, the present invention could be implemented as software, firmware or hardware, or as a combination thereof. For any of these implementations, the functional steps performed by the method could be described as a plurality of instructions performed by a data processor.
Hereinafter, the term “Web browser” refers to any software program which can display text, graphics, or both, from Web pages on World Wide Web sites. Hereinafter, the term “Web server” refers to a server capable of transmitting a Web page to the Web browser upon request.
Hereinafter, the term “Web page” refers to any document written in a mark-up language including, but not limited to, HTML (hypertext mark-up language) or VRML (virtual reality modeling language), dynamic HTML, XML (extensible mark-up language) or XSL (XML styling language), or related computer languages thereof, as well as to any collection of such documents reachable through one specific Internet address or at one specific World Wide Web site, or any document obtainable through a particular URL (Uniform Resource Locator). Hereinafter, the term “Web site” refers to at least one Web page
