Method for searching multiple file types on a CD ROM

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C715S252000

active

06654758

ABSTRACT:

FIELD OF THE INVENTION
The present disclosure involves methods for developing full-text searches across multiple file types that are distributed on a CD ROM.
BACKGROUND OF THE INVENTION
In present-day commercial practice, many software and computer companies deliver documentation to their customers in a number of different formats. These include paper, Adobe Acrobat Portable Document Format (PDF) files, Windows Help files, Hypertext Markup Language (HTML) files, and HTML Help files.
The documentation provided to recipients, such as customers, is distributed on paper, on CD ROMs, and on Web servers.
It is, of course, desirable for a recipient or user to be able to make a full-text search of the received documents. Users cannot perform full-text searches on paper documents except by long, laborious reading of them. There is, however, software known as “search engines” that can search files distributed on CD ROMs.
However, these search engines are limited in a number of ways in providing search capability when the document or CD ROM involves multiple file types. Most of the existing search engines are designed only to search files of one particular format.
In this situation, it would be necessary to convert all files in the documentation or on the CD ROM into a common format, namely the format compatible with the particular search engine available.
However, when files are converted into a format different from that in which they were originally created, much of the functionality for searching the original file is lost, and this includes navigating through the file and finding certain content in the file.
Other types of search engines can, in a limited way, search multiple file types in the documentation or on the CD ROM. However, they are unable to open every file type at the locations where the search terms appear and then move from one such location to the next within the document.
Thus, these other search engines require that the user first search with one engine and then refine the search using another search engine designed for the particular file type.
One example of a standard (not a full-text) search is what one can do in a product such as Word. The operator tells Word to find a text string. Word then reads the text in the document one word at a time, beginning at a specified location, and compares the text against the string that was entered. When Word finds a “hit” (match), it highlights the text and stops searching. If the operator chooses the “Find Next” option, Word repeats the process and continues the search beginning just past the current hit. This is a brute-force and slow way of operating.
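The sequential scan described above can be illustrated with a minimal sketch. It is not Word's actual implementation; the function names `find_next` and `find_all` are hypothetical, and the sketch compares character by character rather than word by word.

```python
def find_next(text: str, needle: str, start: int = 0) -> int:
    """Scan forward from `start` and return the index of the first match, or -1."""
    n, m = len(text), len(needle)
    if m == 0:
        return -1
    # Compare the needle against the text at every candidate position.
    for i in range(start, n - m + 1):
        if text[i:i + m] == needle:
            return i
    return -1


def find_all(text: str, needle: str) -> list:
    """Repeat "Find Next" until no hit remains, resuming just past each hit."""
    hits = []
    pos = find_next(text, needle)
    while pos != -1:
        hits.append(pos)
        pos = find_next(text, needle, pos + len(needle))
    return hits


document = "the server restarts; the server logs"
first_hit = find_next(document, "server")
all_hits = find_all(document, "server")
```

Every search rereads the document from the starting location, which is why the cost grows with the amount of text to be scanned.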
A “full text” search, by contrast, searches a collection of files at one time. It accomplishes this by using an auxiliary collection of files that was created ahead of time and distributed with the files that are to be searched. If, for example, the operator wished to search 450 files for the word “server,” the software would read the auxiliary files, which already record all occurrences and locations of the word “server.” From the information in the auxiliary files, the software builds and presents to the operator a “hit list” of all files that contain the word. If the operator elects to open any of these files, the software opens the file, moves to the first location of the word in the file (which it already knows from the auxiliary files), and highlights the word. It may be noted that none of the files is directly searched or scanned. By using such auxiliary files, the operator or user can also utilize advanced features such as wild cards (“install*”) and Boolean operators (“installation and not printers”).
There are a number of ways to create these auxiliary files. Such a process may take several hours for most releases made on CD-ROM. The success of a “search engine” can be measured by how efficiently the desired files are generated and accessed.
The present invention provides for the use of an existing search engine that is designed to support the searching of one particular file format (PDF, or Adobe® Acrobat® files). This can then be extended to allow the searching of virtually any other type of file format, such as HTML, HTML Help, or Windows Help. The method and system accomplish this by creating a PDF file “duplicate” consisting of the text from the file that the operator wants to search, so that the search engine can find the text in the duplicate. A link is then provided from each page in the PDF duplicate to the corresponding location in the file of the other format, so that the user-operator has essentially performed a full-text search in that file.
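The role of the auxiliary files can be sketched as an inverted index built ahead of time. This is an illustrative assumption, not the format used by any particular search engine; the names `build_index`, `files_matching`, and `and_not`, and the sample file names, are hypothetical.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def build_index(files: dict) -> dict:
    """Map each lowercased word to {file name: [character offsets]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        for match in re.finditer(r"[A-Za-z0-9]+", text):
            index[match.group().lower()][name].append(match.start())
    return {word: dict(postings) for word, postings in index.items()}


def files_matching(index: dict, pattern: str) -> set:
    """Return files containing any indexed word that matches the pattern.

    A trailing '*' acts as a wild card, as in "install*".
    """
    pattern = pattern.lower()
    found = set()
    for word, postings in index.items():
        if fnmatchcase(word, pattern):
            found.update(postings)
    return found


def and_not(index: dict, wanted: str, unwanted: str) -> set:
    """Boolean query: files that contain `wanted` but not `unwanted`."""
    return files_matching(index, wanted) - files_matching(index, unwanted)


files = {
    "setup.pdf": "Installation of the server",
    "print.pdf": "Installing printers on the server",
    "notes.pdf": "Release notes",
}
index = build_index(files)            # done once, ahead of time
hit_list = files_matching(index, "server")
wildcard_hits = files_matching(index, "install*")
boolean_hits = and_not(index, "installation", "printers")
```

At query time only the index is consulted. The offsets stored for each word are what would let the software open a file directly at the first hit.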
SUMMARY OF THE INVENTION
The present method and system involve a technique for searching Portable Document Format (PDF) files that contain the text extracted from files in other formats, such as Windows Help, Hypertext Markup Language (HTML) Help, and HTML.
On each page of the PDF file there are hyperlinks that the user can select to open the original file at the corresponding location.
The method enables the user to search the collection of PDF files, including both files that were created as PDF files as well as the PDF files created from the text extracted from the files of other formats. The method uses the search engine from Verity that is distributed by Adobe® in order to search the Adobe® Acrobat® portable document format files on a CD ROM. If the search targets include files of formats other than PDF, then the user is presented with pages within the PDF copy of the file in which the target text appears.
The user can navigate within the PDF copy using the “next hit” and “previous hit” program options. The text is visible to the user and is sufficient to help the user determine whether it is necessary or helpful to access the original file.
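The “next hit” and “previous hit” navigation can be pictured as a cursor moving over an ordered list of hit locations. The sketch below is an assumption for illustration only and does not reflect how Adobe Acrobat implements these options; the class name `HitNavigator` and the (page, offset) representation of a hit are hypothetical.

```python
class HitNavigator:
    """Step forward and backward through hits given as (page, offset) pairs."""

    def __init__(self, hits):
        self.hits = sorted(hits)   # order hits by page, then by offset
        self.current = -1          # no hit selected yet

    def next_hit(self):
        """Move to the following hit, or return None at the end of the list."""
        if self.current + 1 < len(self.hits):
            self.current += 1
            return self.hits[self.current]
        return None

    def previous_hit(self):
        """Move to the preceding hit, or return None at the start of the list."""
        if self.current > 0:
            self.current -= 1
            return self.hits[self.current]
        return None


nav = HitNavigator([(2, 40), (1, 15), (2, 5)])
visited = [nav.next_hit(), nav.next_hit(), nav.next_hit(), nav.next_hit()]
back = nav.previous_hit()
```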
Each page of the PDF file carries a “button” that, when selected, opens the document in the original format at the location corresponding to the location displayed in the PDF copy. Both the PDF copy and the original file are accessible at the same time, so it is possible to identify the location of the hits within the file and to find additional hits in the complete collection of files.
The indicated method includes software which is used to extract the text from Windows Help, HTML, and HTML Help files, and then create from that text the new files that can be converted by the standard Adobe software into PDF files with corresponding explanatory messages and buttons on every page in order to support the linking into the corresponding locations within the original files.
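The extraction step can be sketched for the HTML case as follows. This is a minimal illustration under stated assumptions, not the software described here: it produces only intermediate page records (text plus a link target) and leaves the conversion to PDF to other tools. The names `Page`, `TextExtractor`, and `make_pages`, the `max_chars` page size, and the use of `id` attributes or `<a name>` anchors as link locations are all hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Page:
    text: str   # extracted text shown on the page of the duplicate
    link: str   # target that the page's button would open in the original


class TextExtractor(HTMLParser):
    """Collect visible text, tagging each chunk with the nearest preceding anchor."""

    def __init__(self):
        super().__init__()
        self.anchor = ""
        self.chunks = []   # list of (anchor, text) pairs
        self._skip = 0     # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        attrs = dict(attrs)
        target = attrs.get("id") or (attrs.get("name") if tag == "a" else None)
        if target:
            self.anchor = target

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._skip:
            self.chunks.append((self.anchor, text))


def make_pages(source_name: str, html: str, max_chars: int = 200) -> list:
    """Split extracted text into pages, each linked back to the original file."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()

    def flush(buffer, anchor):
        link = f"{source_name}#{anchor}" if anchor else source_name
        return Page(" ".join(buffer), link)

    pages, buffer, size, page_anchor = [], [], 0, ""
    for anchor, text in parser.chunks:
        if buffer and size + len(text) > max_chars:
            pages.append(flush(buffer, page_anchor))
            buffer, size = [], 0
        if not buffer:
            page_anchor = anchor   # the page links to where its text begins
        buffer.append(text)
        size += len(text)
    if buffer:
        pages.append(flush(buffer, page_anchor))
    return pages


html = ('<h1 id="intro">Intro</h1><p>Install the server.</p>'
        '<h2 id="printers">Printers</h2><p>Add a printer.</p>')
pages = make_pages("guide.html", html, max_chars=25)
```

Each `Page` record holds what one page of the duplicate would need: the searchable text and the location in the original file that its button should open.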
This method then provides the ability to link from the hits displayed in Adobe Acrobat into the corresponding locations within the original files.


REFERENCES:
patent: 6336124 (2002-01-01), Alam et al.
patent: 6393442 (2002-05-01), Cromarty et al.
patent: 6415278 (2002-07-01), Sweet et al.
patent: 6415307 (2002-07-01), Jones et al.
“Hypertext Document Update”, IBM Technical Disclosure Bulletin, Jan. 1, 1992.
