Method and system for classifying and locating media content

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06735583

ABSTRACT:

TECHNICAL FIELD
The present invention relates to classifying and locating media and, in particular, to computer software that classifies and locates media content units using a structured vocabulary.
BACKGROUND OF THE INVENTION
With the advent of larger storage capacities, computer systems are increasingly used to store huge collections of data, such as collections of multimedia files stored, for example, on multi-gigabyte disk drives. The larger and more diverse the data collection, the more difficult it becomes to deterministically retrieve a particular requested item in an efficient manner. A user is typically presented with a search request dialog, which asks the user to supply a search request containing one or more words, optionally connected by Boolean operators. Sometimes, keyword-value pairs are used to further clarify the search. For example, a user might search a set of files for all files containing the words “video” and “device driver,” where the “author” field equals “Hemingway.”
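As a rough illustration of the kind of request described above, the following Python sketch combines a Boolean AND over words with a keyword-value test. The `FileRecord` type, the `matches` function, and the sample files are hypothetical and are not taken from any system described here.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    text: str
    fields: dict = field(default_factory=dict)  # keyword-value pairs, e.g. {"author": ...}


def matches(record, all_words, field_equals):
    """True if every word or phrase occurs in the text (Boolean AND)
    and every keyword-value pair equals the record's stored value."""
    body = record.text.lower()
    if not all(word.lower() in body for word in all_words):
        return False
    return all(record.fields.get(key) == value for key, value in field_equals.items())


# Hypothetical sample data
files = [
    FileRecord("a.txt", "Notes on a video device driver", {"author": "Hemingway"}),
    FileRecord("b.txt", "Notes on a video codec", {"author": "Hemingway"}),
    FileRecord("c.txt", "A video device driver manual", {"author": "Smith"}),
]

hits = [f.name for f in files
        if matches(f, ["video", "device driver"], {"author": "Hemingway"})]
```

Only the first file satisfies both the word conditions and the field condition; the other two each fail one of them.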
Various search and retrieval techniques have been employed to make the search and retrieval process more deterministic or efficient. For example, in the field of document retrieval, a vocabulary for describing documents has been employed, typically according to characteristics of the language itself. Such a system operates much like an index of a book. For example, a description language is derived based upon the frequency of occurrence of various words in the language and the juxtaposition statistics of these words (i.e., which words tend to appear together). This description language is then used to group various documents and to later retrieve them.
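The frequency and juxtaposition statistics mentioned above can be pictured with a small Python sketch. It assumes whitespace tokenization and counts co-occurrence once per document; `word_stats` and the sample documents are illustrative assumptions, not a description of any particular retrieval system.

```python
from collections import Counter
from itertools import combinations


def word_stats(documents):
    """Return (word frequency, pairwise co-occurrence) counters.

    Frequency counts every occurrence of a word; co-occurrence counts,
    for each unordered pair of distinct words, the number of documents
    in which both appear.
    """
    freq = Counter()
    cooc = Counter()
    for doc in documents:
        words = doc.lower().split()
        freq.update(words)
        # Sorting makes each pair key canonical: (a, b) with a < b.
        for pair in combinations(sorted(set(words)), 2):
            cooc[pair] += 1
    return freq, cooc


# Hypothetical sample documents
docs = ["red balloon festival", "red balloon", "blue sky"]
freq, cooc = word_stats(docs)
```

Pairs with high co-occurrence counts, such as `("balloon", "red")` here, are the kind of signal such a description language could use to group documents.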
As another example, a system that maintains collections of clipart images may use an arbitrary set of words to characterize each image in the collection. When a user subsequently requests the retrieval of images, the user guesses at what terms were used in the classification process, or is instead presented with a fixed list, such as a list of categories. For example, a user might request the system to locate all images having to do with “balloons.” The success of the search is directly dependent upon how many and which images had been associated by the system with the word “balloon.” Since the choice of the words used by the system to characterize the images is arbitrary, the user's rate of success at picking the same words to describe the same images is somewhat random.
Other variations include using a caption or English sentence to describe an image. These techniques handle images in a similar manner to the document retrieval mechanisms. One difficulty with these techniques is that the captions are rarely well-formed sentences, and document retrieval system techniques are typically based upon tuning parameters that apply to well-formed sentences (e.g., frequency and juxtaposition).
Several image evaluation systems have employed recognition-based techniques for describing or classifying images. Some of these systems have used computational analysis of the colors, edges, shapes, and textures of the stored images to derive the subject matter of the image. For example, from a description of the shape of an object, a “stick-figure” model is constructed, which is then compared to objects in target images. Because different objects may reduce to similar stick figures, e.g., a pile of sticks and an actual picture of a horse may result in similar stick figures, these systems are prone to error. Also, stick-figure models typically do not contain sufficient detail to distinguish between objects with slight variation, e.g., an oak tree versus an elm tree. In addition, these recognition-based systems are computationally intensive.
SUMMARY OF THE INVENTION
Embodiments of the present invention provide methods and systems for creating and maintaining a dynamically modifiable hierarchical vocabulary system that is used to classify and locate desired units of media content in a deterministic and efficient manner. Exemplary embodiments include a structured vocabulary-driven media classification and search system (an “MCSS”). The MCSS is connected through a communications medium to one or more computer systems that provide, locate, request, and/or receive media content units. The MCSS creates and maintains a structured vocabulary representation, associates media content units with terms from the structured vocabulary, stores data to maintain those associations, and locates media content units upon request when a term previously associated with the media content units is submitted as search criteria.
The media classification and search system includes media content unit classification services, a search engine, and several data repositories. The data repositories store the media content units; the structured vocabulary; metadata associated with the content, including the terms from the structured vocabulary that are used to characterize each media content unit; and a reverse index from the characterization terms stored in the metadata to the media content units associated with those terms.
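The relationship between the metadata and the reverse index can be sketched in Python as follows. This is a minimal illustration, assuming in-memory dictionaries in place of the data repositories; the media identifiers, term identifiers, and function names are hypothetical.

```python
from collections import defaultdict

# Metadata repository (assumed shape): media content unit -> characterization term ids
metadata = {
    "img-001": {"T1", "T2"},
    "img-002": {"T2"},
    "img-003": {"T3"},
}


def build_reverse_index(metadata):
    """Invert the metadata: term id -> set of media content units."""
    index = defaultdict(set)
    for media_id, term_ids in metadata.items():
        for term_id in term_ids:
            index[term_id].add(media_id)
    return index


reverse_index = build_reverse_index(metadata)


def locate(term_id):
    """Return the media content units previously associated with a term."""
    return sorted(reverse_index.get(term_id, set()))
```

With such an index, locating content for a term is a single lookup rather than a scan over every unit's metadata.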
In one embodiment, the structured vocabulary is stored as a hierarchical data structure of vocabulary units, where each unit represents a term and its relationship to other terms. Each vocabulary unit preferably contains a term identifier that is independent of the value of the term and that can be used to uniquely refer to that vocabulary unit. Preferably, a first term that is subordinate to a second term indicates that the first term is a more specific classification or characterization of the second term. In one embodiment, the first term is a species of the genus characterized by the second term.
In yet another embodiment, further classifications and alterations of the vocabulary preferably can be performed without affecting media content that has already been classified using vocabulary terms. This enables the vocabulary to be refined over time without reprocessing or reclassifying the media content.
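One way to picture the hierarchical vocabulary and its value-independent term identifiers is the Python sketch below. The class and function names are hypothetical, and expanding a search on a genus to include its species is an assumption made for illustration rather than something the text above specifies.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class VocabularyUnit:
    term_id: int   # stable identifier, independent of the term's text
    term: str
    children: list = field(default_factory=list)  # more specific terms

    def add_child(self, term_id, term):
        child = VocabularyUnit(term_id, term)
        self.children.append(child)
        return child

    def descendants(self):
        """Yield every unit below this one in the hierarchy."""
        for child in self.children:
            yield child
            yield from child.descendants()


def media_for(unit, classified):
    """Media classified with this unit or any more specific unit (assumed behavior)."""
    wanted = {unit.term_id} | {d.term_id for d in unit.descendants()}
    return sorted(m for m, ids in classified.items() if ids & wanted)


animal = VocabularyUnit(1, "animal")
horse = animal.add_child(2, "horse")
pony = horse.add_child(3, "pony")

# Classification records refer to term ids, never to the term text.
classified = {"img-001": {3}, "img-002": {2}}

before = media_for(horse, classified)
pony.term = "Shetland pony"   # rename an existing term
horse.add_child(4, "foal")    # refine the vocabulary
after = media_for(horse, classified)
```

Because the classification records hold only identifiers, renaming a term or adding a new subordinate term leaves `before` and `after` identical, with no reclassification of the media.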
In another embodiment, each vocabulary unit represents a term in a default language and equivalent terms in one or more other languages, whose meaning is similar to the term in the default language. In addition, each term in each language is further associated with a list of synonym terms which, for classification purposes, are considered to have similar or the same meaning. Synonym terms may also be used to handle input errors. That is, common typographical errors that might occur in generating search criteria for a particular vocabulary term can be stored in a synonym list of the vocabulary unit that corresponds to that term, thus allowing the search engine to “recognize” the misspelling.
In one embodiment, media content units can be located using terms from the structured vocabulary in a different language than the language originally used to classify those media content units.
In another embodiment, terms can be translated to different languages to enhance the structured vocabulary without affecting the media content units already classified using the structured vocabulary.
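The multilingual and synonym behavior described above might be sketched in Python as follows. This is an assumption-laden illustration: the `MultilingualUnit` type, the `(language, word)` lookup keys, and the sample terms are hypothetical, and the misspelling "baloon" is simply stored as a synonym.

```python
from dataclasses import dataclass, field


@dataclass
class MultilingualUnit:
    term_id: int
    terms: dict                                   # language code -> term
    synonyms: dict = field(default_factory=dict)  # language code -> list of synonyms


def build_lookup(units):
    """Map (language, lowercased word) -> set of term ids."""
    lookup = {}
    for unit in units:
        for lang, term in unit.terms.items():
            lookup.setdefault((lang, term.lower()), set()).add(unit.term_id)
        for lang, words in unit.synonyms.items():
            for word in words:
                lookup.setdefault((lang, word.lower()), set()).add(unit.term_id)
    return lookup


def search(lang, word, lookup, classified):
    """Find media whose classification includes a term matching the word."""
    term_ids = lookup.get((lang, word.lower()), set())
    return sorted(m for m, ids in classified.items() if ids & term_ids)


# A unit classified while only the English term existed
balloon = MultilingualUnit(10, {"en": "balloon"}, {"en": ["baloon"]})
classified = {"img-007": {10}}

# A translation added later; the classification record is untouched.
balloon.terms["de"] = "Ballon"
lookup = build_lookup([balloon])

english = search("en", "balloon", lookup, classified)
misspelled = search("en", "baloon", lookup, classified)
german = search("de", "Ballon", lookup, classified)
```

All three searches resolve to the same term identifier, so the same media content unit is found regardless of the language or spelling used in the request.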
In one embodiment, the search engine aids in the process of disambiguating terms, where a particular term in a search request matches several different terms in the structured vocabulary. Typically, this situation occurs when a particular term has more than one meaning. In one embodiment, the search engine presents a set of options for the requester to choose from. In another embodiment, the search engine presents all of the possible meanings for the term. In yet another embodiment, the disambiguating process can be enabled and disabled.
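The disambiguation step can be illustrated with a short Python sketch. The vocabulary entries, the `resolve` function, and the `disambiguate` flag are hypothetical; in particular, returning every matching term when disambiguation is disabled is an assumption, since the text above does not say what happens in that case.

```python
# Hypothetical vocabulary: the same word appears under two different meanings.
VOCABULARY = [
    {"term_id": 21, "term": "turkey", "path": "animal > bird > turkey"},
    {"term_id": 57, "term": "turkey", "path": "place > country > Turkey"},
    {"term_id": 30, "term": "horse", "path": "animal > mammal > horse"},
]


def resolve(term, vocabulary, disambiguate=True):
    """Match a search term against the vocabulary.

    If the term has several meanings and disambiguation is enabled,
    return the options for the requester to choose from. Otherwise
    return all matching term ids (assumed behavior when disabled).
    """
    matches = [u for u in vocabulary if u["term"] == term.lower()]
    if disambiguate and len(matches) > 1:
        return {
            "status": "ambiguous",
            "options": [(u["term_id"], u["path"]) for u in matches],
        }
    return {"status": "resolved", "term_ids": [u["term_id"] for u in matches]}


ambiguous = resolve("Turkey", VOCABULARY)
unambiguous = resolve("horse", VOCABULARY)
disabled = resolve("Turkey", VOCABULARY, disambiguate=False)
```

Showing the hierarchy path next to each option is one way to let the requester tell the meanings apart.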


