Information type identification method and apparatus, e.g....

Music – Instruments – Electrical musical tone generation

Reexamination Certificate

Details

C084S645000, C705S001100, C709S241000

active

06794566

ABSTRACT:

The present invention relates to a method and apparatus for automatically identifying at least one specific type of information contained in a data sequence. The data sequence can correspond e.g. to the characters forming a file name attributed to a music file or other form of computer file. In the case of a music file (sometimes also known as an audio file), the specific type of information in question can be an artist name and/or a music title contained in the character sequence forming the file name.
Such automatic information identification can be used for managing large sets of music files located on personal storage media, such as hard disks, CD-ROMs, DVD-ROMs, minidisks, etc. The information thus extracted can be used in various applications such as sorting, archiving, computer-assisted music title compilation and playlist generation.
A music file is generally a data module containing binary data that encodes recorded music corresponding to a music title. The data can be read from the file and processed to produce an audio output exploitable by a computer or suitable sound reproduction system. Music files are generally handled and managed like other computer files, and have arbitrarily chosen file names which serve to indicate the associated audio content, usually a music title and artist. For instance, the file name can be made to indicate the artist and the song or album corresponding to the audio contents. The audio file will typically also have an extension (the part appearing just after a dot) indicating the music format, normally an encoding or compression format such as mp3, wav, or the like. File names can be given by music distributors, or by end users who create their own audio files.
There is nowadays a rapidly growing number of users who create and store vast collections of such audio files (over one thousand) on personal storage media, typically computer hard disks and writable CDs. The music files of a collection can have different origins: personal CD collections, files downloaded from internet sites, such as those which sell music titles online, CDDB, radio recordings, etc.
At present, there is no standardised format for naming files, either in terms of syntax or in terms of artist name and title. In particular, users are normally confronted with disparate titling formats in which the order and form of the identification information can vary from one title to another. This lack of uniformity is clearly apparent when consulting lists of audio files presented at random from different users, e.g. in the internet sites which sell music titles online.
Some recording formats such as mp3 include so-called metadata which serves to identify the artist and title, but again no set rule is established stating how that information is to be organised. Likewise, there is no universal coding system for artist names, songs or track titles. For example, the pop group “The Beatles” will appear in some catalogues under “Beatles”, in others under “The Beatles”, or again under “Beatles, The”. Similarly, the lack of universal coding of music title file names is also a source of problems, especially when dealing with lengthy and complex title names. In particular, there is no rule regarding the order in which the artist and music title are mentioned in a file name.
There then arises a problem of distinguishing the artist from the music title contained in a music file name, starting from the fact that the file name can be expected to contain that information in some form, possibly with abbreviations.
This distinguishing task is normally easy for a human being, whose cognitive and thinking processes are well suited to such recognition and sorting tasks. Nevertheless, it quickly becomes tedious when having to manage vast collections of audio files, e.g. of over a thousand titles and possibly many more.
Moreover, a manual identification does not in itself allow the useful information to be passed on to a music title management system without some additional human intervention. Such a manual approach would thus defeat the object of creating a fully automated and flexible system.
In view of the foregoing, a first object of the invention is to provide a method of automatically identifying in a set of data sequences at least one specific type of information contained in each data sequence of the set, wherein the type of information has an unknown presentation in the data sequences, characterised in that it comprises the steps of:
initially defining at least one characteristic feature of the specific type of information, and of expressing the characteristic feature(s) in terms of at least one recognition rule executable by processor means,
applying the recognition rule(s) through the processor means to analyse the set of data sequences,
determining in each data sequence a data portion thereof satisfying the recognition rule(s), and
identifying the data portion as corresponding to the specific type of information.
It can be appreciated that the invention effectively forms an automated means for extracting items of information from a source in which those items are not expressed in a rigorous manner, or are presented in a manner which is not known a priori at the level of means performing the automatic identification. In this respect, the invention can be seen as a means for extracting features or rules from a system of information where those features or rules are not identified or labelled by that system.
Thus, in the context of names attributed to music files, the invention makes it possible to recognise automatically an artist name and a music title when these items of information are not expressed in the filename with rigour or according to a universal protocol.
The determining step can comprise a sub-step of picking out from the data sequence different data portions corresponding to respective types of information and applying the recognition rule(s) to each picked-out data portion.
One recognition rule can instruct to identify the specific type of information in terms of frequency of occurrence of a data portion over the set of data sequences.
Thus, the determining step further comprises the sub-steps of:
determining relative positions of the different data portions within a data sequence,
comparing, over the set of data sequences, data portions occupying the same relative position in the data sequence, and
determining from the comparison the relative position where there is the greatest occurrence of identical data portions over the set of data sequences,
and wherein the identifying step then involves identifying the data portion located at the relative position of greatest occurrence as corresponding to the specific type of information.
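The frequency-of-occurrence rule described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `position_of_greatest_repetition`, the " - " separator, the song titles, and the choice to measure "greatest occurrence" as the count of the single most repeated portion at each position are all assumptions made for illustration.

```python
from collections import Counter

def position_of_greatest_repetition(split_names):
    """Return the relative position whose most common portion repeats most often.

    split_names is a list of lists, one list of data portions per file name.
    """
    # Assumption: only positions present in every sequence are compared.
    width = min(len(parts) for parts in split_names)
    best_pos, best_count = 0, 0
    for pos in range(width):
        # Count identical portions (case-insensitive) at this relative position.
        counts = Counter(parts[pos].strip().lower() for parts in split_names)
        top_count = counts.most_common(1)[0][1]
        if top_count > best_count:
            best_pos, best_count = pos, top_count
    return best_pos

# Hypothetical example data: file names already stripped of their extension.
file_names = [
    "The Beatles - Yesterday",
    "The Beatles - Let It Be",
    "The Beatles - Help",
    "Queen - Bohemian Rhapsody",
]
split_names = [name.split(" - ") for name in file_names]
artist_pos = position_of_greatest_repetition(split_names)
artists = [parts[artist_pos] for parts in split_names]
```

The sketch relies on the intuition that, within one collection, an artist name tends to recur across many file names while titles rarely do, so the position with the most repetition is taken as the artist position.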
Another recognition rule can instruct to identify the specific type of information in terms of the size of a data portion of the data sequence, and/or in terms of a relative position of a data portion in the data sequence.
The determining step can comprise the following sub-steps, applied to at least some of the data sequences of the set:
determining a candidate data portion in a data sequence, and
comparing the candidate data portion against a stored set of data portions known to correspond to the specific type of information to be identified,
wherein the identifying step involves identifying the data portion found to be present in the stored set as corresponding to the specific type of information.
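The comparison against a stored set can be sketched as follows. This is a minimal illustration under stated assumptions: the set `KNOWN_ARTISTS`, its contents, the function name `identify_known_portion`, and the case-insensitive matching are hypothetical and are not specified in the text.

```python
# Hypothetical stored set of portions known to be artist names (lower-cased).
KNOWN_ARTISTS = {"the beatles", "queen"}

def identify_known_portion(portions, known=KNOWN_ARTISTS):
    """Return (index, portion) for the first candidate found in the stored set.

    Returns None when no candidate portion matches.
    """
    for index, portion in enumerate(portions):
        candidate = portion.strip()
        # Assumption: matching ignores case and surrounding whitespace.
        if candidate.lower() in known:
            return index, candidate
    return None

match = identify_known_portion(["Yesterday", "The Beatles"])
```

Because the lookup does not depend on position, a sketch of this kind can identify the artist even when the order of artist and title varies from one file name to another.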
There can be provided a step, prior to the determining step, of normalising the data sequence by removing from the data sequence data not susceptible of being contained in the specific type of information to be identified.
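As a rough illustration of such a normalising step, the sketch below removes two kinds of data assumed not to belong to an artist name or title: a trailing file extension and a leading track number. Which data is actually removed is not detailed here, so these choices, the function name `normalise`, and the regular expressions are assumptions.

```python
import re

def normalise(file_name):
    """Strip data assumed not to be part of an artist name or music title."""
    # Assumption: a trailing extension such as ".mp3" carries only the format.
    stem = re.sub(r"\.[A-Za-z0-9]{1,4}$", "", file_name)
    # Assumption: one to three leading digits followed by punctuation or
    # whitespace are a track number, e.g. "03 - " or "7. ".
    stem = re.sub(r"^\d{1,3}[\s._-]+", "", stem)
    # Collapse repeated whitespace left behind by the removals.
    return re.sub(r"\s+", " ", stem).strip()

cleaned = normalise("03 - The Beatles - Yesterday.mp3")
```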
There can also be provided a step, prior to the determining step, of identifying in the data sequence separator data separating different data portions therein, by reference to a stored set of possible separator characters.
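Separator identification by reference to a stored set might look like the following sketch. The contents and ordering of `SEPARATORS`, the use of multi-character separators, and the function name `split_on_known_separator` are assumptions for illustration only.

```python
# Hypothetical stored set of possible separators, tried in order of priority.
SEPARATORS = [" - ", "_-_", "_", "-"]

def split_on_known_separator(name, separators=SEPARATORS):
    """Return (separator, portions) using the first stored separator that
    splits the name into more than one non-empty portion."""
    for sep in separators:
        if sep in name:
            parts = [p.strip() for p in name.split(sep) if p.strip()]
            if len(parts) > 1:
                return sep, parts
    # No known separator found: the whole name is treated as a single portion.
    return None, [name]

sep, portions = split_on_known_separator("The Beatles - Yesterday")
```

Trying longer separators first is a design choice of this sketch: it prevents a bare hyphen inside " - " from being picked up before the more specific pattern.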
The data sequence corresponds to characters forming a file name of a computer file.
In the embodiment, the set of data sequences corresponds to a respective set of file names o
