Reexamination Certificate
1997-06-25
2002-12-03
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S243000, C704S277000, C704S235000, C704S231000
active
06490561
ABSTRACT:
BACKGROUND
The present invention relates generally to transcription methods and apparatus, and more particularly, to a continuous speech voice transcription method and apparatus for use in transcribing structured reports, such as radiology reports, and the like.
Transcription is a major bottleneck in timely radiology reporting, for example. Radiology images may be acquired, read, and dictated in a few minutes, but many days may pass until the transcription is complete. Similar problems occur in medicine, law and other areas of endeavor.
Report transcription has traditionally been a process that involves a number of people. In its most primitive form, the dictation is recorded on a cassette that is collected and carried to the transcriptionist, where the cassettes are put in an “in” basket. The transcriptionist sequentially processes incoming cassettes, transcribing and printing the reports, and sending them back to the radiologists for signing. If there are corrections, there is another cycle through the transcriptionist.
A more advanced form of transcription uses a communication network to record the voice report such that the transcriptionist can retrieve the recording directly at a workstation without transporting the physical cassette. The wait in the transcription queue for a typical report can range from several hours in an efficient hospital to several days in a less efficient one.
Thus, if the transcription were performed automatically, the text report could be available at the end of the dictation with no waiting for several hours to several days for the transcriptionist to complete the transcription task. The computing horsepower of a computer workstation can be used to perform the automatic transcription. What is needed is an automatic transcription algorithm that can transcribe the dictated reports.
Electronic reports may be structured, such that either a fill-in-the-blanks report or a fully structured report is generated. The two variations are related. The structured report starts with a basic report for a pathology, such as mammography. In order to make the electronic report complete and consistent with other reports, the American College of Radiology has established a mammography reporting form. The form is a basic report with areas that can be filled in with words that are selected from a list of words for that blank in that report. This “fill in the blanks” reporting makes all mammography reports very similar, using simple variations on the language for each of the individual reports.
The structured form of the fill-in-the-blanks report is much more useful in computer processing to determine outcomes of the treatment. The processing performed by the computer can ignore filler text and process only the data contained in the filled-in blanks. Some of the blanks describe the severity of the pathology. Other blanks describe the changes in the pathology as a result of whatever treatment has been performed. Over a period of time, the progression of the contents of the blanks on the form shows a picture of the progress of the patient. After a period of time, the outcome for the patient can be evaluated.
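The fill-in-the-blanks idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the template sentence, the blank names, and the word lists are hypothetical and are not taken from the American College of Radiology form. The sketch shows a report being filled from per-blank word lists and the blank values being recovered later while the filler text is ignored.

```python
import re

# Hypothetical template: filler text with named blanks in braces.
TEMPLATE = ("There is a {size} mass in the {location} breast. "
            "Compared with the prior study it is {change}.")

# Hypothetical per-blank word lists (not from any real reporting form).
ALLOWED = {
    "size": ["small", "moderate", "large"],
    "location": ["left", "right"],
    "change": ["unchanged", "larger", "smaller"],
}


def fill_report(values):
    """Return the report text, rejecting words outside a blank's list."""
    for blank, word in values.items():
        if word not in ALLOWED[blank]:
            raise ValueError(f"{word!r} is not allowed for blank {blank!r}")
    return TEMPLATE.format(**values)


def extract_blanks(report):
    """Recover the blank values from a report, ignoring the filler text."""
    # Splitting on the blank markers yields filler and blank names alternately:
    # even indices are filler (matched literally), odd indices are blank names.
    parts = re.split(r"\{(\w+)\}", TEMPLATE)
    pattern = "".join(
        f"(?P<{part}>\\w+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, report)
    if match is None:
        raise ValueError("report does not follow the template")
    return match.groupdict()


report = fill_report({"size": "small", "location": "left", "change": "unchanged"})
print(report)
print(extract_blanks(report))
```

Collecting the value of a blank such as `change` from successive reports for the same patient would give the kind of progression over time that the paragraph describes.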
A yet more structured form of the electronic report is a collection of codes. The codes point to selected phrases in a dictionary of codes. The SNOMED dictionary maintained by the National Library of Medicine is one such dictionary. This micro-glossary has words and phrases that are useful in describing a large number of pathologies, including the location and severity of the pathology. For a person to read the report, the codes must be converted to text form. For a computer, reading the codes is trivial, since the codes are an almost ideal representation of information for the computer. The outcomes must be assessed, and thus the report should be organized to make the assessment easy. As a result, structured reporting, including the use of codes to describe the pathology and its progress, will be used more in the future.
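A coded report of this kind might be handled as in the following minimal sketch. The codes and phrases are invented for illustration and are not real SNOMED entries; the function names are likewise hypothetical. The sketch shows the two uses mentioned above: converting codes to text for a human reader, and letting the computer work on the codes directly.

```python
# Hypothetical code dictionary; codes and phrases are illustrative only
# and are not real SNOMED entries.
CODE_DICTIONARY = {
    "LOC-01": "left breast",
    "LOC-02": "right breast",
    "FND-10": "mass",
    "SEV-1": "mild",
    "SEV-3": "severe",
}


def decode_report(codes):
    """Convert a coded report to readable text by looking up each code."""
    unknown = [code for code in codes if code not in CODE_DICTIONARY]
    if unknown:
        raise KeyError(f"unknown codes: {unknown}")
    return ", ".join(CODE_DICTIONARY[code] for code in codes)


def severity_over_time(reports):
    """Return the first severity code in each coded report, in order.

    A report without a severity code contributes None.
    """
    return [
        next((code for code in report if code.startswith("SEV-")), None)
        for report in reports
    ]


first_visit = ["SEV-3", "FND-10", "LOC-01"]
follow_up = ["SEV-1", "FND-10", "LOC-01"]
print(decode_report(first_visit))
print(severity_over_time([first_visit, follow_up]))
```

The computer never needs the text form: `severity_over_time` works on the codes alone, which is the sense in which codes make outcome assessment easy.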
The radiologist should be able to generate the report while looking at an image that is to be evaluated. With the attention on the image, the radiologist can progress through the image in an orderly fashion, making sure that all aspects of the diagnosis are properly covered in the report. The transcription should therefore be something that can be done while looking at the image that is diagnosed. The radiologist should not have to look at every word generated to make certain that the word is properly spelled, for example. While not as important a requirement, it would be beneficial if the radiologist could dictate the report without using hands. The radiologist should use his or her hands to manipulate images, change to historical images for comparison, and magnify selected areas of the images for detail.
A number of transcription devices and methods are currently available. They fall into several categories including isolated word recognition, continuous speech recognition, batch transcription after dictation, and on the fly transcription while dictating. Generally the transcription devices require a training cycle. A new user must train the system to recognize a vocabulary of words. The isolated word recognition devices use patterns of each individual word in performing the recognition. A typical training cycle requires one-half hour to several hours to say each of the words or phrases required for training.
The transcription devices are generally organized to recognize free text spoken by an individual. The transcription devices are advertised with a description of the number of tens of thousands of words that can be recognized by the system. These devices use a decision procedure that requires the recognition of isolated words from a very large vocabulary for a single individual. Many isolated words are short and easy to confuse with other words. When the vocabulary available is large, there is more possibility of confusion with other words in the vocabulary.
Isolated word recognition devices may be used to generate fill in the blanks reports. The blanks are filled with isolated words. If the words could be restricted to only the few words that are available for the particular blanks, the recognition problem would be very much easier and the performance much better. Similarly, structured reporting using codes could be performed effectively by isolated word recognition devices using small vocabularies. However, this has not been done in the past.
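The benefit of restricting the vocabulary to the words allowed for a particular blank can be illustrated with a toy sketch. It makes strong assumptions: string similarity from Python's standard `difflib` module stands in for acoustic similarity, and the vocabularies and the misheard token are hypothetical. It is not a speech recognizer and not the method of any cited device.

```python
from difflib import get_close_matches

# Hypothetical per-blank vocabularies: only a few words are valid per blank.
BLANK_VOCABULARY = {
    "side": ["left", "right"],
    "change": ["unchanged", "larger", "smaller"],
}

# Hypothetical large free-text vocabulary containing easily confused words.
LARGE_VOCABULARY = ["left", "right", "rite", "write", "light", "larger", "lager"]


def recognize(heard, vocabulary):
    """Return the vocabulary word most similar to the heard token.

    Assumption: spelling similarity is only a stand-in for acoustic similarity.
    """
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    # cutoff=0.0 accepts every candidate, so the best-scoring word is returned.
    return get_close_matches(heard, vocabulary, n=1, cutoff=0.0)[0]


heard = "rite"  # hypothetical rough rendering of the spoken word "right"
print(recognize(heard, BLANK_VOCABULARY["side"]))  # restricted to the blank's list
print(recognize(heard, LARGE_VOCABULARY))          # large vocabulary
```

With only the two words allowed for the blank, the token can only resolve to a valid entry; with the larger list, a confusable word competes with the intended one.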
The batch transcription after dictation method uses dictated voice reports that are transcribed at a later time. However, this method is not desirable. The radiologist must review the transcribed text at a later time to determine that the information has been accurately transcribed. Time delays associated with the batch processing method also make the approach less desirable.
Continuous speech recognition devices are useful. When a radiologist does not have to speak each word as an isolated word, generation of the report can proceed much more quickly with less attention from the radiologist. However, while it is desirable, continuous speech recognition of free text is generally not performed. The problem is technically difficult. The usual result is transcribed text with many errors.
In view of the above, it is believed that a report transcription tool that generates reports using a limited number of sounds may be advantageously employed in a number of disciplines.
A number of patents relate to voice recognition, and the like. The patents may be grouped into those disclosing template matching, single word recognition, hidden Markov model, and subphrase recognition. Template matching is a generic approach. The cited patents typically have different measurements to match against templates of words. Single word recognition requires an easily recognized beginning and ending of a word. This technique requires the individual to use an artificial one-word-at-a-time speaking style. Hidden Markov model techniques use probabilistic techniques to determine the most probable next word in a sequence. The probabilistic mod
Wayman James L.
Wilson Dennis L.
Zugel John