Reexamination Certificate
2000-05-22
2003-01-07
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S215000, C704S270000, C348S462000
active
06505153
ABSTRACT:
BACKGROUND OF THE INVENTION
Many challenges exist in the efficient production of closed captions, or, more generally, time-aligned transcripts. Closed captions are the textual transcriptions of the audio track of a television program, and they are similar to the subtitles of a movie. A closed caption (or CC) is typically a triplet of (sentence, time value, duration). The time value is used to decide when to display the closed caption on the screen, and the duration is used to determine when to remove it. Closed captions are produced either off-line or on-line. Off-line closed captions are edited and precisely time-aligned by an operator so that they appear on the screen at the moment the words are spoken. On-line closed captions are generated live, for instance during television newscasts.
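The caption triplet described above can be sketched as a small data structure. This is only an illustration: the names `Caption` and `visible_at`, and the choice of seconds as the time unit, are assumptions and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    """Hypothetical caption triplet: (sentence, time value, duration)."""
    text: str
    start: float     # time value: when to display the caption (seconds, assumed unit)
    duration: float  # how long the caption stays on screen (seconds)

    @property
    def end(self) -> float:
        # The caption is removed once start + duration has elapsed.
        return self.start + self.duration


def visible_at(captions, t):
    """Return the captions that should be on screen at time t."""
    return [c for c in captions if c.start <= t < c.end]


captions = [Caption("Hello.", 1.0, 2.0), Caption("World.", 3.0, 1.5)]
print(visible_at(captions, 2.5))
```

The time value alone decides when a caption appears, and the duration alone decides when it is removed, so a player only needs the current playback time to select what to show.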
Captions can be displayed on the screen in different styles: pop-on, roll-up or paint-on. Pop-on closed captions appear and disappear at once. Because they require precise timing, they are created in post-production. Roll-up closed captions scroll up within a window of three or four lines. This style is typically used for live broadcasts, such as news. In that case, an operator enters the caption content live on a stenotype keyboard. Paint-on captions have a similar style to pop-on captions, except that they are painted on top of the existing captions, one character at a time.
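The roll-up style can be illustrated with a fixed-size window in which each new line pushes the oldest one out. The sketch below is a simplification under stated assumptions: the function name `roll_up` and the default window of three lines are illustrative, and real caption decoders handle scrolling and timing in ways not shown here.

```python
from collections import deque


def roll_up(lines, window=3):
    """Return the successive window contents as each new line arrives."""
    buf = deque(maxlen=window)  # the oldest line is dropped automatically
    frames = []
    for line in lines:
        buf.append(line)
        frames.append(list(buf))
    return frames


for frame in roll_up(["line 1", "line 2", "line 3", "line 4"]):
    print(frame)
```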
Captioning a video program is a time-consuming process that costs approximately $1,000 per hour of programming. That figure covers the whole service: transcription, time alignment, and the text editing needed to make the captions comfortable to read.
The number of closed-captioned programs has increased dramatically in the United States because of new federal laws:
The landmark Americans with Disabilities Act (or ADA) of 1992 makes broadcasts accessible to the deaf and hard-of-hearing;
The FCC Order #97-279 requires that 95% of all new broadcast programs be closed captioned by 2006;
The TV Decoder Circuitry Act requires all televisions 13 inches or larger sold in the United States to have a built-in closed caption decoder.
In several other countries, legislation also requires television programs to be captioned. In addition, digital video disks (DVD) have multi-lingual versions and often require subtitles in more than one language for the same movie. Because of the recent changes in legislation and the new video media, the demand for captioning and subtitling has increased tremendously.
The current systems used to produce closed captions are fairly primitive. They mostly focus on formatting the text into captions, synchronizing them with the video and encoding the final videotape. The text has to be transcribed first, or at best imported from an existing file. Transcription is done in one of several ways: the typist can use a PC with a standard keyboard or a stenotype keyboard such as those used by court reporters. At this point in the process, the timing information has been lost and must be rebuilt. The closed captions are then made from the transcription by splitting the text manually in a word processor. This segmentation is either based on the punctuation or determined by the operator. The resulting breaks do not reflect how the text was spoken unless the operator listens to the tape while proceeding. The closed captions are then positioned on the screen and their style (italics, colors, uppercase, etc.) is defined. They may appear at different locations depending on what is already on the screen. Then the captions are synchronized with the audio: the operator plays the video and hits a key as soon as the first word of each caption has been spoken. Finally, the captions are encoded on the videotape using a caption encoder.
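The punctuation-based segmentation mentioned above can be approximated by splitting the transcript after sentence-ending marks. This is a minimal sketch, assuming that only `.`, `!` and `?` end a caption; the function name `split_on_punctuation` is hypothetical, and the result carries no timing information, which is exactly the limitation the paragraph describes.

```python
import re


def split_on_punctuation(text):
    """Split a transcript into caption candidates after '.', '!' or '?'."""
    # Split on whitespace that directly follows a sentence-ending mark.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


print(split_on_punctuation("Good evening. Here is the news! Ready?"))
```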
In summary, the current industry systems work as follows:
Import transcription from word processor or use built-in word processor to input text;
Break lines manually to delimit closed captions;
Position captions on screen and define their style;
Time mark the closed captions manually while the audio track is playing;
Generate the final captioned videotape.
Thus, improvements are desired.
SUMMARY OF THE INVENTION
The present invention provides an efficient system for producing off-line closed captions (i.e., time-aligned transcriptions of a source audio track). Generally, the invention process includes:
1. classifying the audio and selecting spoken parts only, generating non-spoken captions if required;
2. transcribing the spoken parts of the audio track by using an audio rate control method;
3. adding time marks to the transcription text using time of event keystrokes;
4. re-aligning precisely the transcription on the original audio track; and
5. segmenting transcription text into closed captions.
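As an illustration of step 5, the sketch below groups time-aligned words into caption triplets of (text, time value, duration). The excerpt does not state how the invention segments the text, so the two rules used here (a maximum caption length and a break at long pauses) are assumptions, as are the names `segment_words`, `max_chars` and `max_gap` and the input format of (word, start, end) tuples in seconds.

```python
def _close(group):
    """Turn a list of (word, start, end) tuples into one caption triplet."""
    text = " ".join(w for w, _, _ in group)
    start = group[0][1]
    duration = group[-1][2] - start
    return (text, start, duration)


def segment_words(words, max_chars=32, max_gap=0.5):
    """Group time-aligned words into (text, start, duration) captions.

    words: list of (word, start, end) tuples sorted by start time.
    A new caption starts when adding the next word would exceed max_chars
    or when the pause before it is longer than max_gap seconds (assumed rules).
    """
    captions = []
    current = []
    for word, start, end in words:
        if current:
            text_len = len(" ".join(w for w, _, _ in current)) + 1 + len(word)
            gap = start - current[-1][2]
            if text_len > max_chars or gap > max_gap:
                captions.append(_close(current))
                current = []
        current.append((word, start, end))
    if current:
        captions.append(_close(current))
    return captions


words = [
    ("Good", 0.0, 0.25), ("evening.", 0.25, 0.75),
    ("Here", 1.5, 1.75), ("is", 1.75, 2.0), ("the", 2.0, 2.25), ("news.", 2.25, 2.75),
]
print(segment_words(words))
```

Unlike a split on punctuation alone, this kind of segmentation can take pauses in the speech into account, because the word timings produced by the earlier alignment steps are still available.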
Logan Beth
Swain Michael
Thong Jean-Manuel Van
Chawan Vijay
Compaq Information Technologies Group L.P.
Hamilton Brook Smith & Reynolds P.C.
Lerner Martin