System for editing digital video and audio information

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

Details

C704S235000, C704S276000

active

06185538

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to a system for editing digital video and audio information, comprising: a storage means for recording and reproducing video and audio information, an indicator means for indicating information, a means for realizing an edit decision list comprising editing data, and a control means for controlling the storage means in dependence upon the editing data of the edit decision list.
In television studios, non-linear editing systems are increasingly used for producing programs suitable to be broadcast. In such editing systems, the signals of the video and audio information are initially stored in an unarranged form in a random access memory, for example, a disc memory. Subsequently, given image scenes (takes) of the stored video and audio information can be non-linearly accessed, i.e. without any time delay. It is conventional practice also to determine the editing instants with reference to the video and audio information stored in the memory, i.e. the instants at which each take to be broadcast starts and ends. The editing instants of the selected takes, as well as the sequence of the takes, are entered on an Edit Decision List (EDL). The editing data entered on the EDL are used for controlling the (disc) memory. In conformity with the editing data of the EDL, a continuous video/audio sequence suitable to be broadcast can be read from the memory. In contrast to the linear (sequential) magnetic tape technique, such a non-linear editing system allows an on-line check of the video/audio sequence composed on the basis of an EDL. In a non-linear editing system, EDL entries can be changed easily.
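As an illustration of how an EDL can drive non-linear readout from a random access memory, the following is a minimal sketch and not the patent's implementation. The names `Take` and `playback_frames`, and the use of frame indices as editing instants, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Take:
    """One EDL entry: a take delimited by its editing instants (hypothetical layout)."""
    start_frame: int  # edit start, inclusive
    end_frame: int    # edit end, exclusive


def playback_frames(edl):
    """Yield stored frame indices in the order prescribed by the EDL."""
    for take in edl:
        yield from range(take.start_frame, take.end_frame)


# Hypothetical EDL: the scene stored later is broadcast first.
edl = [Take(100, 103), Take(10, 12)]
print(list(playback_frames(edl)))  # [100, 101, 102, 10, 11]
```

Because the list holds only references into the stored material, changing an entry or reordering entries changes the broadcast sequence without touching the stored signals.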
The editing instants can be determined not only by means of the displayed images of the video material but also by means of the accompanying sound. The audio information, which is provided time-parallel with the video material, is then monitored and editing marks are inserted at the edit starts and edit ends of given takes which are about to be broadcast. This type of editing is suitable for producing current news broadcasts because the editing instants of the takes which are about to be broadcast can be determined more rapidly and more exactly with the aid of the audio information than with the aid of the image information of the video material.
A system for computer-aided editing of audio data is known from WO 94/16443, in which a part of the amplitude variation of the audio data is displayed on a monitor. Sound or speech interruptions which may serve as editing instants for editing marks are determined by means of glitches in the amplitude curve. Producing and evaluating amplitude curves from audio data requires considerable intuition and special expertise. Consequently, this system is not very appropriate for editing current (television) news contributions by editors without technical training.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a system for editing digital video and audio source information as described in the opening paragraph, allowing easy, rapid and precise editing.
According to the invention, this object is achieved by a means for recognizing speech in the audio information and for generating a character sequence, particularly an ASCII character sequence, which corresponds to said speech as a function of time, for display on a display screen of the indicator means, and a means for deriving editing data with reference to marked parts of the character sequence displayed on the display screen of the indicator means.
The invention has the advantage that current (television) news broadcasts, which usually comprise mainly spoken texts, can now be edited more rapidly and more precisely with respect to time as compared with the prior art. The basis for editing is no longer the image information of the video source material or the information of the audio source material reproduced through a loudspeaker, but a text derived by speech recognition from the audio data sequence of the audio source material and displayed on the screen of a display device. The sequence of words in the text displayed is coupled, as a function of time, with the sequence of images, because the image and sound information of a news contribution is generally recorded and stored simultaneously. The storage location of the image and sound information on the recording medium is fixed by means of a time code. Thus, a given time code value of the time code is assigned to each word in the displayed text, which time code value can be used advantageously in the realization of the EDL.
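The assignment of a time code value to each recognized word can be pictured with the following minimal sketch. The patent does not specify a time code format or a recognizer interface; the HH:MM:SS:FF layout, the rate of 25 frames per second, the function `frames_to_timecode` and the sample recognizer output are all assumptions made for illustration.

```python
FPS = 25  # assumption: 25 frames per second, no drop-frame counting


def frames_to_timecode(frame, fps=FPS):
    """Format a frame count as an HH:MM:SS:FF time code string."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


# Hypothetical recognizer output: (word, frame at which the word begins).
recognized = [("Good", 0), ("evening", 12), ("viewers", 40)]

for word, frame in recognized:
    print(frames_to_timecode(frame), word)
```

Since image and sound are recorded together, the time code attached to a word also locates the corresponding image in the stored material.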
In accordance with a further embodiment of the invention, marks defining limits of parts of the character sequence displayed on the display screen of the indicator means are provided so as to fix an edit start and/or edit end. Another embodiment of the invention comprises a means for fixing the position of the limits of the marked parts of the character sequence displayed on the display screen of the indicator means, and a means for converting the position of the limits of the marked parts to time code data assigned to the video and audio information and for taking over the time code data as editing data in the edit decision list.
The position of the marks in the displayed text corresponds to a given time code value. The mark-defined limits (edit start, edit end) of a selected take represent given time code values of the edit start and the edit end. A mouse is provided to shift the position of the marks on the display screen. The mouse-clicked mark position is converted to the time code data assigned to the video and audio material and entered in the EDL.
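The conversion of mark positions in the displayed text to editing data can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed means: the `TimedWord` record, the single-space text layout, and the representation of marks as character offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """A recognized word with the frames at which it starts and ends (hypothetical)."""
    text: str
    start_frame: int
    end_frame: int


def layout(words):
    """Return the displayed text and the character span (start, end) of each word."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # one space separates consecutive words
    return " ".join(w.text for w in words), spans


def selection_to_edit(words, sel_start, sel_end):
    """Map a marked character range [sel_start, sel_end) to (edit start, edit end) frames."""
    _, spans = layout(words)
    hit = [w for w, (a, b) in zip(words, spans) if a < sel_end and b > sel_start]
    if not hit:
        raise ValueError("selection covers no word")
    return hit[0].start_frame, hit[-1].end_frame


words = [
    TimedWord("The", 0, 8),
    TimedWord("minister", 8, 30),
    TimedWord("resigned", 30, 55),
    TimedWord("today", 55, 70),
]
text, _ = layout(words)
start = text.index("minister")
end = text.index("resigned") + len("resigned")
print(selection_to_edit(words, start, end))  # (8, 55)
```

The returned pair would be entered in the EDL as the edit start and edit end of the selected take. A mark that falls inside a word is snapped outward to that word's boundaries, which is one possible design choice among several.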
The composition of the EDL according to the invention does not require a trained cutter. When technical or artistic viewpoints do not play a role, as is usually the case when a current news broadcast is compiled, a journalist can now also directly realize the EDL for his news contribution and immediately check and possibly change the editing results before his contribution is broadcast.
A further embodiment of the invention is characterized in that the display screen of the indicator means is implemented as a screen which can be touched to fix the position limits of given parts in the displayed character sequence. By fingertip-touching the start of a sentence or the end of a sentence in the displayed text, the relevant time code data can be taken over in the EDL without having to use a mouse for shifting a mark and confirming the entry.
Television news topics presented by foreign agencies or broadcasting stations and transmitted via satellites to television home stations are often available in a language which is not commonly used in the home country. The news material received is to be processed in the home studio, i.e. the spoken texts in the received audio material are to be translated and must replace the original audio parts. Alternatively, before the received news material is broadcast, it can be commented upon in the national language by a studio speaker.
Such an accompanying sound added to the image material can also be subjected to speech recognition so as to realize an edit decision list with the aid of the text obtained. This method has the advantage that an ASCII text file generated by means of speech recognition can also be used for controlling a teleprompter for the commentator. By changing the audio pitch, the required (lip) synchronicity may be restored at different lengths of the audio and video takes, with the defined editing marks serving as reference points.
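The text does not say how the audio is adjusted when the audio and video takes differ in length. As a rough sketch only, the ratio by which the audio would need to be sped up or slowed down between two editing marks can be computed as below; the function `playback_rate` and the sample lengths are assumptions.

```python
from fractions import Fraction


def playback_rate(audio_frames, video_frames):
    """Rate at which the audio take must be played to last as long as the video take."""
    if audio_frames <= 0 or video_frames <= 0:
        raise ValueError("take lengths must be positive")
    return Fraction(audio_frames, video_frames)


# Hypothetical take lengths measured between the same pair of editing marks.
rate = playback_rate(audio_frames=110, video_frames=100)
print(rate, float(rate))  # a rate above 1 means the audio must be sped up
```

Applying the rate separately between each pair of adjacent marks keeps the error from accumulating over a long contribution.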
The ASCII character sequence remaining after the editing operation may be further utilized for necessary documentation purposes.
A further embodiment of the invention is characterized in that the speech-recognition means comprises a device for recognizing different voices in the spoken audio information, which device assigns a given character font to each voice so as to distinguish it from other voices. Passages in the displayed text originating from different speakers can thus be assigned to a given
