Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2001-04-20
2004-11-02
Philippe, Gims (Department: 2623)
C382S243000, C348S399100
active
06813313
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to the field of video analysis, and more particularly to analyzing domain specific videos.
BACKGROUND OF THE INVENTION
As digital video becomes more pervasive, efficient ways of analyzing the content of videos become necessary and important. Videos contain a huge amount of data and complexity that make the analysis very difficult. The first and most important analysis is to understand the structure of the video, which can provide the basis for further detailed analysis.
A number of analysis methods are known, see M. M. Yeung, B. L. Yeo, W. Wolf, and B. Liu, “Video Browsing using Clustering and Scene Transitions on Compressed Sequences,” Multimedia Computing and Networking 1995, Vol. SPIE 2417, pp. 399-413, February 1995; M. J. Yeung and B. L. Yeo, “Time-constrained Clustering for Segmentation of Video into Story Units,” ICPR, Vol. C, pp. 375-380, August 1996; D. Zhong, H. J. Zhang and S. F. Chang, “Clustering Methods for Video Browsing and Annotation,” SPIE Conference on Storage and Retrieval for Image and Video Databases, Vol. 2670, February 1996; J. Y. Chen, C. Taskiran, E. J. Delp and C. A. Bouman, “ViBE: A New Paradigm for Video Database Browsing and Search,” in Proc. IEEE Workshop on Content-Based Access of Image and Video Databases, 1998; and Gong, Sin, Chuan, Zhang and Sakauchi, “Automatic Parsing of TV Soccer Programs,” Proceedings of the International Conference on Multimedia Computing and Systems (ICMCS), May 1995.
Gong et al. describe a system that uses domain knowledge and domain-specific models in parsing the structure of a soccer video. As in other prior art systems, the video is first segmented into shots. Video features extracted from frames within each shot are used to classify each shot into different categories, e.g., penalty area, midfield, corner area, corner kick, and shot at goal. Note that this work relies heavily on accurate segmentation of the video into shots before features are extracted.
Zhong et al. also describe a system for analyzing sport videos. The system detects boundaries of high-level semantic units, e.g., pitching in baseball and serving in tennis. Each semantic unit is further analyzed to extract interesting events, e.g., the number of strokes and the type of play (returns into the net or baseline returns in tennis). A color-based adaptive filtering method is applied to a key frame of each shot to detect specific views. Complex features, such as edges and moving objects, are used to verify and refine the detection results. Note that this work also relies heavily on accurate segmentation of the video into shots prior to feature extraction. In short, both Gong and Zhong consider the video to be a concatenation of basic units, where each unit is a shot. The resolution of the feature analysis does not go finer than the shot level.
Thus, generally the prior art is as follows: first the video is segmented into shots. Then, key frames are extracted from each shot, and grouped into scenes. The scene transition graph and hierarchy tree are used to represent these data structures. The problem with those approaches is the mismatch between the low-level shot information, and the high-level scene information. They can only work when interesting content changes correspond to the shot changes. In many applications such as soccer videos, interesting events such as “plays” cannot be defined by shot changes. Each play may contain multiple shots that have similar color distributions. Transitions between plays are hard to find by simple clustering of shot features.
In many situations, when the camera has a lot of motion, shot detection processes tend to produce many false alarms, because this type of segmentation is based on low-level features and does not consider the domain-specific syntax and content model of the video. Thus, it is difficult to bridge the gap between low-level features and high-level features based on shot-level segmentation. Moreover, too much information is lost during the shot segmentation process.
Videos in different domains have very different characteristics and structures. Domain knowledge can greatly facilitate the analysis process. For example, in sports videos, there are usually a fixed number of cameras, views, camera control rules, and transition syntax imposed by the rules of the game, e.g., play-by-play in soccer, serve-by-serve in tennis, and inning-by-inning in baseball.
Y. P. Tan, D. D. Saur, S. R. Kulkarni and P. J. Ramadge in “Rapid estimation of camera motion from compressed video with application to video annotation,” IEEE Trans. on Circuits and Systems for Video Technology, 1999, and H. J. Zhang, S. Y. Tan, S. W. Smoliar and Y. H. Gong, in “Automatic Parsing and Indexing of News Video,” Multimedia Systems, Vol. 2, pp. 256-266, 1995, describe video analysis for news and baseball. But very few systems consider high-level structure in more complex videos such as a soccer video.
The problem is that a soccer game has a relatively loose structure compared to other videos like news and baseball. Except for the play-by-play structure, the content flow can be quite unpredictable and events can happen randomly. There is a lot of motion and there are many view changes in soccer.
Therefore, there is a need for a framework where all the information of the low-level features of a video is retained, and the feature sequences are better represented. Then, it can become possible to incorporate a domain-specific syntax and a content model, and high-level structure to enable event detection and statistical analysis.
SUMMARY OF THE INVENTION
The invention provides a general framework for video structure discovery and content analysis. In the method and system according to the invention, frame-based low-level features are extracted from a video. Each frame is represented by the values of the features, or by labels converted from the features, so that the video is converted to multiple label sequences or real number sequences. Each such sequence is associated with one of the extracted low-level features. The feature sequences are analyzed together to extract high-level semantic features.
The invention can be applied to videos of sport activities, such as soccer games, to index and summarize the video. The invention uses a distinctive feature to capture the high-level structure of the soccer video, e.g., activity boundaries, and uses a unique feature, e.g., grass orientation, together with camera motion to detect interesting events such as game strategy. The unique aspects of the system include compressed-domain feature extraction for real-time performance, use of domain-specific features for detecting high-level events, and integration of multiple features for content understanding.
Particularly, the system and method according to the invention analyze a compressed video including a sequence of frames. The amount of a dominant feature in each frame of the compressed video is measured. A label is associated with each frame according to the measured amount of the dominant feature. Views in the video are identified according to the labels, and the video is segmented into actions according to the views. The video can then be analyzed according to the actions to determine significant events in the video.
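As an illustration only, the conversion of per-frame feature values into label sequences could be sketched as follows. The feature names (`grass_ratio`, `motion_magnitude`), the thresholds, and the label names are all hypothetical; the text above does not specify them.

```python
from bisect import bisect_right


def to_label_sequence(values, thresholds, labels):
    """Map each per-frame feature value to a label by thresholding.

    thresholds must be sorted ascending; labels has one more entry
    than thresholds (one label per interval).
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return [labels[bisect_right(thresholds, v)] for v in values]


# Hypothetical real-number sequences, one value per frame of a six-frame clip.
features = {
    "grass_ratio": [0.82, 0.79, 0.41, 0.38, 0.05, 0.07],
    "motion_magnitude": [1.2, 1.0, 6.5, 7.1, 0.3, 0.2],
}

# One label sequence per extracted low-level feature (assumed thresholds).
sequences = {
    "grass_ratio": to_label_sequence(
        features["grass_ratio"], [0.2, 0.6], ["low", "medium", "high"]
    ),
    "motion_magnitude": to_label_sequence(
        features["motion_magnitude"], [3.0], ["slow", "fast"]
    ),
}

# Analyzing the sequences together: each frame becomes a tuple of labels.
joint = list(zip(sequences["grass_ratio"], sequences["motion_magnitude"]))
```

Keeping one sequence per feature, aligned by frame index, is what lets a later stage look at several features at the same frame without first committing to shot boundaries.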
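The label, view, and action steps described above might be sketched as below. This is a minimal sketch, not the claimed method: the label names, the label-to-view table, and the view-to-action table are assumptions made for illustration.

```python
from itertools import groupby

# Assumed mapping from a frame's dominant-feature label to a camera view.
LABEL_TO_VIEW = {"high": "global", "medium": "zoom-in", "low": "close-up"}
# Assumed mapping from a view to an action (hypothetical play/break split).
VIEW_TO_ACTION = {"global": "play", "zoom-in": "break", "close-up": "break"}


def runs(seq):
    """Collapse a sequence into (value, start, end_exclusive) runs."""
    out, pos = [], 0
    for value, group in groupby(seq):
        n = len(list(group))
        out.append((value, pos, pos + n))
        pos += n
    return out


def segment_actions(labels):
    """Identify views from frame labels, then segment into actions."""
    views = [LABEL_TO_VIEW[label] for label in labels]
    actions = [VIEW_TO_ACTION[view] for view in views]
    return runs(views), runs(actions)


# Hypothetical per-frame labels for an eight-frame clip.
labels = ["high", "high", "high", "medium", "low", "low", "high", "high"]
view_runs, action_runs = segment_actions(labels)
```

Here the segmentation works on frame-level labels rather than on detected shots, so adjacent views that map to the same action merge into one segment.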
The dominant feature, labels, views, action, and significant events are stored in a domain knowledge database. In one embodiment, the dominant feature is color, and a color histogram is constructed to identify the dominant feature.
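For the color embodiment, a histogram-based estimate of the dominant color and of its amount in each frame could look like the sketch below. It assumes each frame is given as a flat list of integer hue values in degrees and uses a hypothetical bin width of 30; decoding of the compressed video is not shown.

```python
from collections import Counter


def hue_histogram(pixels, bin_width=30):
    """Count pixels per hue bin for one frame."""
    return Counter(h // bin_width for h in pixels)


def dominant_bin(frames, bin_width=30):
    """Return the hue bin with the most pixels across all given frames."""
    total = Counter()
    for pixels in frames:
        total.update(hue_histogram(pixels, bin_width))
    return total.most_common(1)[0][0]


def dominant_ratio(pixels, bin_index, bin_width=30):
    """Fraction of a frame's pixels that fall in the dominant hue bin."""
    if not pixels:
        return 0.0
    hits = sum(1 for h in pixels if h // bin_width == bin_index)
    return hits / len(pixels)


# Two tiny hypothetical frames (hue in degrees, 0-359).
frames = [
    [100, 110, 95, 105, 20, 100, 115, 98],  # mostly one hue range
    [100, 10, 15, 200, 20, 210, 25, 105],   # mixed hues
]
peak = dominant_bin(frames)
ratios = [dominant_ratio(pixels, peak) for pixels in frames]
```

The per-frame ratio is the measured amount of the dominant feature, which a later step can threshold into labels.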
REFERENCES:
patent: 5802361 (1998-09-01), Wang et al.
patent: 6079566 (2000-06-01), Eleftheriadis et al.
patent: 6490370 (2002-12-01), Krasinski et al.
patent: 6516090 (2003-02-01), Lennon et al.
Gong et al., “Automatic Parsing of TV Soccer Programs”; Proceedings of the International Conference on Multimedia Computing and Systems (ICMCS), pp. 167-174, May 1995.
Zhong et al., “Structure Analysis of Sports Video Using Domain Models”; submitted to IEEE Conference on Multimedia and Exhibition, Japan, Jan. 2001.
Chang Shih-Fu
Divakaran Ajay
Xu Peng
Brinkman Dirk
Curtin Andrew J.
Mitsubishi Electric Research Laboratories Inc.
Philippe Gims