In this paper, we present a novel approach for clip-based key frame extraction. Our framework allows clips with subtle changes, as well as clips containing rapid shot changes, fades and dissolves, to be well approximated. We show that key frame video abstractions can be created by transforming each frame of a video sequence into an eigenspace and then clustering this space using Gaussian Mixture Models (GMMs). An iterative process computes the GMM configuration that best clusters the data based on a maximum likelihood threshold. The image nearest to the centre of each GMM component is selected as a key frame. Unlike previous work, this technique relies on global video clip properties, and results show that the extracted key frames give a very good representation of the overall clip content. We show that, by using a single threshold, an operator can easily control the number of representative key frames generated. We also demonstrate that clustering in eigen-time space improves the video abstractions in a quantifiable manner, and we apply the technique to a database of 307 clips of wildlife footage containing dissolves, shot changes, fades, pans, zooms and a wide range of animal behaviours.
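The pipeline summarised in the abstract (eigenspace projection, GMM clustering, selection of the frame nearest each component centre) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy and scikit-learn, and the function name `extract_key_frames`, its parameters and defaults, the diagonal covariance model, and the stopping rule (stop adding components once the gain in mean log-likelihood falls below `gain_threshold`) are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def extract_key_frames(frames, n_eigen=3, max_components=8,
                       gain_threshold=0.5, time_weight=0.0):
    """Return sorted indices of key frames for one clip.

    frames: array of shape (n_frames, n_pixels), one flattened frame per row.
    All parameter names and defaults are illustrative assumptions.
    """
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]

    # Project every frame into a low-dimensional eigenspace (PCA).
    features = PCA(n_components=n_eigen).fit_transform(frames)

    # Optionally append a scaled time coordinate ("eigen-time" space).
    if time_weight > 0.0:
        t = np.linspace(0.0, 1.0, n_frames)[:, None]
        features = np.hstack([features, time_weight * t])

    best, prev_score = None, -np.inf
    for k in range(1, min(max_components, n_frames) + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              n_init=5, random_state=0).fit(features)
        score = gmm.score(features)  # mean log-likelihood per frame
        # Assumed stopping rule: stop once an extra component gains too little.
        if best is not None and score - prev_score < gain_threshold:
            break
        best, prev_score = gmm, score

    # Key frame = the frame closest to each component centre.
    dists = np.linalg.norm(
        features[:, None, :] - best.means_[None, :, :], axis=2)
    return sorted({int(i) for i in dists.argmin(axis=0)})
```

In this sketch, lowering `gain_threshold` tends to admit more mixture components and therefore more key frames, which loosely mirrors the single-threshold control described in the abstract. Setting `time_weight` above zero clusters on frame appearance and temporal position together.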
|Translated title of the contribution|Visual Abstraction of Wildlife Footage using Gaussian Mixture Models|
|Pages (from-to)|25 - 30|
|Number of pages|5|
|Journal|The 15th International Conference on Vision Interface|
|Publication status|Published - May 2002|
Bibliographical note
Editors: D. D. Gorodnichy and H. Zhang
Publisher: Canadian Image Processing and Pattern Recognition Society