Video Segmentation and Indexing using Motion Estimation

Porter Sarah

Research output: Other contribution, PhD thesis (not Bristol)

Abstract

Video indexing is central to efficient content-based retrieval and browsing of visual information stored in large multimedia databases. This thesis presents work towards a unified framework for automated video indexing. To create an efficient index, a set of representative key frames is selected that captures the entire video content. This is done by first segmenting the video into its constituent shots and then selecting an optimal number of frames between the identified shot boundaries. The segmentation algorithm is designed to detect both abrupt shot transitions, or cuts, and gradual transitions, such as dissolves and fades. Detection relies on a two-component frame differencing metric that takes both image structure and colour distributions into account. The application of hierarchical block-based normalised correlation and local colour histogram differences leads to a method which is both accurate and robust. After the segmentation step, the key frames are selected to minimise representational redundancy whilst still portraying the content in each shot. Each shot is represented as a graph in which nodes represent frames and connection weights represent the amount of shared content between the frames corresponding to the connected nodes. The key frames are then selected as those corresponding to nodes on the least-weight path through the graph. As a final step, the camera motion is characterised to provide an additional layer of video annotation which may prove useful for indexing.
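The two-component frame difference mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation, whose details the abstract does not give. The sketch assumes single-level (non-hierarchical) blocks, an equal weighting of the structure and colour terms, a fixed threshold, and frames stored as nested lists of 8-bit (r, g, b) tuples. All function names and parameter values are hypothetical, and only abrupt cuts are handled, not dissolves or fades.

```python
from math import sqrt


def blocks(frame, size):
    """Yield non-overlapping size x size blocks as flat lists of (r, g, b) pixels."""
    height, width = len(frame), len(frame[0])
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            yield [frame[y + dy][x + dx] for dy in range(size) for dx in range(size)]


def normalised_correlation(a, b):
    """Zero-mean normalised correlation of two grey-level lists, in [-1, 1]."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # At least one block is flat: call it a match only if the means agree.
        return 1.0 if mean_a == mean_b else 0.0
    return sum(p * q for p, q in zip(da, db)) / denom


def colour_histogram(block, bins):
    """Joint RGB histogram of a block with `bins` levels per channel."""
    hist = [0] * (bins ** 3)
    for r, g, b in block:
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[index] += 1
    return hist


def histogram_difference(block_a, block_b, bins):
    """L1 distance between the block histograms, scaled to [0, 1]."""
    hist_a = colour_histogram(block_a, bins)
    hist_b = colour_histogram(block_b, bins)
    return sum(abs(p - q) for p, q in zip(hist_a, hist_b)) / (2.0 * len(block_a))


def frame_difference(prev, curr, block_size=4, bins=4, colour_weight=0.5):
    """Combine a structure term and a colour term, averaged over blocks, in [0, 1]."""
    structure_total = colour_total = 0.0
    count = 0
    for block_a, block_b in zip(blocks(prev, block_size), blocks(curr, block_size)):
        grey_a = [sum(p) / 3.0 for p in block_a]
        grey_b = [sum(p) / 3.0 for p in block_b]
        # Map correlation from [-1, 1] to a dissimilarity in [0, 1].
        structure_total += (1.0 - normalised_correlation(grey_a, grey_b)) / 2.0
        colour_total += histogram_difference(block_a, block_b, bins)
        count += 1
    if count == 0:
        raise ValueError("frame is smaller than one block")
    structure = structure_total / count
    colour = colour_total / count
    return (1.0 - colour_weight) * structure + colour_weight * colour


def detect_cuts(frames, threshold=0.5):
    """Return indices i where the difference between frames i-1 and i exceeds threshold."""
    return [
        i for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]
```

Calling `detect_cuts` on a list of frames returns the indices at which a new shot is assumed to begin. A gradual transition would need a different decision rule, for example one that looks at the difference signal over a window of frames rather than at a single frame pair.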
Translated title of the contribution: Video Segmentation and Indexing using Motion Estimation
Original language: English
Publication status: Published - 2004

Bibliographical note

Other identifier: 2000202
