This contribution presents a framework for storing metadata that describes 3DTV content in an MPEG-7/AVDP-compatible manner. The metadata are derived by applying several 3DTV media analysis tools, such as shot/scene boundary detection, person detection/tracking/recognition, facial expression recognition, music/speech segmentation, speaker diarization and music genre/mood characterization. AVDP was designed mainly with single-channel video in mind. Thus, in order to use it for describing stereoscopic video and multichannel audio content, a number of implementation decisions had to be taken that cater to the particularities of such content (storage of stereoscopic quality information, relations between entities in the various channels, etc.); these decisions are presented. Examples of using AVDP to describe the results of analysis algorithms on stereo video and multichannel audio content are also given. Additionally, several Classification Schemes used in the proposed framework are discussed, since some of their terms may be useful in other applications. Finally, the contribution discusses possible extensions or modifications of the MPEG-7 standard or the AVDP profile to better cover the needs of stereoscopic and multiview content description. The proposed framework was devised within 3DTVS (3DTV Content Search), a European FP7 project that aims at devising 3DTV audiovisual content analysis, description, indexing, search and browsing methods and incorporating such functionalities in 3D audiovisual content archives.
|Number of pages||1|
|Publication status||Published - 3 Jun 2014|
|Event||EBU MDN Workshop 2014 - Geneva, Switzerland (Duration: 9 Jun 2014 → 11 Jun 2014)|
|Conference||EBU MDN Workshop 2014|
|Period||9/06/14 → 11/06/14|