This paper introduces a novel system that fuses two or more sets of multimodal videos in the compressed domain. Fusion is achieved without drift and produces an embedded bitstream that offers fine-grain scalability. Fusion in the compressed video domain has not previously been possible because of the complications introduced by the predictive loops in standard video encoding techniques. The compression system is based on an optimised 3D spatio-temporal wavelet codec that combines the 3D Discrete Dual-tree Wavelet Transform (DDWT) with the SPIHT (Set Partitioning in Hierarchical Trees) bit-plane encoder and a coefficient sparsification process (noise shaping). Because the transform is directionally selective in both space and time, these methods together encode a video sequence efficiently without any motion compensation. With no prediction loop in the codec, video information can be fused within the compressed domain without drift. Because the SPIHT bit-plane encoder is used, the bitstreams are embedded and therefore inherently scalable. The entire system thus offers both fusion within the compressed video domain and bitstream scalability. This allows extremely flexible fusion scenarios in environments with heterogeneous bandwidth and variable client receiving capabilities.
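The two properties claimed above, drift-free fusion of transform coefficients and scalability through bit-plane truncation, can be illustrated with a toy sketch. It is not the system described in the paper: a one-level 1-D Haar transform stands in for the 3D DDWT, plain bit-plane truncation stands in for SPIHT, noise shaping is omitted, and the fusion rule (average the low-pass band, keep the larger-magnitude detail coefficient) is an assumption chosen for illustration. All function names and sample values are hypothetical.

```python
# Toy sketch only: Haar replaces the 3D DDWT, and bit-plane truncation
# replaces SPIHT. The fusion rule below is an assumed, commonly used one.

def haar_forward(signal):
    """One-level unnormalised Haar transform of an even-length list."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def fuse(coeffs_a, coeffs_b):
    """Fuse two coefficient sets without decoding either back to samples."""
    approx = [(x + y) / 2 for x, y in zip(coeffs_a[0], coeffs_b[0])]
    detail = [x if abs(x) >= abs(y) else y
              for x, y in zip(coeffs_a[1], coeffs_b[1])]
    return approx, detail

def truncate_bitplanes(values, lowest_plane):
    """Keep only magnitude bit planes at or above lowest_plane."""
    step = 2 ** lowest_plane
    result = []
    for v in values:
        sign = 1 if v >= 0 else -1
        result.append(sign * int(abs(v) // step) * step)
    return result

def decode(approx, detail, lowest_plane):
    """Decode from a bitstream cut off after the given bit plane."""
    return haar_inverse(truncate_bitplanes(approx, lowest_plane),
                        truncate_bitplanes(detail, lowest_plane))

# Hypothetical samples from two modalities (e.g. visible and infrared).
visible = [10, 10, 10, 10, 40, 8, 10, 10]
infrared = [12, 12, 12, 12, 12, 12, 60, 4]

# No prediction loop exists, so fusion needs no decoder state.
fused = fuse(haar_forward(visible), haar_forward(infrared))

# Fewer retained bit planes give a coarser but still valid reconstruction.
for plane in (4, 2, 0):
    print(plane, decode(fused[0], fused[1], plane))
```

Because each coefficient is coded independently of any previously decoded frame, the fused coefficients never depend on a decoder state that could diverge, which is the sense in which this kind of fusion is drift-free. Truncating at a higher bit plane mimics a client that receives only the first part of an embedded bitstream.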
| Translated title of the contribution | Scalable Fusion Using a 3D Dual Tree Wavelet Transform |
| Title of host publication | Sensor Signal Processing for Defence 2011 (SSPD 2011) |
| Number of pages | 5 |
| Publication status | Published - 2011 |