Anisotropic Convolutional Neural Networks for RGB-D based Semantic Scene Completion

Jie Li, Peng Wang, Kai Han, Yu Liu

    Research output: Contribution to journal › Article (Academic Journal) › peer-review

    9 Citations (Scopus)

    Abstract

    Semantic scene completion (SSC) is a computer vision task that aims to simultaneously infer the occupancy and semantic label of each voxel in a scene from partial information consisting of a depth image and/or an RGB image. As a voxel-wise labeling task, the key to SSC is how to effectively model the visual and geometrical variations needed to complete the scene. To this end, we propose the Anisotropic Network (AIC-Net), with novel convolutional modules that can model varying anisotropic receptive fields voxel-wise in a computationally efficient manner. The basic idea for achieving such anisotropy is to decompose a 3D convolution into three consecutive dimensional convolutions and to determine the dimension-wise kernels on the fly. One module, termed kernel-selection anisotropic (KSA) convolution, adaptively selects the optimal kernel size for each dimensional convolution from a set of candidate kernels. The other module, termed kernel-modulation anisotropic (KMA) convolution, directly modulates a single convolutional kernel for each dimension to derive a more flexible receptive field. By stacking multiple such anisotropic modules, the 3D context modeling capability and flexibility can be further enhanced. Moreover, we present a new end-to-end trainable framework for the SSC task that avoids the expensive TSDF pre-processing used in many existing methods. Extensive experiments on SSC benchmarks show the advantage of the proposed methods.
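The decomposition described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the AIC-Net implementation: it handles a single channel, uses fixed box kernels as the candidate set, and takes the per-voxel selection scores as given inputs, whereas the abstract says the dimension-wise kernels are determined on the fly. Blending the candidates with a softmax is one plausible reading of "adaptively selects" and is an assumption here, as are all function and variable names (`conv1d_along_axis`, `select_and_convolve`, `anisotropic_conv3d`, `constant_logits`).

```python
import math


def conv1d_along_axis(vol, kernel, axis):
    """Zero-padded, same-size 1D correlation of a 3D nested list along one axis."""
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    half = len(kernel) // 2
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for x in range(dims[0]):
        for y in range(dims[1]):
            for z in range(dims[2]):
                acc = 0.0
                for i, w in enumerate(kernel):
                    p = [x, y, z]
                    p[axis] += i - half  # step along the chosen axis only
                    if 0 <= p[axis] < dims[axis]:
                        acc += w * vol[p[0]][p[1]][p[2]]
                out[x][y][z] = acc
    return out


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_and_convolve(vol, kernels, logits, axis):
    """Blend the candidate-kernel responses at each voxel (assumed soft selection).

    logits[x][y][z] holds one score per candidate kernel for that voxel.
    """
    responses = [conv1d_along_axis(vol, k, axis) for k in kernels]
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                weights = softmax(logits[x][y][z])
                out[x][y][z] = sum(
                    w * r[x][y][z] for w, r in zip(weights, responses)
                )
    return out


def anisotropic_conv3d(vol, kernels, logits_per_axis):
    """Three consecutive dimension-wise convolutions, one per spatial axis."""
    for axis in range(3):
        vol = select_and_convolve(vol, kernels, logits_per_axis[axis], axis)
    return vol


def constant_logits(shape, preferred, n_kernels, strength=50.0):
    """Hypothetical stand-in for learned scores: every voxel prefers one kernel."""
    scores = [strength if i == preferred else 0.0 for i in range(n_kernels)]
    return [[[list(scores) for _ in range(shape[2])]
             for _ in range(shape[1])] for _ in range(shape[0])]


# Toy input: a unit impulse at the centre of a 3 x 3 x 3 volume.
shape = (3, 3, 3)
volume = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 1.0

# Candidate box kernels of size 1, 3 and 5 (illustrative values, not learned).
candidates = [[1.0], [1.0, 1.0, 1.0], [1.0] * 5]

# Prefer size 1 along the first two axes and size 3 along the third.
logits = [
    constant_logits(shape, 0, 3),
    constant_logits(shape, 0, 3),
    constant_logits(shape, 1, 3),
]
result = anisotropic_conv3d(volume, candidates, logits)
```

With these scores the impulse spreads along the third axis only, so the effective receptive field differs per dimension. The decomposition also reduces the weight count: for a single input/output channel pair, a dense k × k × k kernel has k³ weights, while three 1D kernels of length k have 3k (125 versus 15 for k = 5).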
    Original language: English
    Pages (from-to): 8125 - 8138
    Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
    Volume: 44
    Issue number: 11
    Early online date: 18 May 2021
    DOIs
    Publication status: Published - 1 Nov 2022

    Bibliographical note

    Publisher Copyright:
    IEEE
