Recent advances in deep learning, in particular convolutional neural networks (CNNs), have been widely applied in robotics with very high performance in tasks such as object classification and action recognition. Nevertheless, this high performance, mostly in classification tasks, is rarely accompanied by reasoning processes that consider the relationships between objects, actions, and effects. In this article, we trained three CNNs to classify objects, actions, and effects using the CERTH-SOR3D dataset, which contains more than 20,000 RGB-D videos. The dataset covers 14 objects and 13 actions, and for this article it was augmented with seven effects. The probabilistic output vector of each trained CNN was combined in a Bayesian network (BN) that captures the relationships between objects, actions, and effects. We show that, by probabilistically combining information from the three classifiers, it is possible to improve the classification performance of each CNN, or to achieve the same performance with less training data. In particular, recognition performance improved from 71.2% to 79.7% for actions, from 85.0% to 86.7% for objects, and from 77.0% to 82.1% for effects. We also show that the model can still produce reasonable classification performance when information is missing. The system can therefore be used for reasoning purposes in robotics: it can support action planning from information about objects and effects, or predict effects from information about objects and actions.
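The probabilistic fusion described above can be sketched as follows. This is a minimal illustration, not the article's implementation: the toy sizes, the random classifier outputs, and the conditional probability table P(effect | object, action) are all assumptions standing in for the trained CNNs and the learned BN. The sketch fuses the three softmax vectors by marginalizing a joint distribution over objects and effects to re-score each action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes standing in for the article's 14 objects, 13 actions, 7 effects.
n_obj, n_act, n_eff = 4, 3, 2

def softmax(x):
    """Convert raw scores into a probability vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-ins for the probabilistic output vectors of the three CNNs.
p_obj = softmax(rng.normal(size=n_obj))
p_act = softmax(rng.normal(size=n_act))
p_eff = softmax(rng.normal(size=n_eff))

# Hypothetical conditional probability table P(effect | object, action),
# normalized over the effect axis; in the article this structure is
# captured by the Bayesian network.
cpt = rng.random((n_obj, n_act, n_eff))
cpt /= cpt.sum(axis=2, keepdims=True)

# Fuse the classifiers: for each action, marginalize the joint over
# objects and effects, weight by the action classifier, and renormalize.
score = p_act * np.einsum("o,e,oae->a", p_obj, p_eff, cpt)
posterior_action = score / score.sum()
```

The same marginalization can be rearranged to predict effects from objects and actions, or to support action planning from objects and desired effects, which is the reasoning use the abstract describes.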
Bibliographical note: The acceptance date for this record is provisional and based upon the month of publication for the article.
- action recognition
- affordances of objects
- Bayesian network
- convolutional neural networks
- object recognition