TY - GEN
T1 - Centroids triplet network and temporally-consistent embeddings for in-situ object recognition
AU - Lagunes Fortiz, Miguel A
AU - Damen, Dima
AU - Mayol-Cuevas, Walterio W
PY - 2021/2/10
Y1 - 2021/2/10
N2 - This work proposes learning to recognize objects from a small number of training examples collected and deployed in-situ. That is, from data collected where the objects are commonly placed or being used, perhaps after first encountering them, the learning algorithm immediately is able to recognize them again. We refer to this method-ology as in-situ learning, and it opposes to the conventional methodology of using complex data acquisition mechanisms, such as rotating tables or synthetic data, to build a large-scale dataset for training convolutional neural networks (ConvNets). To learn in-situ, we propose a novel loss function that generates discriminative features for known and unseen objects, by utilizing a regularization term that reduces the distance between features and their manifold centroid. Additionally, we propose a temporal filter that is particularly useful to quickly react to appearing objects on the scene, which depending on the distance between neighboring video-frame features, it applies a weighted average between the current and the previous frame. Our framework achieves state-of-the-art accuracy for in-situ and on-the-fly learning, for the case of known objects achieves an average increase in accuracy of 3.01%, an increase of 3.3% for novel objects, and an average increase of 7.07% for the combined case, compared with the closest baseline. Utilizing the temporal filtering, led to a further increase in accuracy against nuisances of 7.32% for the known and novels objects case.
AB - This work proposes learning to recognize objects from a small number of training examples collected and deployed in-situ. That is, from data collected where the objects are commonly placed or being used, perhaps after first encountering them, the learning algorithm immediately is able to recognize them again. We refer to this method-ology as in-situ learning, and it opposes to the conventional methodology of using complex data acquisition mechanisms, such as rotating tables or synthetic data, to build a large-scale dataset for training convolutional neural networks (ConvNets). To learn in-situ, we propose a novel loss function that generates discriminative features for known and unseen objects, by utilizing a regularization term that reduces the distance between features and their manifold centroid. Additionally, we propose a temporal filter that is particularly useful to quickly react to appearing objects on the scene, which depending on the distance between neighboring video-frame features, it applies a weighted average between the current and the previous frame. Our framework achieves state-of-the-art accuracy for in-situ and on-the-fly learning, for the case of known objects achieves an average increase in accuracy of 3.01%, an increase of 3.3% for novel objects, and an average increase of 7.07% for the combined case, compared with the closest baseline. Utilizing the temporal filtering, led to a further increase in accuracy against nuisances of 7.32% for the known and novels objects case.
KW - training
KW - manifolds
KW - filtering
KW - data aquisition
KW - object recognition
KW - convolutional neural networks
KW - intellingent robots
U2 - 10.1109/IROS45743.2020.9341050
DO - 10.1109/IROS45743.2020.9341050
M3 - Conference Contribution (Conference Proceeding)
SN - 978-1-7281-6213-3
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 10796
EP - 10802
BT - 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2020
PB - Institute of Electrical and Electronics Engineers (IEEE)
T2 - conference is IROS 2020
Y2 - 25 October 2020 through 29 October 2020
ER -