Projects per year
While the primary bottleneck to a number of computational workflows was not so long ago limited by processing power, the rise of machine learning technologies has resulted in an interesting paradigm shift, which places increasing value on issues related to data curation - that is, data size, quality, bias, format, and coverage. Increasingly, data-related issues are equally as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source graphics processing unit-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does very well predicting energies that are close to the MEP but less well predicting energies for "off-path" structures. The NN trained on the constrained MD data does better predicting high-energy off-path structures, given that it included a number of such structures in its training set.