Nowadays, depth estimation from a single image is a task that has been successfully addressed by Convolutional Neural Network (CNN) architectures. Several authors have taken advantage of depth datasets publicly available to the scientific community to train their CNN-based methods. One of the most popular of these is KITTI, which emerged from a joint project of the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago (the acronym derives from the institutions' names) and provides depth estimates associated with RGB (Red, Green, Blue) images. As in many other datasets, the depth data in KITTI consist of monocular or stereo RGB images paired with depth maps obtained via laser scanning, stereo cameras, or a combination of both. These images and depth data were collected by driving around outdoor urban environments with cameras facing forward toward the horizon. In contrast, in this work we are interested in CNN-based depth estimation from a single aerial image, for which no depth datasets are available. Moreover, popular CNN architectures for single-image depth estimation struggle in aerial scenes because the camera angle and object appearance in aerial imagery differ significantly from those in ground-level data. Nevertheless, we propose to harvest the depth information available in KITTI to tackle depth estimation in a single aerial image. To this end, we follow a two-step methodology based on patch processing, whose output serves as input to a set of proposed CNN architectures. Our results indicate that this approach is promising and that datasets such as KITTI may indeed be exploited in other domains, especially where data acquisition is expensive or difficult to carry out, as is the case for aerial scenes.
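The patch-based processing mentioned above can be illustrated with a minimal sketch. Everything below is hypothetical: the patch size, the stride, and the `estimate_patch_depth` function (which stands in for a trained CNN) are illustrative placeholders, not the actual configuration or architecture used in this work.

```python
import numpy as np

def extract_patches(image, patch_size=64, stride=64):
    """Step 1 (sketch): split an H x W x 3 image into square patches.
    Patch size and stride are illustrative choices."""
    h, w = image.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))
    return patches, coords

def estimate_patch_depth(patch):
    """Hypothetical stand-in for a CNN that predicts a per-patch depth map.
    Here it simply returns the mean intensity as a constant depth."""
    return np.full(patch.shape[:2], patch.mean(), dtype=np.float32)

def depth_from_patches(image, patch_size=64, stride=64):
    """Step 2 (sketch): assemble a full-image depth map from per-patch
    predictions by writing each prediction back at its patch location."""
    depth = np.zeros(image.shape[:2], dtype=np.float32)
    patches, coords = extract_patches(image, patch_size, stride)
    for patch, (y, x) in zip(patches, coords):
        depth[y:y + patch_size, x:x + patch_size] = estimate_patch_depth(patch)
    return depth

# Example: a synthetic 128 x 128 RGB image yields a 128 x 128 depth map.
image = np.random.rand(128, 128, 3).astype(np.float32)
depth = depth_from_patches(image)
print(depth.shape)  # (128, 128)
```

In practice the per-patch predictor would be one of the proposed CNN architectures trained on KITTI depth data; the sketch only shows how patch extraction and reassembly fit together.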