Optimizing Queries over Video via Lightweight Keypoint-based Object Detection

Jiansheng Dong, Jingling Yuan, Lin Li, Xian Zhong, Weiru Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference Contribution (Conference Proceeding)



Recent advances in convolutional neural network-based object detection have made it possible to analyze the growing volume of video data with high accuracy. However, inference speed is a major drawback of these video analysis systems because of their heavy object detectors. To address the computational and practical challenges of video analysis, we propose FastQ, a system for efficient querying over video at scale. Given a target video, FastQ automatically labels the category and number of objects in each frame. We introduce a novel lightweight object detector, FDet, to improve the efficiency of the query system. First, a difference detector filters out the frames whose difference is less than the threshold. Second, FDet is employed to efficiently label the remaining frames. To reduce inference time, FDet predicts bounding boxes by detecting a center keypoint and a pair of corners from the feature map generated by a lightweight backbone. FDet completely avoids the complicated computation related to anchor boxes. Compared with state-of-the-art real-time detectors, FDet achieves superior performance with 29.1% AP on the COCO benchmark at 25.3 ms. Experiments show that FastQ achieves 150× to 300× speed-ups while maintaining more than 90% accuracy in video queries.
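
The two-stage flow described in the abstract (filter near-duplicate frames, then run the detector only on what remains) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the mean-absolute-difference metric, the comparison against the last kept frame, and the reuse of the previous label for skipped frames are all assumptions made for illustration, and `detector` is a hypothetical stand-in for FDet.

```python
import numpy as np


def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two frames of equal shape."""
    return float(np.mean(np.abs(a.astype(np.float32) - b.astype(np.float32))))


def select_frames(frames, threshold):
    """Return indices of frames that differ enough from the last kept frame.

    Assumption: the reference is the most recently kept frame; the abstract
    does not say what each frame is compared against.
    """
    kept = []
    reference = None
    for i, frame in enumerate(frames):
        if reference is None or mean_abs_diff(frame, reference) >= threshold:
            kept.append(i)
            reference = frame
    return kept


def label_video(frames, detector, threshold):
    """Label every frame while calling the detector only on kept frames.

    Assumption: a skipped frame reuses the label of the preceding frame.
    The first frame is always kept, so a previous label always exists.
    """
    kept = set(select_frames(frames, threshold))
    labels = []
    for i, frame in enumerate(frames):
        if i in kept:
            labels.append(detector(frame))  # expensive call
        else:
            labels.append(labels[-1])  # cheap reuse
    return labels
```

With a detector that returns per-frame object counts (for example `{"car": 2}`), `label_video(frames, detector, threshold)` yields one label per frame, and the number of detector calls equals `len(select_frames(frames, threshold))`. The threshold trades speed against accuracy: a higher value skips more frames but risks missing short-lived changes.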
Original language: English
Title of host publication: ACM International Conference on Multimedia Retrieval (ICMR)'2020
Publisher: Association for Computing Machinery (ACM)
Number of pages: 7
Publication status: Published - 8 Jun 2020
Event: ACM International Conference on Multimedia Retrieval (ICMR), 2020 - Dublin, Ireland
Duration: 8 Jun 2020 to 11 Jun 2020


Conference: ACM International Conference on Multimedia Retrieval (ICMR), 2020


  • object detection
  • real-time
  • anchor-free
  • keypoint-based
  • high-resolution representation


