Low-complexity object detection with deep convolutional neural network for embedded systems

Subarna Tripathi; Byeongkeun Kang; Gokce Dane; Truong Nguyen

doi:10.1117/12.2275512

19 September 2017

Proceedings Volume 10396, Applications of Digital Image Processing XL; 103961M (2017) https://doi.org/10.1117/12.2275512
Event: SPIE Optical Engineering + Applications, 2017, San Diego, California, United States

Abstract

We investigate low-complexity convolutional neural networks (CNNs) for object detection in embedded vision applications. It is well known that building an embedded system for CNN-based object detection is more challenging than for problems like image classification because of its computation and memory requirements. To meet these requirements, we design and develop an end-to-end TensorFlow (TF)-based fully convolutional deep neural network for generic object detection, inspired by one of the fastest frameworks, YOLO.¹ As in YOLO, the proposed network localizes every object by regressing the coordinates of its bounding box. Hence, the network can detect objects without any limitation on their size. Unlike YOLO, however, all the layers in the proposed network are fully convolutional, so it can take input images of any size. We pick face detection as a use case and evaluate the proposed model on the FDDB dataset and the Widerface dataset. As another use case of generic object detection, we evaluate its performance on the PASCAL VOC dataset. The experimental results demonstrate that the proposed network can predict object instances of different sizes and poses in a single frame. Moreover, the results show that the proposed method achieves accuracy comparable to state-of-the-art CNN-based object detection methods while reducing the model size by 3× and the memory bandwidth by 3–4× compared with one of the best real-time CNN-based object detectors, YOLO. Our 8-bit fixed-point TF model provides an additional 4× memory reduction while keeping the accuracy nearly as good as that of the floating-point model. Moreover, the fixed-point model is capable of achieving 20× faster inference than the floating-point model. Thus, the proposed method is promising for embedded implementations.
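The 4× memory figure for the 8-bit model matches the storage ratio between 32-bit floating-point and 8-bit integer weights. The sketch below illustrates that arithmetic with a generic symmetric linear quantizer; it is not the TF fixed-point scheme used in the paper, and the function names, the example weights, and the choice of a single per-tensor scale are all assumptions made for illustration.

```python
from array import array


def quantize_int8(weights):
    """Hypothetical helper: symmetric linear quantization to signed 8-bit.

    One scale is shared by the whole tensor (an assumption, not the
    paper's documented scheme).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = array(
        "b", (max(-127, min(127, round(w / scale))) for w in weights)
    )
    return quantized, scale


def dequantize(quantized, scale):
    """Hypothetical helper: map 8-bit integers back to approximate floats."""
    return [v * scale for v in quantized]


# Made-up example weights, stored as 32-bit floats (type code "f").
weights_fp32 = array("f", [0.5, -1.0, 0.25, 0.75, -0.125, 0.0])
q_weights, scale = quantize_int8(weights_fp32)

# Storage cost of each representation in bytes.
bytes_fp32 = len(weights_fp32) * weights_fp32.itemsize
bytes_int8 = len(q_weights) * q_weights.itemsize
reduction = bytes_fp32 / bytes_int8

# Worst-case reconstruction error; bounded by half a quantization step.
max_error = max(
    abs(a - b) for a, b in zip(weights_fp32, dequantize(q_weights, scale))
)

print(f"memory reduction: {reduction:.0f}x, max error: {max_error:.5f}")
```

The storage ratio depends only on the element widths, so it holds for any weight tensor. How much accuracy is retained depends on the network and the quantization scheme, which this sketch does not model.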

Citation

Subarna Tripathi, Byeongkeun Kang, Gokce Dane, and Truong Nguyen "Low-complexity object detection with deep convolutional neural network for embedded systems", Proc. SPIE 10396, Applications of Digital Image Processing XL, 103961M (19 September 2017); https://doi.org/10.1117/12.2275512
