Executing trained image classification model for video


I've been working with vanilla feed-forward neural networks and have been researching the convolutional neural network literature. So far I have not encountered how often the model is executed in order to classify objects. For example, if a camera is capturing video at a rate of 15 frames per second, is the classification model executed (or even retrained) on every frame in order to keep the classifications free of time delay?


Posted 2018-02-25T10:45:11.593

Reputation: 259



Deployed models can process video in ways ranging from offline frame-by-frame analysis up to live streaming, depending on the use case. The purpose of the deployment will influence the video quality and pre-processing, as well as the complexity of the model that can be used.

Because video generates a lot of data very rapidly, compromises are often made: image resolution and colour depth may be reduced, frames may be skipped, processing may be done offline, or it may simply cost a great deal of money to pay for the compute power. Without going into much detail, different algorithms can be used for the distinct tasks of detection, recognition/classification, and tracking, and the input to each can be pre-processed in different ways.
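The frame-skipping compromise can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `classify` is a hypothetical stand-in for a CNN forward pass, and the frame source is just a range of frame ids.

```python
def classify(frame):
    # Placeholder for a CNN forward pass (hypothetical);
    # a real model would return class probabilities for the frame.
    return f"label-for-frame-{frame}"

def process_stream(frames, skip=3):
    """Run the classifier on every `skip`-th frame and reuse the most
    recent prediction for the frames in between -- a common compromise
    between latency and compute cost."""
    results = []
    last_label = None
    for i, frame in enumerate(frames):
        if i % skip == 0:           # only these frames hit the model
            last_label = classify(frame)
        results.append(last_label)  # intermediate frames reuse the label
    return results

# A 15 fps stream classified every 3rd frame means the model only
# runs 5 times per second; labels lag by at most 2 frames (~133 ms).
labels = process_stream(range(6), skip=3)
```

Whether that lag is acceptable depends entirely on the use case, which is why the trade-offs above differ between, say, offline analytics and a live safety system.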

This tutorial does a great job of showing how OpenCV and a pre-trained CNN can be used for object detection on a laptop webcam stream; it explicitly mentions dropping frames to improve performance.

There's also this post with a good overview of object detection, recognition, and tracking. Depending on the use case in the context of each algorithm described, you can decide how important it is to retain and process every frame. For instance, see the section on background subtraction: you might not want to risk something flying through the frame without detecting it, let alone classifying it.


Posted 2018-02-25T10:45:11.593

Reputation: 291