How can I measure an object using Computer Vision and neural network techniques?


I have been studying Computer Vision for around 2-3 months. Recently, I had an idea for a research project combining computer vision and artificial neural networks: I want to build an automatic object-measurement system using an ANN. The goal is to measure the distance between two opposite sides of an object.

The main idea is that this system should work the way a fractional caliper does. A fractional caliper is a hand tool used to measure objects, but measuring a large number of small parts by hand is time-consuming in an industrial setting. This tool gave me the idea for a system that can automatically produce results as accurate as a fractional caliper's.

For example, given an image of an object, the system should determine its depth, width, height, etc. Given the idea explained above, please suggest what kind of Computer Vision model I should apply for this study, and how I can combine it with an ANN technique.

Carme Afonso

Posted 2019-11-04T18:46:30.153

Reputation: 31



Father Ted explains why this is a hard problem.

Seriously -- if you have stereo images it should be possible, since stereo vision is what we use for depth perception. If you know where a point appears in each of the two images, you can recover its distance using trigonometry. No neural networks needed, I guess.
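To make the trigonometry concrete, here is a minimal sketch of depth from a rectified stereo pair: a point's depth is Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity (horizontal shift of the point between the two images). The focal length, baseline, and pixel positions below are made-up illustrative values, not from any real rig.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth of a point from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# A point seen at x1 = 620 px in the left image and x2 = 600 px in the right,
# with an assumed 700 px focal length and 12 cm baseline:
z = depth_from_disparity(f_px=700.0, baseline_m=0.12, disparity_px=620 - 600)
print(round(z, 3))  # depth in metres
```

Once depth is known, a pixel length at that depth can be converted to a real-world length with the same focal-length geometry.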



Reputation: 139

Sigh.. you live by the meme, you die by the meme. Thanks! – jmmcd – 2019-11-05T18:23:04.847

Thank You so much.. Currently, I am trying to look for it.. I really appreciate your idea.. – Carme Afonso – 2019-11-07T08:17:03.473


If the measurements you want from the object aren't too complicated (e.g. the length of a clearly defined feature), and if you can acquire a training dataset of images similar to what your model will see in your use case (same scale and distance), along with their bounding boxes and measurements, one model you could try to implement is a Multi-Task Convolutional Neural Network (MTCNN).

MTCNNs are typically used for face detection and alignment, but I would imagine they could be adapted to your use case with proper training and tuning. If there are more complicated measurements you want to obtain, you could pass the detected objects on to another model that makes those specific measurements.

You will have a problem, however, with measuring depth. Depth is hard to estimate from a single image because of the information we lose when projecting a 3D scene onto a 2D image. A longer explanation is available in MachineEpsilon's answer to the Cross Validated question "how to detect the exact size of an object in an image using machine learning?", but quoting his main statements:

This task of depth estimation is part of a hard and fundamental problem in computer vision called 3D reconstruction. Recovering metric information from images is sometimes called photogrammetry. It's hard because when you move from the real world to an image you lose information.

Specifically, the projective transformation that takes your 3D point X to your 2D point x via x = PX does not preserve distance. Since P is a 2×3 matrix, calculating P⁻¹ to solve P⁻¹x = X is an underdetermined inverse problem. A consequence of this is that pixel lengths are not generally going to be meaningful in terms of real world distances.

However, that's not to say you couldn't add additional sensors to resolve the depth-estimation problem (e.g. stereoscopic cameras or infrared distance sensors) if the extra cost is not an issue.
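Alternatively, if the camera-to-object distance is fixed, as it often is on an inspection line, the standard workaround is to calibrate pixel lengths against a reference object of known real-world size placed in the same plane as the parts. A sketch, with made-up numbers:

```python
def pixels_per_mm(ref_length_px, ref_length_mm):
    """Scale factor from a reference object of known size in the image."""
    return ref_length_px / ref_length_mm

def measure_mm(object_length_px, scale_px_per_mm):
    """Convert a measured pixel length to millimetres."""
    return object_length_px / scale_px_per_mm

# A reference object known to be 25.4 mm wide spans 200 px in the image:
scale = pixels_per_mm(ref_length_px=200.0, ref_length_mm=25.4)
print(round(measure_mm(394.0, scale), 2))  # object length in mm
```

This only holds while the camera geometry stays fixed; any change in distance or lens requires re-calibration, which is exactly the loss of metric information the quoted answer describes.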



Reputation: 121

Thank You so much.. I am going to therefore investigate on this technique.. – Carme Afonso – 2019-11-07T08:15:45.933