
My question relates to but doesn't duplicate a question that has been asked here.

I've Googled a lot for an answer to the question: *Can you find the dimensions of an object in a photo if you don't know the distance between the lens and the object, and there are no "scales" in the image?*

The overwhelming answer to this has been "no". This is, from my understanding, due to the fact that, in order to solve this problem with this equation,

$$\text{distance to object (mm)} = \frac{f\,\text{(mm)} \times \text{real height (mm)} \times \text{image height (px)}}{\text{object height (px)} \times \text{sensor height (mm)}}$$

you need to know either the "real height" or the "distance to object". It's the age-old problem of two unknowns and one equation: unsolvable. A way around this is to place an object of known dimensions in the photo, in the same plane as the unknown object, compute the distance to that reference object, and use that distance to calculate the size of the unknown (this relates to the answer from the question I linked above). This is the equivalent of putting a ruler in the photo, and it's a fine way to solve the problem.
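The reference-object workaround can be sketched in a few lines. All the numbers below (focal length, sensor height, pixel measurements) are illustrative assumptions, not values from a real photo:

```python
# Pinhole-projection formula from the question, plus the "reference object"
# workaround: a known object in the same plane fixes the distance, which
# then lets us invert the formula for the unknown object.

def distance_to_object_mm(f_mm, real_height_mm, image_height_px,
                          object_height_px, sensor_height_mm):
    """Distance = f * real_height * image_height / (object_height * sensor_height)."""
    return (f_mm * real_height_mm * image_height_px) / (object_height_px * sensor_height_mm)

def real_height_mm(distance_mm, f_mm, object_height_px,
                   image_height_px, sensor_height_mm):
    """Invert the same relation to recover real height once distance is known."""
    return (distance_mm * object_height_px * sensor_height_mm) / (f_mm * image_height_px)

# Assumed phone-camera values: 4.2 mm focal length, 5.6 mm sensor height,
# 3024 px image height. A US quarter (24.26 mm) measures 180 px in the image.
f, sensor, img_h = 4.2, 5.6, 3024
dist = distance_to_object_mm(f, 24.26, img_h, 180, sensor)

# An unknown object in the same plane measures 350 px:
hail_mm = real_height_mm(dist, f, 350, img_h, sensor)
print(round(hail_mm, 2))  # → 47.17
```

Note that the camera parameters cancel out: the result is just the reference height scaled by the pixel ratio (24.26 × 350 / 180), which is why the in-frame scale makes the problem trivial.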

This is where my question remains unanswered. What if there is no ruler? What if you want to find a way to solve the unsolvable problem? **Can we train an Artificial Neural Network to approximate the value of the real height without the value of the object distance or use of a scale?** Is there a way to leverage the unexpected solutions we can get from AI to solve a problem that is seemingly unsolvable?

Here is an example to solidify the nature of my question:

I would like to make an application where someone can pull out their phone, take a photo of a hail stone against the ground at a distance of ~1-3 ft, and have the application give them the hail stone dimensions. My project leader wants to make the application accessible, which means he doesn't want to force users to carry around a quarter or a special object of known dimensions to use as a scale.

In order to avoid the use of a scale, would it be possible to use all of the EXIF metadata from these photos to train a neural network to approximate the size of the hail stone within a reasonable error tolerance? For some reason, I have it in my head that if there are enough relevant variables, we can design an ANN that can pick out some pattern in this problem that we humans are just unable to identify. Does anyone know if this is possible? If so, is there a deep learning model best suited to this problem? If not, please put me out of my misery and tell me why it's impossible.

Interesting question! My first instinct is "no" if we're talking about a solution that would be robust against "adversarial" inputs (e.g. if we're taking pictures of a cube, there's probably an infinite number of different combinations of cube size + distance to camera that would all look identical, and so be impossible to reliably distinguish from a 2D image alone). My instinct is "yes / kind of" if we're just talking about a solution that would work decently well for "natural" / "real-world" pictures, since objects will tend to have certain typical sizes in "natural" pictures. – Dennis Soemers – 2018-08-31T19:57:02.733

Those are just my instincts though, not sure enough about them to put them in an answer. Food for thought for anyone who does want to address the question with a full answer though! – Dennis Soemers – 2018-08-31T19:57:30.463

Welcome to SE:AI!

(I took the liberty of converting your formula to MathJax for convenience of potential answerers--feel free to tweak, or roll back the edit if I got anything wrong.) – DukeZhou – 2018-08-31T20:58:34.583

I don't know whether there is research on this or not, but a natural approach would be a sort of transfer learning: train the model on the sizes of known objects, and then show it pictures containing both known and unknown objects, interacting. I think @DennisSoemers is right that this won't work for adversarial inputs, but then again, neither do our own eyes! – John Doucette – 2018-09-01T11:58:41.953

@JohnDoucette What about known objects of variable size, like a hail stone? Would this still be considered adversarial input? I was hoping there might be some combination of inputs, like focal length, image depth, or ISO, that the network might be able to pick up on and use to accurately predict hail size while knowing only a range of distances... – dingFAching – 2018-09-01T20:04:36.967

Our eyes can estimate the size of an object because there are two of them! You can calculate the distance to an object using two pictures taken from different points of view. Once you have the distance, you can obtain the size of the object. But there are some hard constraints on the pictures. More details here: http://dsc.ijs.si/files/papers/S101%20Mrovlje.pdf – Jérémy Blain – 2018-09-04T12:57:02.627
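The two-view idea in that last comment reduces to the standard stereo relation Z = f · B / d (depth from focal length in pixels, baseline, and pixel disparity). A minimal sketch, with made-up numbers rather than values from the linked paper:

```python
# Stereo depth from two horizontally separated views with matched cameras.
# Z = f * B / d, where f is focal length in pixels, B the baseline between
# the two camera positions, and d the horizontal disparity of the object
# between the two images. All numbers below are assumed for illustration.

def stereo_distance_mm(f_px, baseline_mm, disparity_px):
    """Distance to the object; larger disparity means a closer object."""
    return f_px * baseline_mm / disparity_px

# Example: 2268 px focal length, 65 mm baseline, 40 px measured disparity.
print(round(stereo_distance_mm(2268, 65, 40), 1))  # → 3685.5 (mm)
```

With the distance in hand, the pinhole formula from the question can then be inverted to recover the object's real size, which is exactly the "two pictures instead of a ruler" trick the comment describes.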