Get the position of an object out of an image



I have some images with a fixed background and a single object that is placed at a different position on that background in each image. I want to find a way to extract the positions of that object in an unsupervised way. We, as humans, would record the x and y location of the object. Of course, the NN doesn't have a notion of x and y, but I would like the NN, given an image, to produce 2 numbers that preserve as much as possible of the actual relative position of the object on the background. For example, if the object sits at 3 equally spaced points on a straight line (across 3 of the images), I would like the 2 numbers produced by the NN for each of the 3 images to preserve this ordering, even if they don't form a straight line. They can form a weird curve, as long as the order is correct, so that the curve can be topologically transformed into the right straight line. Can someone suggest any paper/architecture that did something similar? Thank you!

Silviu-Marian Udrescu

Posted 2019-09-17T04:13:13.033

Reputation: 41

By fixed, do you mean the background is always the same image? – Astariul – 2019-09-17T04:35:33.710

@Astariul Yes! And the object that changes position is also the same in each image (same size, shape, orientation, etc.). – Silviu-Marian Udrescu – 2019-09-17T05:45:24.477

@Silviu-MarianUdrescu You might not need machine learning for this; it sounds like the object is very well defined. If you can code something up that works 100% of the time, why not do that? – Recessive – 2019-09-17T06:09:40.263

@Recessive Ideally I want a NN that is able to learn the 2-number representation of an image for any combination of background and moving object. Coding it by hand works only for one fixed background and one fixed object (which is indeed what my post is about), but I want a NN approach so I can generalize later, i.e. if I pass a new set of images (all with the same background and object, but different from the ones in the previous set), the NN would identify the "x" and "y" just as easily without any code modifications. Coding something up manually would require new code for each set of images. – Silviu-Marian Udrescu – 2019-09-17T06:15:04.883

@Silviu-MarianUdrescu There should be a few ways of doing this. You could create a regression CNN that outputs (x,y) coordinates (I wouldn't recommend this; in my experience it's very hard to get it to work). You could use deconvolutional layers to produce an output image of similar dimensions to the input image, applying a softmax over the entire output to produce the probability of the object being at each location (you could also produce a downscaled version if an approximate location is OK). Other than that, I think there are some great resources online for object detection; just search on Google. – Recessive – 2019-09-17T06:29:11.517
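
To make the softmax-over-the-image idea above concrete, here is a minimal NumPy sketch. It assumes the network has already produced a 2D map of raw scores (the `scores` array below is made up for illustration, not the output of any real model). It normalizes the map into a probability distribution over locations and then takes the probability-weighted mean position, which is one possible way to read off an (x, y) pair.

```python
import numpy as np

def spatial_softmax(scores):
    """Turn a 2D score map of shape (H, W) into a probability map summing to 1."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def expected_position(prob):
    """Probability-weighted mean (x, y) of a 2D probability map."""
    h, w = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]  # row index and column index of every cell
    return float((prob * xs).sum()), float((prob * ys).sum())

# Hypothetical score map: flat everywhere, with one strong peak
# at column 5, row 2 (0-based).
scores = np.zeros((7, 7))
scores[2, 5] = 20.0

prob = spatial_softmax(scores)
x, y = expected_position(prob)
print(x, y)
```

The weighted mean is differentiable with respect to the scores, whereas a plain argmax is not, which is why it is sketched here; whether it suits the unsupervised setting in the question is a separate matter.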

I am not sure how I can use a regression CNN in an unsupervised way at all in this case. I am also not sure I understand what you mean by using deconvolutions and softmax. How can I use that to extract the x and y of the object? – Silviu-Marian Udrescu – 2019-09-17T06:33:38.250

@Silviu-MarianUdrescu Ahh sorry, I didn't see "unsupervised". OK, well, that changes things slightly. If you really want to use unsupervised learning, I'm sorry, I can't help, as I have no experience with that. However, this problem sounds like it could be solved with supervised learning. As I suggested earlier, you could perhaps hard-code a program to extract the (x,y) point of the object over a few hundred/thousand images, and then use those points as labels to train a neural network. This should then improve robustness to future input changes. – Recessive – 2019-09-17T08:05:19.677
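
One way such a hard-coded labeling program could look, given that the background is identical in every image, is simple background subtraction followed by a centroid. This is only a sketch under that assumption: the function name `object_centroid`, the `threshold` value, and the synthetic test image are all illustrative and do not come from the thread.

```python
import numpy as np

def object_centroid(image, background, threshold=0.1):
    """Return the (x, y) centroid of pixels that differ from the background.

    Returns None if no pixel differs by more than `threshold`.
    """
    diff = np.abs(image.astype(float) - background.astype(float))
    if diff.ndim == 3:  # collapse colour channels, if present
        diff = diff.max(axis=2)
    mask = diff > threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)  # row and column indices of changed pixels
    return float(xs.mean()), float(ys.mean())

# Synthetic example: a fixed random background with a bright 3x3 square
# pasted at columns 10..12, rows 4..6.
rng = np.random.default_rng(0)
background = rng.random((32, 32)) * 0.5
image = background.copy()
image[4:7, 10:13] = 2.0

label = object_centroid(image, background)
print(label)
```

Running this over each image in a set would give one (x, y) label per image, which could then serve as regression targets for a network.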

@Silviu-MarianUdrescu As for deconvolutions, perhaps ignore that; you might be able to get away with the following: take a standard CNN and just have the output be, say, 49 nodes for a 7x7 grid over the image. This means that if node 4 has a value of 0.9, point (4,1) is 90% likely to contain the object. As for softmax, look it up online; it essentially converts the inputs to a probability distribution across all the output nodes. – Recessive – 2019-09-17T08:07:58.857
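
The node-to-point mapping in the comment above can be sketched as follows. The sketch assumes 1-based node numbers laid out row by row (which is what makes node 4 correspond to point (4,1)), and the 224-pixel image size is a hypothetical value chosen only to show the conversion back to pixel coordinates. The `probs` vector stands in for the network's softmax output.

```python
import numpy as np

GRID = 7  # 7x7 grid, i.e. 49 output nodes

def node_to_cell(node, grid=GRID):
    """Map a 1-based node number to a 1-based (column, row) cell, row-major."""
    index = node - 1
    return index % grid + 1, index // grid + 1

def cell_center_pixels(cell, image_width, image_height, grid=GRID):
    """Pixel coordinates of the centre of a 1-based (column, row) cell."""
    col, row = cell
    return (col - 0.5) * image_width / grid, (row - 0.5) * image_height / grid

# Stand-in for the softmax output: node 4 holds 0.9 of the probability mass,
# the remaining 0.1 is spread evenly over the other 48 nodes.
probs = np.full(GRID * GRID, 0.1 / 48)
probs[3] = 0.9  # index 3 is node 4 in 1-based numbering

best_node = int(np.argmax(probs)) + 1
cell = node_to_cell(best_node)
center = cell_center_pixels(cell, image_width=224, image_height=224)
print(best_node, cell, center)
```

With a 7x7 grid the position is only resolved to within one cell, so a finer grid would be needed for a more precise location.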

No answers