I understand that there are flavors of (convolutional) neural networks that are useful for object localization and detection tasks of reasonable difficulty. In all of the examples I have seen so far, localization is formulated as finding the corners of a bounding box. Often, the fit is not expected to be very precise:

By contrast, I am interested in a task where I want very precise localization and characterization of some simple shapes or objects. As an example of one of the simplest cases I can think of, my inputs will be images like the following:

Given this 60x60 image, I want my neural net(s), via **regression**, to tell me that the circle's **diameter is 18px** and its **centre is located at (28, 21)** from top left. (I will train it using similar 60x60 images with white circles of various sizes on black backgrounds.)

Later I am interested in dealing with similar tasks in the real world, e.g. spheres/cubes/cylinders with different viewing angles, light conditions, occlusions, etc. However, I am interested in solving this simplest case first. (One reason is that I can generate this data very easily.)
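For concreteness, here is a minimal sketch of how such training data could be generated with NumPy. The function names (`make_circle_image`, `make_dataset`), the diameter range, and the choice to keep every circle fully inside the frame are my own assumptions for illustration. I also assume coordinates are given as (x, y) from the top left, so the example above would have `cx=28, cy=21`.

```python
import numpy as np


def make_circle_image(cx, cy, diameter, size=60):
    """Return a size x size uint8 image with a white disc on a black background.

    (cx, cy) is the centre in pixels, x to the right and y downwards from
    the top-left corner (an assumed convention).
    """
    ys, xs = np.mgrid[0:size, 0:size]
    # A pixel is white if its centre lies within the disc.
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= (diameter / 2.0) ** 2
    return (mask * 255).astype(np.uint8)


def make_dataset(n, size=60, min_d=6.0, max_d=30.0, seed=0):
    """Generate n images plus regression targets (cx, cy, diameter)."""
    rng = np.random.default_rng(seed)
    images = np.zeros((n, size, size), dtype=np.uint8)
    targets = np.zeros((n, 3), dtype=np.float32)
    for i in range(n):
        d = rng.uniform(min_d, max_d)
        r = d / 2.0
        # Keep the whole circle inside the frame (assumption: no cropping).
        cx = rng.uniform(r, size - 1 - r)
        cy = rng.uniform(r, size - 1 - r)
        images[i] = make_circle_image(cx, cy, d, size)
        targets[i] = (cx, cy, d)
    return images, targets


# The example from the question: diameter 18 px, centre at (28, 21).
example = make_circle_image(cx=28, cy=21, diameter=18)
images, targets = make_dataset(1000)
```

The targets are kept as floating-point values so that a regression network could, in principle, be asked for sub-pixel answers.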

I have the following specific questions:

- Has anyone used neural nets for this sort of task before? (e.g. precisely determining sizes and centroids of objects)
- My understanding is that these things are at least theoretically possible using convolutional nets, or even sufficiently complicated vanilla fully connected nets. Is this correct?
- What architecture(s) would be appropriate for these tasks?

Note: I am aware that fitting a bounding box to the circle and calculating its centre and size will solve this particular case, but it will not generalize to handle occlusions, changing lighting, etc. I would like to move towards a method which can, for example, calculate the centroids and diameters of spheres in real-world B&W photos.
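To make the baseline I am referring to explicit, the bounding-box approach could be sketched roughly as follows. The function name `circle_from_bounding_box` and the threshold value are assumptions for illustration. The diameter is measured between the centres of the outermost white pixels, so I would only expect it to be accurate to about a pixel.

```python
import numpy as np


def circle_from_bounding_box(image, threshold=127):
    """Estimate (cx, cy, diameter) from the bounding box of the bright pixels.

    Returns None if no pixel exceeds the threshold. This only makes sense
    for a single, unoccluded circle on a clean background.
    """
    ys, xs = np.nonzero(image > threshold)
    if xs.size == 0:
        return None
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    # Centre of the box, in (x, y) from the top-left corner.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Average the horizontal and vertical extents between pixel centres.
    diameter = ((x_max - x_min) + (y_max - y_min)) / 2.0
    return cx, cy, diameter
```

Any occlusion or uneven lighting changes which pixels pass the threshold, which is why I do not expect this to carry over to real photos.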