Ideas on a network that can translate image differences into motor commands?


I'd like to design a network that receives two images (an image under construction and an ideal image) and has to produce an action vector for a simple motor command that would alter the image under construction to resemble the ideal image more closely. So basically, it translates image differences into motor commands that make the two images more similar.

I'm dealing with a 3D virtual environment, so the images are snapshots of objects and motor commands are simple alterations to the 3D shape.

Probably the network needs two pre-trained CNNs with shared weights (a Siamese pair) that extract image features; their outputs would be concatenated and fed into one or two dense layers, which map into the action space. Training should probably happen via reinforcement learning.
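To make the idea concrete, here is a minimal PyTorch sketch of the architecture described above. The layer sizes, the tiny encoder, and the action dimension are all placeholder assumptions (in practice the encoder would be a pre-trained backbone); the point is only to show the weight-tied Siamese encoder, the concatenation, and the dense head mapping into action space:

```python
import torch
import torch.nn as nn

class SiameseActionNet(nn.Module):
    """Sketch: shared encoder applied to both images, features concatenated,
    dense head maps to an action vector. All sizes are illustrative."""

    def __init__(self, action_dim=6):
        super().__init__()
        # One encoder, applied to both inputs -> weights are shared (Siamese).
        # A real version would use a pre-trained backbone (e.g. a ResNet).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 32)
        )
        # Concatenated features (32 + 32) -> dense layers -> action space.
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, current_img, ideal_img):
        f_cur = self.encoder(current_img)
        f_ideal = self.encoder(ideal_img)  # same weights as above
        return self.head(torch.cat([f_cur, f_ideal], dim=1))

net = SiameseActionNet(action_dim=6)
action = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(action.shape)  # torch.Size([1, 6])
```

With an RL setup, this output would parameterize the policy (e.g. logits over discrete shape edits, or the mean of a Gaussian for continuous ones).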

Additionally, it will eventually need recurrence, since multiple motor actions must be performed in sequence to get closer to the intended result.
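The recurrence could be added by carrying a hidden state across steps, for instance with a GRU cell between the concatenated features and the action head. A minimal sketch (feature and state sizes are again placeholder assumptions, and the random tensors stand in for the CNN features from each step's image pair):

```python
import torch
import torch.nn as nn

feat_dim, state_dim, action_dim, steps = 64, 32, 6, 3

# GRU carries state across successive motor actions, so each step's
# command can depend on what was already done.
gru = nn.GRUCell(feat_dim, state_dim)
action_head = nn.Linear(state_dim, action_dim)

h = torch.zeros(1, state_dim)            # recurrent state
for t in range(steps):
    feats = torch.randn(1, feat_dim)     # stand-in for concatenated CNN features
    h = gru(feats, h)
    step_action = action_head(h)         # one motor command per step
print(step_action.shape)  # torch.Size([1, 6])
```

Since each action changes the 3D scene, the image under construction would be re-rendered between steps and re-encoded before the next GRU update.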

Would there be any serious difficulties with this approach? Are there other approaches that achieve the intended result, or any similar examples?

Thanks in advance


Posted 2019-11-11T10:25:31.540

Reputation: 235

Is there an obvious/detectable endpoint for a sequence of actions (e.g getting within some limit of ideal), or an enforced one (such as max number of steps) - or both, or is this issue open in your design? – Neil Slater – 2019-11-11T10:33:20.980

This is still an open issue in my design. I'll probably start more basic, with an enforced limit of a few steps, to see how it performs – SumakuTension – 2019-11-11T10:38:53.723

Gotta say, Great idea. – DuttaA – 2019-11-11T11:46:20.947

No answers