What is the best approach for writing a program that identifies objects in a picture and then crops them in a specific way?


My work's quality control department is responsible for taking pictures of our products at various phases of our QC process, and currently the process goes:

  1. Take picture of product
  2. Crop the picture down to only the product
  3. Name the cropped picture after the part, plus some other relevant data

Depending on the type of product, the pictures will be cropped a certain way. So my initial thought is to use an object identifier, and once the object is identified, apply a cropping method specific to that product. There will also be QR codes in the pictures, which will be read automatically for naming in the future, so I can probably identify the parts that way if object identification proves slow or problematic.
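The "identify first, then apply a product-specific crop" idea maps naturally onto a lookup table from product identifier to crop function. Below is a minimal sketch in Python. The same structure ports directly to a `Dictionary<string, Func<...>>` in C#. It assumes the product ID is already known (for example, from the QR code) and that an image is a plain list of pixel rows. The product IDs and both crop rules are hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Assumption: a grayscale image stored as a list of pixel rows.
Image = List[List[int]]
CropFn = Callable[[Image], Image]


def crop_box(img: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Return the pixels inside columns [left, right) and rows [top, bottom)."""
    return [row[left:right] for row in img[top:bottom]]


def crop_center_half(img: Image) -> Image:
    """Hypothetical rule: keep the central half of the picture in each dimension."""
    h, w = len(img), len(img[0])
    return crop_box(img, w // 4, h // 4, w - w // 4, h - h // 4)


def crop_top_third(img: Image) -> Image:
    """Hypothetical rule: keep only the top third of the picture."""
    h, w = len(img), len(img[0])
    return crop_box(img, 0, 0, w, h // 3)


# Hypothetical product IDs mapped to their cropping rule.
CROPPERS: Dict[str, CropFn] = {
    "PRODUCT-X": crop_center_half,
    "PRODUCT-Y": crop_top_third,
}


def crop_for_product(img: Image, product_id: str) -> Image:
    """Look up the cropping rule for an identified product and apply it."""
    if product_id not in CROPPERS:
        raise ValueError(f"no cropping rule registered for {product_id!r}")
    return CROPPERS[product_id](img)


if __name__ == "__main__":
    picture = [[r * 8 + c for c in range(8)] for r in range(8)]
    cropped = crop_for_product(picture, "PRODUCT-X")
    print(len(cropped), len(cropped[0]))  # prints: 4 4
```

Adding a new product then means registering one more function. The identification step (object detector or QR code) stays separate from the cropping logic.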

The part I am unsure about is how to get the program to know how to crop a given part. For example, I would like to present the program with a couple of before-crop and after-crop photos of product X, and then have it derive a specific cropping formula for product X from those before/after pairs.
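One simple, non-learned way to turn a few before/after examples into a "cropping formula" for one product works as follows. It assumes something (an object detector, or a fixed camera setup) already gives you the product's bounding box in each photo. For each example, measure how far the manual crop extends beyond the product box, as a fraction of the box size. Average those margins over the examples, then apply them to new detections. This is a hand-rolled Python sketch, not a trained model, and the `Box` type and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels (assumed layout: left, top, right, bottom)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


Margins = Tuple[float, float, float, float]  # left, top, right, bottom


def learn_margins(examples: List[Tuple[Box, Box]]) -> Margins:
    """Average how far each manual crop extends past the product box, as a
    fraction of the product's width/height. Needs at least one
    (product_box, manual_crop_box) pair."""
    left = mean([(obj.left - crop.left) / obj.width for obj, crop in examples])
    top = mean([(obj.top - crop.top) / obj.height for obj, crop in examples])
    right = mean([(crop.right - obj.right) / obj.width for obj, crop in examples])
    bottom = mean([(crop.bottom - obj.bottom) / obj.height for obj, crop in examples])
    return left, top, right, bottom


def apply_margins(obj: Box, margins: Margins, img_w: int, img_h: int) -> Box:
    """Expand a newly detected product box by the learned margins, clipped to the image."""
    ml, mt, mr, mb = margins
    return Box(
        left=max(0.0, obj.left - ml * obj.width),
        top=max(0.0, obj.top - mt * obj.height),
        right=min(float(img_w), obj.right + mr * obj.width),
        bottom=min(float(img_h), obj.bottom + mb * obj.height),
    )


if __name__ == "__main__":
    # Hypothetical example pairs for "product X": (detected box, hand-made crop).
    examples = [
        (Box(100, 100, 200, 300), Box(90, 80, 210, 320)),
        (Box(50, 50, 150, 150), Box(40, 40, 160, 160)),
    ]
    margins = learn_margins(examples)
    print(apply_margins(Box(0, 10, 100, 110), margins, 1000, 1000))
```

Expressing the margins relative to the product size, rather than in pixels, keeps the formula usable when the product appears larger or smaller in a new photo.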

Also, in case it makes any difference, my code is in C#.

Rider Harrison

Posted 2018-08-07T23:33:46.087

Reputation: 51

@pasabaporaqui I don't think the question is unclear at all. The title of the question doesn't just "ask for object identification"; it already specifically asks for the sequence "identify objects, then crop them". That's exactly what's asked in the main body of the question as well. – Dennis Soemers – 2018-08-09T15:43:10.813

@pasabaporaqui The question is 100% clear; we can clearly understand what the person wants. He wants to 1) identify objects, because afterwards he wants to 2) crop the images, where different cropping methods should be used for different object types. There is absolutely nothing unclear there. You might disagree that the approach is correct; you might argue that it would be better to classify "cropping methods" directly rather than indirectly through a classification of "object types". That doesn't mean the question is unclear. That is advice, which should go into a comment or an answer. – Dennis Soemers – 2018-08-09T17:16:57.780

@pasabaporaqui That sounds like Machine Learning to me: present the program with examples of how I want it to crop, and then it should learn how to crop new situations by itself. This is where he says he has already identified the object type and therefore knows which cropping method to use; he'll presumably have different example input/output pairs for different cropping methods. That's just a rough example he has in mind, though. He explicitly states that this is the part he's unsure about, so a different solution might be better. – Dennis Soemers – 2018-08-09T17:27:13.483



Depending on the kind and amount of data you possess, there are a few approaches you might consider.

  1. Mark the target objects in your dataset and train a CNN that returns the coordinates of the target object. In this case, remember that training is usually faster when the ROI coordinates in the training data are expressed relative to the image size.

  2. Use some kind of focus mechanism, like a spatial transformer network (STN).

    This kind of network component is able to learn an image transformation (including a crop) that maximizes the target metric of the main classifier. The PyTorch tutorial on spatial transformer networks shows some nice visualizations of STN results.

    A good thing about this kind of network is that, given enough data, it might learn the proper transformation from image classification data alone (photo -> class). One does not need to explicitly mark target objects in the image!

  3. Object detection networks, such as YOLO or Faster R-CNN. There are many tutorials on the subject.

  4. Saliency extraction. The simple idea is to generate a heatmap showing which parts of the input image activate the classifier the most. I guess you could try to calculate a bounding box based on such a heatmap; there is research on this approach.
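For point 4, one simple way to turn a saliency heatmap into a bounding box is to keep every cell whose value is at least some fraction of the peak and take the smallest box around those cells. A minimal Python sketch follows. It assumes the heatmap is a 2D list of non-negative floats. The 0.5 threshold is an arbitrary placeholder to tune, and this is not the exact procedure of any particular paper.

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom) in heatmap cells; right/bottom are exclusive.
BoxT = Tuple[int, int, int, int]


def heatmap_to_bbox(heatmap: List[List[float]], threshold_ratio: float = 0.5) -> Optional[BoxT]:
    """Smallest box containing every cell >= threshold_ratio * peak value."""
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None  # nothing salient in this image
    threshold = threshold_ratio * peak
    # Rows / columns that contain at least one "hot" cell.
    hot_rows = [r for r, row in enumerate(heatmap) if any(v >= threshold for v in row)]
    hot_cols = [c for c in range(len(heatmap[0])) if any(row[c] >= threshold for row in heatmap)]
    return (min(hot_cols), min(hot_rows), max(hot_cols) + 1, max(hot_rows) + 1)
```

CNN heatmaps are usually coarser than the photo. In that case, scale the resulting box by the ratio of image size to heatmap size before cropping. One stray hot spot can also inflate the box. Restricting it to the largest connected hot region is a common refinement.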

Points 1 and 2 are probably the easiest to implement, so I would start with them.
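Point 1 mentions expressing ROI coordinates relative to the image size. That label-preparation step is small enough to show directly. Convert each hand-marked box from pixels to fractions of the image width and height (so every value lies in [0, 1]) before training. Then convert the network's prediction back to pixels when cropping. A minimal Python sketch, assuming a `(left, top, right, bottom)` box layout:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def to_relative(box: Box, img_w: int, img_h: int) -> Box:
    """Pixel ROI -> ROI as fractions of image width/height (training target)."""
    left, top, right, bottom = box
    return (left / img_w, top / img_h, right / img_w, bottom / img_h)


def to_absolute(rel_box: Box, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Relative ROI predicted by the network -> pixel ROI for cropping this photo."""
    left, top, right, bottom = rel_box
    return (round(left * img_w), round(top * img_h), round(right * img_w), round(bottom * img_h))
```

Resizing a photo to the network's input size does not change these relative targets, which is part of what makes them convenient.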


Posted 2018-08-07T23:33:46.087

Reputation: 325

@pasabaporaqui That might be the case for points 3 and 4, but point 2 links directly to an example that even a person who doesn't know much about NNs could understand. Your criticism is counterproductive. – don_pablito – 2018-08-09T11:08:36.687

The question by the OP is pretty clear. The title might be misleading, but his problem is clearly explained: how to approach cropping an image when you know what it looks like. As I understood it, the OP does not have much experience in AI, so I provided a naive approach together with research subjects that are currently applied to his problem. – don_pablito – 2018-08-09T11:34:16.257

Thanks @paffciu, that's correct. I have zero experience in AI, professionally or from school, and neither does anyone I know, so I didn't even know where to start with this problem. This is exactly what I was looking for. – Rider Harrison – 2018-08-12T18:29:00.317


This sounds like you have a supervised learning problem. Microsoft provides a C# library, but it may not be suitable for your problem.

There are many different algorithms you could try, most of which will be within the sub-area of computer vision. Probably some kind of deep neural network is the best bet these days, but the right choice will probably depend on the details of your problem. Goodfellow et al. have a recent book that might be a good resource for deciding what to use.
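In supervised terms, the photos your QC department has already cropped by hand can serve as labeled examples. Each original photo is an input, and its crop box is the target, assuming the crop coordinates were recorded or can be recovered. To judge whether a model's crops are good enough, a common measure is intersection over union (IoU) between the predicted box and the hand-made one. A minimal Python sketch, assuming boxes are `(left, top, right, bottom)` pixel tuples:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (1.0 = identical, 0.0 = disjoint)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Averaging this over a held-out set of hand-cropped photos gives a single number for comparing candidate models. Crops scoring below a threshold of your choosing could be flagged for manual review.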

Maybe someone who works in computer vision can give you a more specific suggestion.

John Doucette

Posted 2018-08-07T23:33:46.087

Reputation: 7 904