Which AI tools can be used for food recognition?



I work for a company that opens restaurants inside businesses. Every day at lunch, we want our clients to be able to scan their trays so that the food is detected automatically using AI / image recognition.

Technically speaking, the number of food items in our database grows over time, but on any given day about 30 items are available in the restaurant. About 5 items change each day (for example, the main dish changes, but the bottle of water is always the same).

This means that when the client goes to the till to pay, they place the tray themselves, the camera takes photos of the tray, and the system tries to identify each item separately among the 30 items available that day. Clients pay per food item, which means we don't need to track the weight or the quantity of the main dish.

I have absolutely no experience in AI/ML and don't know where to start; I'm a web developer.

  • Which tool should I look at first?
  • Which skills do I need to acquire? I mean, are there easy-to-use high-level libraries, or do I need to learn ML from scratch?

At first I was thinking of Amazon Rekognition or Google Vision, but they seem to be made for recognizing ANY food item from their own database. My need seems easier, since I just need to recognize several items on a tray among 30 known items.

Thanks a lot for your help.

David D.

Posted 2018-11-20T15:46:37.783

Reputation: 121

You should take a model that recognises objects (not necessarily food), like Amazon Rekognition, Google Vision or YOLO for example, and train it from scratch with the food you have in your database. You only need to create a dataset to train the model, so you need to acquire some experience in this regard, but there are plenty of tutorials on the internet on how to train a network and so on. – Jérémy Blain – 2018-11-20T15:58:08.800


Not necessarily from scratch, but the last few layers (transfer learning). See https://martin-thoma.com/object-detection/ as a starting point.

– Martin Thoma – 2018-11-20T16:07:58.100

This seems like an ambitious project. How much time have you allocated to this, and what is your output - a feasibility report, a prototype, an actual working installation? Computer vision projects are not in the "install a library and go" category yet, you need some understanding of the internals and the language of data science - not to expert level, but at least some - which will take some dedicated time to obtain. – Neil Slater – 2018-11-20T16:11:37.930

@NeilSlater Thanks! I guess this will be my main objective in 2019, so let's say 6 months full-time. I'm a senior Python/Django web dev, so I guess that can be useful. I will dedicate whatever time I need to level up on this topic. I just need to start the right way :) – David D. – 2018-11-20T19:19:16.777

@JérémyBlain Thanks a lot. I thought Amazon Rekognition was using its own database to recognize objects. To be sure, are you saying I can set my own objects and train Amazon by myself? For example, create a "Saute de Boeuf" entry and provide 30 different photos to Amazon API? – David D. – 2018-11-20T19:24:19.610

@DavidD. I don't know Amazon Rekognition, but if you have access to a model that you can load in a machine learning framework like Keras or PyTorch, then you can retrain the model with your own dataset; you just need to replace the output layer with one sized for the number of classes you need. Search for transfer learning or fine-tuning, as Martin Thoma said in his comment. (I think 30 images per class would not be enough, but you need to test it!) – Jérémy Blain – 2018-11-21T08:32:42.397



For a midsize corporation running multiple cafeterias, an AI tool may be feasible, provided sufficient time and resources are invested well in advance of system use. Selling an AI tool as a complete strategy, without a corporate commitment to a plan that includes the costs listed below, is unwise. As of this writing and for the foreseeable future, there are no drop-in AI systems that will recognize a command such as, "Learn these foods and their prices and then keep learning whenever we adjust the menu due to cycling of main dishes, buying considerations, or substitutions made due to shortages. By the way, when one food obscures another, either by design of the buyer to avoid paying for items or by accident, detect that and respond appropriately."

You also cannot assume that a food item on a tray will look like a single reference photo of that item. The number of frames needed per item varies, and some domain-specific research may be required to size the project. A smaller investigative project will need to be completed before the corporation can decide whether to commit fully.

The best approach is to set up multiple camera angles focused on a target area clearly outlined between the existing human cashier and the tray holder. The cashier involved in training the AI must ensure the trays are in the outlined target area and charge for the food on the tray accurately. The data from cameras and point of sale systems must be merged to produce videos with itemized lists of items on the trays.
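The merging of camera and point of sale data described above could be sketched roughly as follows. This is only a minimal illustration, assuming every frame and every sale carries a timestamp from a synchronized clock. The names `Frame`, `Sale`, `label_frames`, and the 5-second window are hypothetical and not taken from any particular point of sale product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    camera_id: str
    timestamp: float  # seconds since epoch, assumed synchronized with the till
    path: str         # location of the stored image file


@dataclass
class Sale:
    timestamp: float     # time the cashier completed the sale
    item_ids: List[str]  # items the cashier charged for


def label_frames(frames: List[Frame], sales: List[Sale],
                 window: float = 5.0) -> List[Tuple[Frame, List[str]]]:
    """Pair each frame with the sale closest in time, if within `window` seconds.

    Frames with no sale close enough are dropped rather than guessed at.
    """
    labeled = []
    for frame in frames:
        if not sales:
            break
        nearest = min(sales, key=lambda s: abs(s.timestamp - frame.timestamp))
        if abs(nearest.timestamp - frame.timestamp) <= window:
            labeled.append((frame, nearest.item_ids))
    return labeled
```

Each resulting pair is one training example: an image plus the itemized list the cashier rang up. Dropping unmatched frames keeps the labels only as trustworthy as the cashier's entries, which is why accurate charging during the training period matters.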

A sufficiently deep LSTM network could be trained, tested, and verified on a sample of that data. Theft detection would need to be designed into all elements of the strategy. The current database schema would need to link to image files. Thirty frames per second would not be necessary; two or three FPS might be sufficient. Some theoretical investigation and subsequent experimentation during the initial investigation would be wise.
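To make the frame rate point concrete, here is a small sketch of how a 30 FPS feed could be thinned to two or three frames per second before storage or training. The function name `sample_frame_indices` is hypothetical, and the approach (keeping evenly spaced frames) is just one simple option, not a recommendation from any specific video library.

```python
from typing import List


def sample_frame_indices(total_frames: int, source_fps: float,
                         target_fps: float) -> List[int]:
    """Return indices of frames to keep when reducing source_fps to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(total_frames))  # nothing to drop
    step = source_fps / target_fps  # source frames per kept frame
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices
```

For one second of 30 FPS video, `sample_frame_indices(30, 30, 3)` keeps frames 0, 10, and 20.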

These are probable costs and considerations associated with transition and continued use of the AI system, and this is not meant to be an exhaustive list.

  • Initial research to size the project properly
  • At least two cameras at each location, usable for both training and execution
  • Extension of networking equipment
  • Software to tie the itemized lists from the point of sale to the associated video feed, and to tell the point of sale system what to charge the tray-holder once the items can be recognized
  • Database storage to support low frame rate, medium resolution video storage
  • Human personnel training time and materials so that they can participate in the machine training as new items are added (so that the recognition of the items in the trainers is transferred to the machine trainee)
  • An offline cashier station for training so that items can be added to the database without interrupting cafeteria revenue generation
  • Purchase and shipping of items to the simulation station
  • Training programs for food buyers and cooks
  • A plan for how to transition the workforce in accordance with legal and ethical standards
  • Associated customer and public relations
  • Equipment to indicate probable theft to the food purchaser or to the appropriate security personnel if the food purchaser will not comply
  • Plan for dealing with cash or excluding it from payment options
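As part of the initial sizing research, the storage item in the list above can be estimated with simple arithmetic. The sketch below is only an illustration; the function name and every input value in the usage note are hypothetical placeholders that would have to be replaced with measurements from a real cafeteria.

```python
def estimate_daily_storage_bytes(cameras: int, fps: float,
                                 seconds_per_tray: float,
                                 trays_per_day: int,
                                 bytes_per_frame: int) -> int:
    """Rough daily storage need for one location, in bytes.

    Multiplies the number of frames captured per day by an assumed
    average size per stored frame. All inputs are assumptions.
    """
    frames_per_day = cameras * fps * seconds_per_tray * trays_per_day
    return int(frames_per_day * bytes_per_frame)
```

With placeholder values of 2 cameras, 2 FPS, 5 seconds per tray, 300 trays per day, and 200,000 bytes per frame, the estimate comes to 1,200,000,000 bytes (about 1.2 GB) per location per day. The real figure depends entirely on the measured values.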

Douglas Daseeco

Posted 2018-11-20T15:46:37.783

Reputation: 7 174