What approach should I use to detect faces in video game footage?


I have set myself the challenge of detecting the locations of players/bots in videos of a well-known first-person shooter game (this is for a YouTube series I'm planning on doing). I'm not sure which AI approach I should apply to this problem - I'm a complete novice at this!

My first thought was that the face/head seems to have the most detail, so I could train a convolutional neural network on images of sprite heads and general background. However, this doesn't seem to work too well. I've certainly not exhausted the different network architectures/topologies, but it wasn't learning all that well.

My second approach was to use a Haar cascade. This seemed to be an obvious choice since it's fast and good at detecting objects (rather than multi-class classification). However, my cascade training stops after 5 or 6 stages (using OpenCV) because it seems to have reached a great accuracy, yet it doesn't detect anything when I feed it the training images, let alone other images.

I also looked into pedestrian detection and got a stock version of that working. However this seemed to struggle when/if the sprites are crouching or in unusual positions (and it isn't great on standing sprites tbh).

So, is there a branch of machine learning/AI that is more applicable to this problem? If not, which of these should I continue to work on?


Posted 2017-10-30T02:13:23.367

Reputation: 151

To the downvoter, please state how I can improve the question - this is my first post on AI.stackexchange. – FraserOfSmeg – 2017-10-30T10:02:00.593

Your title seems very general but your question is more specific. Maybe you could change it to something like "What approach should I use to detect faces in video game footage?" that would be more relevant? – Markus Tenghamn – 2017-11-01T12:52:03.007

@MarkusTenghamn thanks, that's a good suggestion - done! – FraserOfSmeg – 2017-11-03T00:18:03.127



To start with, you can find a lot of information by searching for pedestrian detection. Since you are trying to localize game characters, the face is not the best target. You need to look for the character as a whole.

About Haar cascades: the algorithm is one of the fastest face localization methods available. The reason is that it applies its feature classifiers stage by stage, starting with the coarsest and cheapest features. If a window fails an early stage, no time is spent evaluating the more computationally intensive features. It worked well until deep neural networks (DNNs) surpassed its success rate. However, it is not the best approach for recognizing game characters or pedestrians.
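To make the early-rejection idea concrete, here is a minimal sketch in plain Python. It is not OpenCV's implementation: the three stage functions, their thresholds, and the toy "windows" are all hypothetical, chosen only to show that a window rejected by an early stage never reaches the later, costlier ones.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]
Stage = Tuple[Callable[[Window], float], float]  # (score function, threshold)


def run_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one window."""
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            # Early rejection: the remaining stages are never evaluated.
            return False, index
    return True, len(stages)


# Hypothetical stages, ordered from cheapest to most expensive.
toy_stages: List[Stage] = [
    (lambda w: sum(w) / len(w), 0.2),  # mean brightness
    (lambda w: max(w) - min(w), 0.5),  # contrast
    (lambda w: sum(w[: len(w) // 2]) - sum(w[len(w) // 2:]), 0.3),  # left half brighter
]

flat_background = [0.1, 0.1, 0.1, 0.1]  # made-up pixel values
candidate = [0.9, 0.8, 0.1, 0.2]        # made-up pixel values

print(run_cascade(flat_background, toy_stages))  # rejected at the first stage
print(run_cascade(candidate, toy_stages))        # passes every stage
```

Most windows in a frame are background, so most of them exit after one or two cheap checks, which is where the speed comes from.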

Also, it seems like you are overfitting the cascade, so training stops early and it does not learn anything useful. You can look up methods for reducing overfitting.
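One possible way to see why training can stop after only a few stages: in a cascade, the per-stage false alarm rates multiply, and training typically ends once the cumulative rate falls below an overall target. The sketch below only does that arithmetic. The numbers (a per-stage limit of 0.5, 20 requested stages, and stages that actually reach 0.05 on easy negatives) are illustrative assumptions, not your settings, and the function names are made up for this example.

```python
from typing import Iterable, Optional, Tuple


def cascade_rates(min_hit_rate: float, max_false_alarm_rate: float,
                  num_stages: int) -> Tuple[float, float]:
    """Overall hit rate and false alarm rate if every stage just meets its limits."""
    return min_hit_rate ** num_stages, max_false_alarm_rate ** num_stages


def stages_until_target(per_stage_false_alarm: Iterable[float],
                        target: float) -> Optional[int]:
    """Count stages until the cumulative false alarm rate reaches the target."""
    cumulative = 1.0
    for count, rate in enumerate(per_stage_false_alarm, start=1):
        cumulative *= rate
        if cumulative <= target:
            return count
    return None  # target never reached with the given stages


# Assumed settings: per-stage limits of 0.995 / 0.5 over 20 requested stages.
overall_hit, overall_false_alarm = cascade_rates(0.995, 0.5, 20)

# Assumption: the negatives are so easy that each stage rejects 95% of them.
easy_negatives = [0.05] * 20
print(stages_until_target(easy_negatives, overall_false_alarm))
```

If the negatives are few or too easy to tell apart from the positives, each stage beats its limit by a wide margin and the overall target is hit within a handful of stages. The resulting cascade looks accurate on its own training set but has learned very little, so more (and harder) negative images are one thing to try.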

In 2005 a new method was proposed: HOG (Histogram of Oriented Gradients). You can use it to extract features and then classify those features to get what you want. If you would like to go the deep learning route, I suggest investigating how DNNs work, what kind of input images you need, and which localization networks exist (e.g. YOLO, Faster R-CNN) and how they work.
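As a rough illustration of what a HOG feature is, the sketch below computes the orientation histogram for a single cell in plain Python. It is deliberately simplified: a full HOG descriptor also interpolates votes between neighbouring bins and normalises histograms over blocks of cells, both omitted here. The function name, the 9-bin choice, and the tiny test images are assumptions made for the example.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences need both neighbours, so border pixels are skipped.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat region, no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to exactly 180.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Made-up 4x4 cells: dark-to-bright left to right, and dark-to-bright top to bottom.
vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
horizontal_edge = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]

print(cell_histogram(vertical_edge))
print(cell_histogram(horizontal_edge))
```

The two edges put all their weight in different bins, which is the property a classifier exploits. In practice the per-cell histograms over a detection window are concatenated into one feature vector and passed to a classifier such as a linear SVM, and you would normally use a library implementation rather than hand-rolled code like this.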

Deniz Beker

Posted 2017-10-30T02:13:23.367

Reputation: 366