If the system claims that a piece of code has violated standards, then to be useful to the programmer it needs to provide more than a 'yes/no' classification: you need some form of explanation of why the code is claimed to be wrong.
Clearly ANNs aren't much use for that.
If I were tackling such a problem (and my suspicion is that a lot of effort could be spent trying and failing to reproduce coding standards which are already well-understood), then my inclination would be to use a more explicitly rule-based representation.
The ever-useful "Essentials of Metaheuristics" has a whole section on the evolution of rulesets. Obviously, nothing prevents you from initializing the evolutionary process with rules known to be useful.
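As a toy illustration of that idea (not taken from the book), here is a minimal evolutionary loop over threshold rules, seeded with a hand-written rule. The metric names and threshold values are entirely made up for the sketch:

```python
import random

random.seed(0)  # deterministic for the sake of the example

# Hypothetical rule representation: one threshold per numeric metric.
# A sample is flagged as a violation if ANY metric exceeds its threshold.
FEATURES = ["mccabe", "nesting_depth", "param_count"]  # assumed metric names

def predict(rule, sample):
    return any(sample[f] > rule[f] for f in FEATURES)

def fitness(rule, dataset):
    # dataset: list of (feature_dict, label) pairs; label True = violation
    return sum(predict(rule, sample) == label for sample, label in dataset)

def mutate(rule):
    # Perturb one randomly chosen threshold.
    child = dict(rule)
    f = random.choice(FEATURES)
    child[f] += random.uniform(-1.0, 1.0)
    return child

def evolve(dataset, generations=50, pop_size=20):
    # Initialize the population with a rule known (assumed) to be useful,
    # plus mutated variants of it -- the "seed with known rules" idea above.
    seed_rule = {"mccabe": 10.0, "nesting_depth": 4.0, "param_count": 5.0}
    population = [seed_rule] + [mutate(seed_rule) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half, refill with mutants.
        population.sort(key=lambda r: fitness(r, dataset), reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=lambda r: fitness(r, dataset))
```

Real ruleset evolution (Pitt- or Michigan-style, as the book describes) evolves whole sets of variable-length rules rather than a single threshold vector, but the seeding trick works the same way.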
As I point out here, with our current AI algorithms the success of an approach is very sensitive to human expertise and effort in feature selection, preprocessing, choice of training set, etc., so creative experimentation with these is vital.
Training set: how about two sets of examples, positive examples consisting of (features extracted from) bad code, and negative examples from a refactored version of the same code?
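A cheap way to prototype such a dataset (assuming Python source code and the standard-library ast module; the particular metrics and the bad/refactored snippets are illustrative):

```python
import ast

def _depth(node, d=0):
    # Maximum nesting depth of the AST below this node.
    children = list(ast.iter_child_nodes(node))
    return d if not children else max(_depth(c, d + 1) for c in children)

def extract_features(source: str) -> dict:
    # A few cheap structural metrics, standing in for real complexity metrics.
    tree = ast.parse(source)
    return {
        "num_branches": sum(isinstance(n, (ast.If, ast.While, ast.For))
                            for n in ast.walk(tree)),
        "max_depth": _depth(tree),
        "num_nodes": sum(1 for _ in ast.walk(tree)),
    }

# Illustrative bad/refactored pair: deeply nested ifs vs. a flattened version.
BAD = (
    "def f(x):\n"
    "    if x > 0:\n"
    "        if x > 1:\n"
    "            if x > 2:\n"
    "                return 3\n"
    "    return 0\n"
)
REFACTORED = (
    "def f(x):\n"
    "    if x > 2:\n"
    "        return 3\n"
    "    return 0\n"
)

# One positive (violation) and one negative example per pair.
dataset = [(extract_features(BAD), True),
           (extract_features(REFACTORED), False)]
```

The nice property of paired examples is that each pair differs mainly in the violation itself, which should make the relevant features stand out to the learner.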
One elementary choice of features would be to apply a bunch of code complexity metrics and have the learning algorithm combine those. The plus side of working in such a numeric domain is that the learning algorithm might readily find a gradient to exploit. The downside is that the resulting rules (likely of the form "if mccabe > 2.8") are still not as informative as might be desired.
For more complex rules (e.g. one that depends on an if/elseif/else structure) you may want to extract your features from the abstract syntax tree. You could in principle use the entire tree, but to my knowledge ML on graph and tree structures is still in relative infancy.
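As a small example of an AST-derived structural feature (again using Python's ast module; the "if/elif chain without a final else" pattern is just an assumed example of something a coding standard might flag):

```python
import ast

def elif_chain_missing_else(source: str) -> bool:
    """True if the source contains an if/elif chain with no final else.

    In Python's AST an elif appears as an ast.If that is the sole first
    statement of the parent If's orelse list, so we follow that chain to
    its end and check whether the last orelse is empty.
    """
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.If) and node.orelse
                and isinstance(node.orelse[0], ast.If)):
            tail = node.orelse[0]
            while tail.orelse and isinstance(tail.orelse[0], ast.If):
                tail = tail.orelse[0]
            if not tail.orelse:
                return True
    return False
```

Caveat: in the AST, `elif x:` and `else:` followed immediately by `if x:` are structurally identical, so a feature like this can't distinguish them; that kind of lost surface detail is one reason AST features alone may not reconstruct every standard.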