How can an AI train itself if no one is telling it if its answer is correct or wrong?



I am a programmer, but not in the field of AI. A question that constantly confuses me is: how can an AI be trained if we humans never tell it whether its calculations are correct?

For example, news articles often say something like "company A has a large human face database so that it can train its facial recognition program more efficiently". What such articles don't mention is whether a human engineer needs to tell the AI program whether each of its recognition results is accurate.

Are there any engineers who are constantly telling an AI whether what it produced is correct or wrong? If not, how can an AI determine whether the results it produces are correct?


Posted 2019-11-24T13:38:32.187

Reputation: 371

7You know those "prove you are a human" tests you get on web sites all the time these days? You know how they are often "click on all the pictures that have a stop sign in them"? Guess what: you are the human engineer constantly telling an AI whether it has correctly classified a stop sign. – Eric Lippert – 2019-11-27T03:28:17.027


@EricLippert obligatory xkcd

– Stian Yttervik – 2019-11-27T13:57:12.250


@StianYttervik: Even more obligatory:

– Eric Lippert – 2019-11-27T16:06:48.980



By "company A has a large human face database so that it can train its facial recognition program more efficiently" the article probably means that there is a training dataset $S$ of the form

$$ S = \{ (\mathbf{x}_1, y_1), \dots,(\mathbf{x}_N, y_N) \} $$

where $\mathbf{x}_i$ is an image of the face of the $i$th human and $y_i$ (which is often called a label, class or target) is e.g. the name of the $i$th human. So, the programmer provides a supervisory signal (the label) for the AI to learn. The programmer also specifies the function that determines the error the AI program is making, based on the answer of the AI model and $y_i$.
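Concretely, a toy version of such a labelled dataset and its programmer-specified error function might look like the following sketch (all data, names, and the deliberately bad "model" are made up for illustration):

```python
import numpy as np

# A toy supervised dataset S = {(x_i, y_i)}: each x_i is a flattened
# "image" vector and y_i is the label the programmer supplies.
rng = np.random.default_rng(0)
S = [(rng.random(4), "alice"), (rng.random(4), "bob"), (rng.random(4), "alice")]

def zero_one_loss(prediction, label):
    """The error function the programmer specifies: 1 if the model's
    answer disagrees with the supervisory label y_i, else 0."""
    return 0 if prediction == label else 1

# A deliberately bad model that always answers "alice"; the loss
# measures how often it disagrees with the labels.
total_error = sum(zero_one_loss("alice", y) for _, y in S)
print(total_error)  # → 1 (only the "bob" example is misclassified)
```

The key point is that both the labels $y_i$ and the error function come from humans; the AI only minimizes the error they define.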

This way of learning is called supervised learning (SL). However, there are other ways of training an AI. For example, there is unsupervised learning (UL), where the AI needs to find patterns in the data by aggregating objects based on some similarity measure specified by the programmer. There's also reinforcement learning (RL), where the programmer specifies only certain reinforcement signals: the programmer tells the AI which moves or results are "good" and which are "bad" for achieving its goal by giving the AI, respectively, a positive or negative reward. You can also combine these three approaches, and there are other variations.

Are there any engineers who are constantly telling an AI whether what it produced is correct or wrong?

Yes, in the case of SL. In the case of RL, the programmer also needs to provide the reinforcement signal, but doesn't need to explicitly tell the AI which action to take. In UL, the programmer needs to specify how the AI should aggregate the objects, so the programmer is involved in the learning process in this case too.
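To make the RL case concrete, here is a minimal two-armed bandit sketch (all names and numbers are illustrative): the programmer never tells the agent which action is "the right answer", only hands back a numeric reward after each action, and the agent's value estimates are shaped by those rewards alone.

```python
import random

random.seed(0)

true_win_rate = {"left": 0.2, "right": 0.8}  # hidden from the agent
value = {"left": 0.0, "right": 0.0}          # the agent's estimates
counts = {"left": 0, "right": 0}

for step in range(2000):
    # epsilon-greedy: mostly exploit the current best estimate,
    # occasionally explore the other action
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < true_win_rate[action] else 0.0
    counts[action] += 1
    # incremental average: only the reward signal updates the estimates
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))  # the agent discovers "right" pays more
```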



Reputation: 19 783

Comments are not for extended discussion; this conversation has been moved to chat.

– nbro – 2020-03-06T01:09:30.753


Taking your example of the face data, keep in mind that when the model is run on a new, unseen image, it can only return whichever already-seen identity emerges as the closest match. The result may be incorrect. The chances of misidentification drop as the number of features incorporated increases.

The input of the engineers lies at the level of the training data. Say we have a new photo of an individual that needs to be included in the model. The engineering task is now to morph that image to simulate different environments, viewing angles, atmospheric conditions, lighting and so on, providing a large number of input cases, all of which will be "true" since the underlying features are unchanged: the images are all based on the same individual. The model is then recalculated using the additional data.
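A crude stand-in for that morphing step is the kind of data augmentation sketched below (the transforms and the label name are hypothetical): one labelled photo fans out into several training cases that all share the same identity label.

```python
import numpy as np

def augment(image):
    """Generate label-preserving variants of one face image."""
    variants = [image]
    variants.append(np.fliplr(image))              # mirrored viewing angle
    variants.append(np.clip(image * 1.3, 0, 255))  # brighter lighting
    variants.append(np.clip(image * 0.7, 0, 255))  # dimmer lighting
    variants.append(np.roll(image, 2, axis=1))     # slight translation
    return variants

# One labelled photo becomes several "true" training cases.
photo = np.random.default_rng(1).integers(0, 256, (8, 8)).astype(float)
training_cases = [(v, "new_person_id") for v in augment(photo)]
print(len(training_cases))  # → 5
```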

Keep in mind too that adding a new set of data to an existing training set has the advantage that the parameters of the model are largely in the right ballpark already, and adding the new faces will make only small changes. Cross validation will show whether the addition has improved or spoiled the model.

Colin Beckingham


Reputation: 377

2"The chances of mis-identification are much lower as the number of features incorporated increases." This is not necessarily true. – AleksandrH – 2019-11-25T20:32:46.980


how can an AI be trained if we human beings are not telling it its calculation is correct?

What you are looking for is called self-supervised learning. Yann LeCun, one of the originators of modern neural network systems, has suggested that machines can learn usefully even in the absence of human-provided labels, simply by learning auxiliary tasks whose answers are already encoded in the data samples themselves. Self-supervision has already been applied successfully to a variety of tasks, improving multitask performance. Unsupervised learning can, in general, be seen as a subset of self-supervision.

Self-supervision can be performed in a variety of ways. One of the most common is to use parts of the data as input and other parts as labels, and using the "input" subset of the data to predict the labels.

Supervised learning looks like this:

    model.fit(various_data, human_labels)

The human_labels correspond to entries in various_data, which we expect the model to predict.

Meanwhile, self-supervised learning can look something like this:

    model.fit(various_data[:, :500], various_data[:, 500:])

(Using Python array slice notation, some of the input data are used as training labels.)

For example, a machine could use half of the pixels in an image of a handwritten digit to try to predict the missing pixels. This is a form of self-supervision: Since the machine knows which pixels belong together in the same sample, it can "automatically" produce its own labeled data from the input itself, simply by using some inputs as outputs. However, predicting pixels from other pixels is often not the desired task. So instead, a neural network is often pretrained using self-supervised or unsupervised learning techniques, and then subsequently trained on some amount of human-labeled data as a form of transfer learning.
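A minimal, self-contained sketch of that pixel-prediction idea (the "images" here are fake, structured random data, and the predictor is a plain least-squares fit rather than a neural network): the left half of each sample serves as input and the right half serves as the automatically derived label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "images": 200 samples of 16 pixels each, built so the right half
# is largely predictable from the left half.
left = rng.random((200, 8))
right = left @ rng.random((8, 8)) + 0.05 * rng.random((200, 8))

# Self-supervision: the data supplies its own labels. Fit a linear
# predictor of the right-half pixels from the left-half pixels.
W, *_ = np.linalg.lstsq(left, right, rcond=None)

reconstruction = left @ W
error = np.mean((reconstruction - right) ** 2)
print(error < 0.01)  # → True: the "missing" pixels are recovered well
```

No human labelled anything here; the pairing of inputs and targets came entirely from the structure of the data.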

What the summary of the hypothetical news article promises is that self-supervision made learning more efficient, not that it removed the need for any kind of human intervention. This is exactly what we get from successful self-supervision in pretraining.

In the best possible case, the machine learns to "recognize" each class of digit 0-9 but it still does not know how to ground its own internal labels to the human's labels. Then a human supplying the mapping between the machine's labels and the human-specified IDs would be the only step necessary to upgrade the self-supervised machine to one that is directly useful for digit recognition.

There will always be a need for humans to train a machine via direct supervision in order for the machine to learn the intended task. In order to solve a specific problem, a sufficient degree of supervision is always required, and sufficient labels to reflect the intention must be provided.



Reputation: 181

1I highly doubt that the article was specifically talking about self-supervised learning, but I think it is useful to mention this approach. – nbro – 2019-11-25T23:57:40.793

@nbro the OP seems to be asking about the principle generally. Many forms of unsupervised learning (autoencoders and their ilk) are included in self-supervised learning, so this is a fairly general response. I am not aware of any form of unsupervised pretraining or auxiliary training that doesn't fit into the paradigm of self-supervision. – pygosceles – 2019-11-26T03:06:10.360


I think you're probably looking at this the wrong way around. A conventional, old-fashioned AI doesn't make a guess, then require confirmation as to whether that guess was right or wrong. Instead, (in the simplest case) it undergoes a one-off computationally intensive "training"/"learning" phase, during which you feed it an enormous number of correct answers (which are labelled as correct) and an even more enormous number of incorrect answers (which are labelled as incorrect). Using whatever learning mechanism it has at its disposal, it then identifies some underlying structure in the "corrects" that doesn't exist in the "incorrects". When, in the future, it encounters something new that seems to also exhibit this structure, then it will classify this as a "correct". It might do rather well, or it might do terribly. Once the one-off training phase is done, it's stuck with whatever capability it has.

Let's say the company you mention is called Facebook, and they have a feature that allows you to "tag" your friends in photos. Every tag is a free, human-supplied label, so there's no need to pay engineers to create the largest labelled image database in human history in order to train your AI.



Reputation: 131

Facebook's face-identification receives constant feedback from users confirming (or denying) the identity of a potential match – Valorum – 2019-11-27T01:12:19.453

@Valorum Correct. – Harry – 2019-11-27T05:48:49.340


What you are missing is what the news story doesn't mention and glosses over. When a news article says:

company A has a large human face database so that it can train its facial recognition program more efficiently

What it really means is:

company A has a large database of human faces, along with additional information (such as the identity of the person each face belongs to) that was created by other humans, so that it can use this data set to train its facial recognition program

How training works is basically as follows:

  1. You have a large database of correct (or almost entirely correct; ideally all correct) information that relates inputs to answers. For example, images of faces along with the person each face belongs to.

  2. You split this large database into several sets.

  3. You use one set to train the AI.

  4. After looping through the training set, you use one or more of the other sets to test the AI and check whether the training worked.

  5. If you've done this before, compare the performance of the current AI to the previous AI. Otherwise, go to 6.

  6. Tweak some parameters of the AI to try to improve performance.

  7. Go to 2 until you are satisfied with the performance of the AI.

All the steps above are normally automated by scripts. The key here is that the original database contains both the question you want to ask the AI (the face) and the answer you want the AI to learn (the person).
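The loop above can be sketched as follows (everything here is made up for illustration: fake "face feature" vectors, a k-nearest-neighbour stand-in for the model, and k as the parameter tweaked in step 6):

```python
import random

random.seed(0)

# Step 1: a database of correct (question, answer) pairs — fake
# 2-feature "face" vectors clustered per person.
centers = {"alice": (0.0, 0.0), "bob": (5.0, 5.0), "carol": (0.0, 5.0)}

def make_example(person):
    cx, cy = centers[person]
    return ([cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)], person)

database = [make_example(random.choice(list(centers))) for _ in range(150)]

# Step 2: split into a training set and a held-out test set.
random.shuffle(database)
train, test = database[:100], database[100:]

def predict(features, training_set, k):
    """Steps 3-4: a k-nearest-neighbour 'model'."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(training_set, key=lambda ex: dist(ex[0], features))[:k]
    labels = [person for _, person in nearest]
    return max(set(labels), key=labels.count)

# Steps 5-7: evaluate each candidate parameter on the unseen test set
# and keep whichever setting performs best.
best = max(range(1, 8, 2),
           key=lambda k: sum(predict(f, train, k) == p for f, p in test))
accuracy = sum(predict(f, train, best) == p for f, p in test) / len(test)
print(best, accuracy)
```

Note that the humans' contribution is entirely inside `database`; the loop itself never asks anyone whether an individual prediction was right.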

Yes, humans are involved in training the AI, but the involvement happens earlier, at the database-gathering stage.



Reputation: 111


The trick with unsupervised learning is that the AI doesn't learn whether something is a face or not; it just finds unnamed patterns that the researchers then need to name.

Let's say you feed it a dataset of one million pictures in order to train a facial recognition algorithm. After training, the AI will have found a few patterns in the pictures based on the properties of each picture, such as color, lighting, topography, etc. However, without labels (supervised learning) the AI doesn't know what exactly it found, so a researcher then needs to label those patterns. You don't need a label to tell that a picture of a face is mostly different from the picture of a building. You need a label to tell you that one is a "face" and the other is a "building".
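A minimal clustering sketch of this (the data and feature names are invented): plain k-means groups similar points together, but the resulting clusters come out as anonymous numbers until a human names them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabelled data: two kinds of "pictures" reduced to 2 features each
# (say, average colour and amount of straight edges). No labels anywhere.
faces = rng.normal([1.0, 1.0], 0.2, (50, 2))
buildings = rng.normal([4.0, 4.0], 0.2, (50, 2))
data = np.vstack([faces, buildings])

# Plain k-means with k=2, starting from two arbitrary data points.
centroids = data[[0, -1]].copy()
for _ in range(10):
    assignment = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([data[assignment == c].mean(0) for c in range(2)])

print(set(assignment[:50].tolist()), set(assignment[50:].tolist()))  # → {0} {1}
# The machine found "cluster 0" and "cluster 1"; only a human can say
# which one means "face" and which one means "building".
```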



Reputation: 101

But what if the program is designed to verify the identity of people? Let's say the AI program is used to determine "if the newly uploaded images from a bank branch of Alice matches our database record of Alice photos/3D face models". In this case, the program would always produce an answer (be it a percentage point or a boolean value) and if no human is checking its result, I suppose the AI would only reinforce its original pattern it found by "training" itself (no matter the pattern is accurate or not)? – Mamsds – 2019-11-25T17:14:49.227

@Mamsds You would need some kind of supervised label to verify identity. Given a new image, an unsupervised method could return a set of similar faces and a set of dissimilar faces, but you'd need at least some faces that are labeled "Alice" to see if they land in the set of similar faces or not. Without labels, the algorithm could say the new image belongs to Face Cluster 157, for example, but there's no way to link that cluster to Alice's identity unless you tell it. – Nuclear Wang – 2019-11-25T20:44:35.873


I can't remember the researcher's name, but he specializes in psychology in Great Britain and has done a lot of work with machine learning and artificial intelligence.

The project he was working on, which I read about earlier this year, was one where they tried to deduce how humans learn. They came up with the theory that we learn by making guesses about plausible and possible outcomes, which creates our expectations about reality. When we are wrong, depending on the degree, we are surprised, shocked, or not affected at all. They are working on creating an AI that does not need human intervention but instead makes guesses about outcomes before it performs tasks, and then updates those expectations as it experiences more varied outcomes.

Extremely interesting stuff, and definitely closer to how sentient beings gain experience and grow as individuals.



Reputation: 1