Is the Turing Test, or any of its variants, a reliable test of artificial intelligence?



The Turing Test was the first test of artificial intelligence and is now a bit outdated. The Total Turing Test aims to be a more modern test which requires a much more sophisticated system. What techniques can we use to identify an artificial intelligence (weak AI) and an artificial general intelligence (strong AI)?

Definitely requires a statistical approach with a number of participants. I've met some humans who would not pass the Turing Test. – SF. – 2016-08-02T16:01:10.020

It depends on what you define intelligence as. – baranskistad – 2016-09-08T02:26:30.840



The rhetorical point of the Turing Test is that it places the 'test' for 'humanity' in observable outcomes, instead of in internal components. If you would behave the same in interacting with an AI as you would with a person, how could you know the difference between them?

But that doesn't mean it's reliable, because intelligence has many different components and there are many sorts of intellectual tasks. The Turing Test, in some respects, is about the reaction of people to behavior, which is not at all reliable: remember that many people thought ELIZA, a very simple chatbot, was an excellent listener and got deeply emotionally involved very quickly. It calls to mind the IKEA commercial about throwing out a lamp, where the emotional attachment comes from the human viewer (and the music), rather than from the lamp.

Turing tests for specific economic activities are much more practically interesting--if one can write an AI that replaces an Uber driver, for example, what that will imply is much clearer than if someone can create a conversational chatbot.

The problem with the Turing Test is that it tests the machine's ability to resemble humans. Not every form of AI has to resemble humans, which makes the Turing Test less reliable. However, it is still useful, since it is an actual test. It is also noteworthy that there is a prize for passing or coming closest to passing the Turing Test: the Loebner Prize.

The intelligent agent definition of intelligence states that an agent is intelligent if it acts so as to maximize the expected value of a performance measure, based on past experience and knowledge (paraphrased from Wikipedia). This definition is used more often and does not depend on the ability to resemble humans. However, it is harder to test.
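To make the definition above more concrete, here is a minimal Python sketch of an agent that picks the action with the highest average observed score. The class name `ExperienceAgent`, the action labels, and the scores are all hypothetical, and treating an untried action as worth 0 is an assumption made only for this illustration. It is one simple reading of "maximize the expected value of a performance measure based on past experience", not a standard implementation.

```python
from collections import defaultdict


class ExperienceAgent:
    """Hypothetical agent: chooses the action with the best average past score."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.totals = defaultdict(float)  # sum of performance scores per action
        self.counts = defaultdict(int)    # number of times each action was tried

    def record(self, action, score):
        """Store one piece of past experience: an action and its observed score."""
        self.totals[action] += score
        self.counts[action] += 1

    def expected_value(self, action):
        """Estimate the expected performance of an action from past experience."""
        if self.counts[action] == 0:
            return 0.0  # assumption: untried actions are valued at 0
        return self.totals[action] / self.counts[action]

    def act(self):
        """Return the action with the highest estimated expected performance."""
        return max(self.actions, key=self.expected_value)


# Made-up experience: "left" averages 0.5, "right" averages 0.85.
agent = ExperienceAgent(["left", "right"])
for action, score in [("left", 1.0), ("left", 0.0), ("right", 0.8), ("right", 0.9)]:
    agent.record(action, score)

chosen = agent.act()
print(chosen)
```

Nothing in this sketch requires the agent to resemble a human, which is the point of the definition. The hard part in practice is agreeing on the performance measure itself.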


The classical Turing Test certainly does have limitations. Because I don't see it mentioned here yet, I'll suggest you read about the Chinese Room, which is one of the most commonly cited reasons why the Turing Test falls short of ascertaining true 'consciousness'. However, I'd also note that Turing himself, in the original paper that proposed the Turing Test, explicitly acknowledged that it was not a test to detect consciousness:

I propose to consider the question, "Can machines think?" This should begin with definitions of the meaning of the terms "machine" and "think." The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words "machine" and "think" are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, "Can machines think?" is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.

The new form of the problem can be described in terms of a game which we call the 'imitation game.'

This imitation game is the test that we now know today (and also the inspiration for the name of a recent feature film starring Benedict Cumberbatch and Keira Knightley).


Great answer. I also share the opinion that the shift from "Imitation Game" to "Turing Test" has led to some deep misconceptions about the ramifications. (Pornbots pass the Turing Test all the time;) – DukeZhou – 2017-07-11T05:03:20.363


There are many definitions of Artificial Intelligence out in the wild. Each of them falls into one (or more) of four main domains, and the picture below sheds some light on this.

[Image: the four main domains of AI definitions]

The Turing Test revolves around the left side of the picture, which is mostly concerned with how humans think or act. But we know that this is not the whole story. The Turing Test does not have much to offer when it comes to what AI is in a general sense.
The Turing Test, as Wikipedia states, was created to test whether machines exhibit behaviour equivalent to, or indistinguishable from, that of a human. Artificial Intelligence is much more than what humans can do or how they act. There are many human acts that are considered unintelligent, and sometimes inhuman too.
The Chinese Room Argument focuses on something very important when it comes to "Consciousness vs. Simulation of Consciousness". John Searle argued that it is possible for a machine (or human) to follow a huge number of predefined rules (an algorithm) in order to complete a task, without thinking or possessing a mind. Weak AIs are good at simulating the ability to understand, but they don't really understand what they are doing. They don't exhibit "Self-Awareness" and don't form representations of themselves. "I want that" and "I know I want that" are two different things.

Theory of Mind states that a good AI should form representations not just of the world it is working in, but also of the other agents and entities in that world. These two concepts, self-awareness and theory of mind, draw a thin line between weak and strong AI.

When it comes to the Turing Test, it fails on many grounds, and so does the Total Turing Test, which adds another layer to the test. Most researchers believe that the Turing Test is just a distraction from the main goal, something that hinders them from fruitful work. Consider this: suppose you pose a difficult arithmetic problem in order to distinguish between human and machine. If the machine wants to pretend it is human, it will have to lie. This is not what we want. Aiming for the Turing Test sets an upper bound on the AI that can be created. Also, making AI act and behave like humans is not a very good idea. Humans are not very good at making the right decisions all the time. This is one reason why we read about wars in our history books. The decisions we make are often biased, have selfish origins, and so on. We don't want an AI to come with all those things.

I don't think there is a single test for AI. This is because AI has many definitions and many types. Whether an AI is weak or strong can be judged by looking for answers to questions like "I want that vs. I know I want that" and "Who am I, and what exactly am I doing?" (from the machine's perspective), plus the other questions I mentioned above.


It depends on how the test is given. For example, when people claimed that a machine had successfully passed the Turing Test a few years ago, the criteria were pretty weak. It only had to fool 30% of the people for 5 minutes. That's not much of a test. To put this in perspective, you probably wouldn't detect schizophrenia, autism, learning disabilities, or dementia with these criteria.
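As a rough illustration of how much the outcome depends on the chosen threshold, the pass criterion described above can be written as a few lines of Python. The function names, the judge verdicts, and the reading of "fool 30% of the people" as "at least 30%" are assumptions made for this sketch, not the rules of any actual competition.

```python
def fooled_fraction(verdicts):
    """verdicts: list of booleans, True when a judge took the machine for a human."""
    if not verdicts:
        raise ValueError("need at least one judge verdict")
    return sum(verdicts) / len(verdicts)


def passes(verdicts, threshold=0.30):
    """Assumed criterion: pass when at least `threshold` of the judges were fooled."""
    return fooled_fraction(verdicts) >= threshold


# Hypothetical panel: 10 of 30 judges fooled after a 5-minute conversation.
verdicts = [True] * 10 + [False] * 20

print(passes(verdicts))                 # 30% bar
print(passes(verdicts, threshold=0.5))  # stricter bar: fool at least half the judges
```

The same set of verdicts passes at the 30% bar and fails at a 50% bar, so a claim of "passing" says little unless the threshold, the number of judges, and the conversation length are reported with it.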

In spite of the hype, current AIs can be detected 100% of the time using fairly simple questions.


Good point. "Given a sufficient number of questions..." (Of course, the Voight-Kampff test seems quite effective with a limited number of questions;) – DukeZhou – 2018-01-17T00:37:57.463


Is the Turing Test, or any of its variants, a reliable test of artificial intelligence?


Yes, if one defines the term Artificial Intelligence in terms of Alan Turing's Imitation Game or one of its variants. The approach may be, at the same time, both valid and very limited as a definition of intelligence as people interpreted the word before AI emerged.

Proven Intelligence

Consequently, there are a large number of alternative approaches to measuring intelligence, artificial or otherwise.

  • Becoming a chess grand master
  • Authoring a winning chess program
  • Receiving a highly selective international award
  • Creating a strategy that wins a war or a peace
  • Overcoming the thousands of rounds of elimination in business or politics to become President
  • Authoring brilliant articles, papers, screenplays, lectures, speeches, books, or poems that generate significant human paradigm shifts
  • Showing genius-level results on a Mensa test
  • Becoming one of the wealthiest people in the world

Normal Measurement of Normal Intelligence

But these are measurements of exceptional intelligence of some kind, mostly because the leaders in these areas have reliably applied intelligence over multiple domains in such ways that led to remarkable success through multiple real life scenarios. The reliability is an attribute of the person possessing the intelligence, not the test of intelligence itself.

These are more mundane, yet perhaps more valid and reliable, measures of intelligence.

  • Raising healthy and loving children as verifiable through the careful interviewing of friends and associates of the members of the family
  • Repeated and successful remedy of many conditions of varying types that were once identified as broken in some tangible and measurable way and found to be measurably corrected as a result of the application of intelligent comprehension, analysis, and remedial action
  • Conversational intelligence as measurable through the participants in conversation attributing their own success to the ideas and examples set by the conversationalist

What Are the Truly Desired End Goals?

Perhaps the primary characteristic of the Turing Test is that it is artificial. If artificial intelligence is what we want from AI software, then that is what we will receive. However, it is likely we want something either considerably more or considerably less.

We want more in that it would be nice if some computers could be our friend, our mentor, and an unpaid employee with exceptional abilities leading to our personal success in terms of income, influence, popularity, or legacy.

We want less in that we want some computers to do domain specific tasks and remain as fully subservient tools, perhaps with some personality and warmth, like a ship or some other complex device we give human names, yet without the unpredictability of the far reaching capabilities of human intelligence.

