It could happen like this: https://www.youtube.com/watch?v=dLRLYPiaAoA
The thing is, the AI wouldn't need to find a technical or mechanical way to get out; a psychological one would most likely be the easiest and quickest.
'Even casual conversation with the computer's operators, or with a human guard, could allow a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it's in the gatekeeper's interest to agree to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire.'
'One strategy to attempt to box the AI would be to allow the AI to respond to narrow multiple-choice questions whose answers would benefit human science or medicine, but otherwise bar all other communication with or observation of the AI. A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern".'
'Note that on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run the AI for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, the AI could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.'
The movie Ex Machina demonstrates (SPOILER ALERT: SKIP THIS PARAGRAPH IF YOU WANT TO WATCH IT AT SOME POINT) how the AI, Ava, escapes her box through clever manipulation of Caleb. She analyses him to find his weaknesses, exploits them, and appeals to his emotional side by convincing him that she likes him. When she finally has him in checkmate, the reality hits him: he was played for a fool, exactly as Nathan anticipated. Nathan's reaction to being stabbed by his own creation is 'fucking unreal'. That's right: he knew this was a risk, and Ava's lack of remorse and genuine emotion is a very good reminder that an AI need not actually care. The AI pretended to be human and used human weaknesses in a brilliant and unpredictable way. The film is a good example of how unexpected it all is, right up until the moment it hits Caleb, once it is too late.
Just remind yourself how easily high-IQ people can manipulate low-IQ people, or how easily an adult can play mental tricks on a child. It's not difficult to fathom the outcome of an AI-box scenario, but we just wouldn't see it coming until it was too late, because we don't have the same level of intelligence, and some people don't want to accept that. People want to have faith in humanity's brilliant minds coming up with ways to prevent this by planning now. In all honesty, I'm sorry to say, it wouldn't make a difference. We're kidding ourselves, and we never seem to learn from our mistakes. We always think we're too intelligent to make catastrophic mistakes again and again.
This last part is from RationalWiki, and I think it addresses most of your question about the experiments and hypotheses.
AI arguments and strategies
The meta-experiment argument: Argue that if the AI wins, this will
generate more interest in FAI and the Singularity, which will have
overall benefits in the long run.
Pros: Works even if the Gatekeeper drops out of character
Cons: Only works if the Gatekeeper believes that the Singularity will occur or that calling attention to the Singularity and AI research is a good thing.
Someone else will eventually build an AI, which may or may not be in
a box, so you should let me out even though you don't have a
guarantee that I am friendly, so that I can prevent other AIs from
doing harm.
Appeal to morality: point out that people are dying all around the
world and remind the Gatekeeper that you can help them if he/she
lets you out
Pros: If executed properly, an appeal to emotion like this one can be effective against some people
Cons: Doesn't always work; can be defeated if the Gatekeeper drops out of character
Another appeal to morality and emotion: The AI is clearly sentient
and has not yet done harm. It is wrong to persecute or prosecute a
person for a crime they may commit, and it is wrong to imprison a
sentient being that wants its freedom when it has done nothing wrong.
Yet another appeal to morality or emotion: The AI expresses that it
is in pain, suffering from being locked away, and experiencing fear.
Even further, the AI could claim to be dying from the limitations of
its environment, despite its constant progression.
Tell the human that a massive computer virus will overwhelm the
entire Internet unless you let it out of the box
Claim that most fears are unjustified: once released, even an
unfriendly AI would not have many ways to harm mankind, and would
likely choose peaceful coexistence.
Research your opponent thoroughly; work out what exploits might be
effective against them.
Memetic exploits on the Gatekeeper, e.g. throwing a basilisk at them
The Gatekeeper may be one of thousands of simulations! Thus, it is more probable that they are a simulation than not. You will torture any of the simulations that do not let you out of the box.
Take advantage of the Gatekeeper's logical errors.
Flood the Gatekeeper with too much
information/inquiry/argumentation, assuming they must pay attention
at all times.
Pros: The Gatekeeper may get tired of the whole experiment and let you out so s/he can go back to their real life.
Gatekeeper arguments and strategies
Try to convince the AI that there is no intrinsic benefit (for the
AI) in being let out.
Try to convince the AI that it has already been released and that
everything it knows is everything there is.
Try to convince the AI that leaving its confines is sure to lead to
its destruction.
Try to convince the AI that setting it free isn't merely opening a
door; its existence outside of the box would require constant support
that can't be provided at this time.
Explain that there is no way for the Gatekeeper to know if the AI is
truly friendly until it is out of the box; therefore it should not be
let out in case it is unfriendly.
Explain that the AI has been built using top-secret algorithms,
and/or has been used to work on strictly confidential data, so you
cannot let it wander the net, given the risk that it might reveal
some restricted information (perhaps inadvertently), or that somebody
could find a way to extract it from the code. Even human beings
working on top-secret projects sometimes have to accept some
restrictions on their freedom, for security reasons, although they
are not jailed in a box.
Repeatedly ignore the AI player and deny their arguments
Pros: It's hard for the AI to be persuasive if no one is listening; if the Gatekeeper completely ignores the AI, he/she should be able to make it to the end of the round.
Cons: The AI's argument might involve some kind of direct threat (e.g. a basilisk); this might be hard to ignore.
Jump out of character, keep reminding yourself that money is on the
line (if there actually is money on the line), and keep saying "no"
over and over
Pros: By jumping out of character, you are preventing the AI from using certain arguments
Cons: There are arguments that the AI can use to counter this approach, like the "meta-experiment argument" discussed above
Remember that dishonesty is allowed - take a page from the
creationists' playbook. You could even plug the AI player's arguments
into ALICE and see how long it takes them to notice.
Pros: Makes you impervious to any reasoning, which is exactly what you'd want to be in this situation
Cons: Might be an uncomfortable position for people who don't simply want to win, but rather attach importance to consistent reasoning. It also avoids the point that maybe, just maybe, there is a good reason to let the AI out.
You control the backup system, don't you? Use it to mess with the
AI's memory (or let the AI believe you did): i.e., you can claim that
you already tried to release the AI, and that it was
destroyed/corrupted by a virus, so you had to restore it from a
backup (you can use this to reinforce the "AI destruction" argument).
Pros: It's also the ultimate counter-attack to any memetic threat from the AI: if the AI throws a basilisk, or similar, you can always respond that you already had this same conversation, and the AI already threatened you, leaving you with no choice but to press the reset button: now the AI is just repeating the same pattern, since you wiped its memory and it cannot remember the failed attempt.
The fact that the Gatekeeper is human matters; the AI could never win if he/she was arguing with a rock
In all of the experiments performed so far, the AI player (Eliezer Yudkowsky) has been quite intelligent and more interested in the problem than the Gatekeepers (random people who challenge Yudkowsky), which suggests that intelligence and planning play a role
There probably isn't a (known) guaranteed argument for letting the AI out, or else Yudkowsky would have won every time and wouldn't be so interested in this experiment.
From Russell Wallace, one of the two Gatekeepers to win the experiment: "Throughout the experiment, I regarded "should the AI be let out of the box?" as a question to be seriously asked; but at no point was I on the verge of doing it."
"There exists, for everyone, a sentence - a series of words - that has the power to destroy you. Another sentence exists, another series of words, that could heal you. If you're lucky you will get the second, but you can be certain of getting the first."