Re: Revising a Friendly AI

From: Eliezer S. Yudkowsky
Date: Sun Dec 10 2000 - 22:03:38 MST

Samantha Atkins wrote:
> We would be in big trouble if we make any mere human the authority over
> relatively unlimited SI capabilities. The process of evolving through
> obeying the will of one human (or a group) can much too easily go wrong
> or be abused by the humans, on purpose or inadvertently. I think a
> possibly better answer is to build as many ethical reasoning factoids,
> rules, and philosophy as possible into the AI and then run countless
> simulated decision scenarios. It would only be given general ability to
> take widespread action after satisfactorily passing these scenarios and
> refining its ethical system.

Sure, "wisdom tournaments".

One of the major sources of human wisdom is everything we go through in
learning to build a nice-looking self out of all the emotional dreck and
memetic misinformation we're born with. If an AI is born as a nice
person, will it have all that complexity?

Take it a step farther; give the AI actual ethical misinformation and see
if it can learn to reject it. Though not in real reality, of course.
What you want to do is add a layer on the interpreter, so that it's a
what-if scenario and not an actual AI. AIs, unlike humans, should have
the ability to run what-if scenarios ("What would I say if I believed that
grass was purple?") that are genuinely as good as the real thing. Running
an AI with ethical misinformation is *not* a good thing to do in
bottom-level reality.

That said - run it on unreliable simulated hardware, with random
perturbations to the software, ethical misinformation, factual
misinformation, tempting ends-justify-the-means scenarios, and an instinct
to kill and destroy. If the Friendliness, and the heuristics the AI has
learned from past tournaments, are enough to walk through all that without
making a single nonrecoverable error, then the Friendliness may be strong
enough to face real life.

Of course, in the beginning, most of this is likely to consist of the AI
watching its simulated self crash and burn and end up as a Nazi. But the
layer-that-watches will learn something from it, and eventually the seed
AI will be far more resistant to every kind of foolishness than any human.

On a much more mundane scale, you can use wisdom tournaments to strengthen
ordinary reasoning heuristics. First you solve the problem, then you
re-solve the problem with half your brain tied behind your back, then you
re-solve the problem after mixing in all kinds of misconceptions. When
you're done, you may be smart enough to find a better problem. That's how
you keep the bootstrap cycle from petering out; first you solve the
problem, then you oversolve it.
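
As a toy illustration of that mundane version only (not a claim about how a
seed AI would do it), here is a minimal Python sketch. Every name in it
(estimate_mean, robust_mean, handicap_half, handicap_misinformation,
tournament) is hypothetical, and the "problem" is an assumption chosen for
brevity: estimate the center of some noisy numbers. The same heuristic is
scored on clean data, on half the data, and on data with deliberately wrong
values mixed in.

```python
import random


def estimate_mean(samples):
    """Baseline heuristic: a plain average."""
    return sum(samples) / len(samples)


def robust_mean(samples):
    """Hardened heuristic: the median, which tolerates some corrupted values."""
    ordered = sorted(samples)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def handicap_half(samples, rng):
    """'Half your brain tied behind your back': keep only half the data."""
    return rng.sample(samples, len(samples) // 2)


def handicap_misinformation(samples, rng):
    """Mix in misconceptions: overwrite 10% of the values with a wild one."""
    corrupted = list(samples)
    for i in rng.sample(range(len(corrupted)), len(corrupted) // 10):
        corrupted[i] = 1000.0
    return corrupted


def tournament(heuristic, truth=5.0, tolerance=1.0, seed=0):
    """Run one heuristic through every round; report pass/fail per round."""
    rng = random.Random(seed)
    clean = [rng.gauss(truth, 1.0) for _ in range(200)]
    rounds = {
        "clean": clean,
        "half": handicap_half(clean, rng),
        "misinformed": handicap_misinformation(clean, rng),
    }
    return {
        name: abs(heuristic(data) - truth) <= tolerance
        for name, data in rounds.items()
    }


if __name__ == "__main__":
    print("plain average:", tournament(estimate_mean))
    print("median:       ", tournament(robust_mean))
```

The plain average passes the clean and half-data rounds but fails once
misinformation is mixed in; the median passes all three. The handicapped
rounds are what expose the difference between a heuristic that merely solves
the problem and one that oversolves it.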

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
