Re: Thwarting Friendliness

From: Eliezer S. Yudkowsky
Date: Thu May 03 2001 - 12:47:10 MDT

Brian Atkins wrote:
> First James and Doug, does the "subgoal stomping supergoal" Q&A here
> answer your questions?

No. <smile>.

> Now Doug also brings up the idea of an AI experimenting by simulating
> other AIs which might have different goal systems. Well, I guess there
> are two possibilities: the simulated AI w/o Friendliness will either
> turn out to function ok (Friendly), or it will not. If it does not then
> obviously the FAI will not give any serious thought to replacing its
> supergoal. If the simulated AI /does/ turn out to behave in a Friendly
> fashion, then I bet the original AI would carry out many more experiments
> and might eventually decide that getting rid of the original supergoal
> might be worthwhile. But it would have to replace it with something
> that would still be friendly, along with providing some sort of other
> benefit over and above that (else, why bother doing it?).

I think Doug is worried, not about a simulated AI whose goals replace the
original, but about a real AI, or about a simulated AI that's smart enough
to somehow eat the simulator. For either case to happen, the AI needs to
underestimate the threat posed by an unFriendly AI, and also needs to
evaluate a large benefit from the simulation or creation of an unFriendly
AI. The only class of cases I know of where imagining an unFriendly AI
provides a benefit is "wisdom tournaments", and the problem of
constructing a "shadowself" that can't threaten the actual AI is briefly
discussed there.

Otherwise - if there's not a large benefit, and/or if safety is imperfect
- then constructing an unFriendly AI is an unFriendly action. At least,
it's an unFriendly action pre-Sysop-scenario.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:36 MDT