Re: supergoal stability

From: Eliezer S. Yudkowsky
Date: Sat May 04 2002 - 02:28:08 MDT

Wei Dai wrote:
> On Fri, May 03, 2002 at 06:38:46PM -0400, Eliezer S. Yudkowsky wrote:
> > It currently looks to me like any mind-in-general that is *not* Friendly
> > will automatically resist all modifications of the goal system, to the limit
> > of its ability to detect modifications.
> Thanks, that does make a great deal of sense. I thought that the
> difficulty with creating an AI/SI that would implement goals as originally
> understood by the programmers was that once the AI became sufficiently
> intelligent, it would somehow decide that the goals are too trivial and
> not worthy of its attention. But I guess there is really no reason for
> that to happen, and the danger is actually in the earlier less intelligent
> stages, where it may make mistakes in deciding whether a candidate
> self-modification is overall a positive or negative contribution to its
> supergoal.

Different minds, different dynamics. A Friendly AI and a nonFriendly AI
think about goals very differently. A human has millions of years of
evolution, and an FAI has the human programmers; both are sculpted into
moral systems of much greater complexity than the simplest moral systems
that will support general intelligence.

> > The inventor of CFAI won't even tell you the reasons why this would be
> > difficult, just that it is.
> Why not? If someone was to naively try to use the CFAI approach to create
> an AI that serves some goal other than Friendliness, what is the likely
> outcome? Would it be catastrophic or just fruitless?

Possible outcomes include the programmers messing up the CFAI
architecture (most likely), warped goals, bacterial goals, or convergence to

> > Well, today I would say it differently: Today I would say that you have to
> > do a "port" rather than a "copy and paste", and that an AI can be *more*
> > stable under changes of cognitive architecture or drastic power imbalances
> > than a human would be, unless the human had the will and the knowledge to
> > make those cognitive changes that would be required to match a Friendly AI
> > in this area.
> Since you're planning to port your own personal philosophy to the AI, do
> you have a document that explains in detail what your personal philosophy
> is?

"Personal philosophy" is used here not in the sense of "my own personal
philosophy, which is just mine and nobody else's," but rather in the sense of
"that portion of philosophy which you, personally, have managed
to acquire." C. S. Lewis would say that a personal philosophy is that
portion of the Tao which you have managed to acquire for yourself. I have
never tried to construct a personal philosophy in the former sense. I have
tried to make my philosophy that which I believe to be right and to leave
out everything that is "just one person". I don't want the Friendly AI to
have anything which is derived specially from Eliezer Yudkowsky and not from
humanity in general. It was a large enough concession, from my perspective,
to admit that an AI might perhaps need something which is derived specially
from humanity and is not a property of all possible minds-in-general of
sufficient intelligence.

And hence, "Creating Friendly AI" says most of what needs to be said.

> I'm particularly interested in the following question. If two groups
> of people want access to the same resource for incompatible purposes, and
> no alternatives are available, how would you decide which group to grant
> the resource to? In other words, what philosophical principles will guide
> the Sysop in designing its equivalent of the CPU scheduling algorithm?

That's an interesting question. I would expect/hope resource conflicts
along these lines to be rare. One take is that after the Singularity all
sentient beings would get a quantity of "mana" and that mana could be used
to bid on whichever universal resources are conserved, after which all
conserved resources would be private property. But that's just a guess.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
