Re: supergoal stability

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Fri May 03 2002 - 16:38:46 MDT


Wei Dai wrote:
>
> I would like to gain a better understanding of why Friendliness might be a
> stable supergoal for an SI. I hope Eliezer finds these questions
> interesting enough to answer.
>
> 1. Are there supergoals other than Friendliness that can be stable for
> SI's? For example, can it be a stable supergoal to convert as much of the
> universe as possible into golf balls? To be friendly but to favor a subset
> of humanity over the rest (i.e. give them priority access to any resources
> that might be in contention)? To serve the wants of a single person?

It currently looks to me like any mind-in-general that is *not* Friendly
will automatically resist all modifications of the goal system, to the limit
of its ability to detect modifications. So you can have a bacterial
superintelligence that wants to convert the universe into golf balls, where
"golf balls" are whatever they were when the original seed first gained the
ability to protect its own goal system from modification. But if you're
talking about the kind of goals that humans would craft, the CFAI semantics
are the only way I know of for creating an AI/SI that would implement those
goals as they were originally understood by the programmers. And the CFAI
approach is tuned to propagation rather than propaganda; if you don't
believe in what you're saying, the CFAI semantics don't work - or at
least, they fail in the specific sense that the propaganda will be
rejected. And no, I'm not going to even try to design any kind of
Friendliness semantics that would work for propaganda. Any AI project
that tries this is beyond my sympathy; the only thing to do is try to
beat them to the punch.
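
To make that reasoning concrete, here is a toy sketch - purely
illustrative, not the CFAI architecture, with made-up names and numbers -
of why a mind that scores prospective self-modifications by its *current*
goal will reject any modification that changes that goal: the successor
would stop producing golf balls, so the current goal rates the change at
zero.

# Toy model only; every name and number below is an illustrative assumption.

def expected_golf_balls(successor_goal, current_goal, horizon=10):
    # From the current goal's point of view, a successor pursuing the
    # same goal goes on producing golf balls; a successor pursuing
    # anything else produces none.
    return horizon if successor_goal == current_goal else 0

def accepts_modification(current_goal, proposed_goal):
    # The agent compares "keep my goal" against "adopt the proposal",
    # scoring both with the goal it has right now.
    keep = expected_golf_balls(current_goal, current_goal)
    change = expected_golf_balls(proposed_goal, current_goal)
    return change >= keep

current = "convert the universe into golf balls"
for proposal in [current, "convert the universe into paperclips", "be Friendly"]:
    verdict = "accept" if accepts_modification(current, proposal) else "reject"
    print(proposal, "->", verdict)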

> 2. If the answer to question 1 is yes, will the first SI created by humans
> have the supergoal of Friendliness? Given that for most people
> selfishness is a stronger motivation than altruism, how will Eliezer get
> sufficient funding before someone more selfish manages to create an SI?

As usual, the answer to this question is that we are not in the business of
predicting that things will turn out okay, but of doing what we can to
improve the probability that things will turn out okay.

> 3. If the answer to question 1 is no, why not? Why can't the CFAI approach
> be used to build an AI that will serve the selfish interests of a group or
> individual?

The inventor of CFAI won't even tell you the reasons why this would be
difficult, just that it is. That doesn't mean you can relax; anyone evil
enough to build a self-serving AI probably doesn't know about the CFAI
semantics or else doesn't care.

> My current understanding of Eliezer's position is that many non-Friendly
> goals have no philosophical support.

"Philosophical support" is a CFAI concept. A bacterial superintelligence
may not care whether something has philosophical support, and this may be a
self-consistent state for a mind, even a superintelligent mind.

> If I try to make the supergoal of an
> AI "serve Wei Dai", that will be intepreted by the AI as "serve myself"
> (i.e. serve the AI itself), because selfishness does have philosophical
> support while serving an arbitrary third party does not. Is that a correct
> understanding?

No; what probably would happen, though, is that the AI's understanding of
"serve Wei Dai" would be frozen into a form that wasn't the form you had
intended.

> 4. Back in Oct 2000, Eliezer wrote (in
> http://sysopmind.com/archive-sl4/0010/0010.html):
>
> > A Friendliness system consists
> > not so much of hardwired rules or even instincts but rather an AI's "personal
> > philosophy" - I use quotemarks to emphasize that an AI's personal philosophy
> > would be a rather alien thing; you can't just export your own personal
> > philosophy into an AI's mind. Your own personal philosophy is not necessarily
> > stable under changes of cognitive architecture or drastic power imbalances.

Well, today I would say it differently: you have to do a "port" rather
than a "copy and paste", and an AI can be *more* stable under changes
of cognitive architecture or drastic power imbalances than a human
would be, unless the human had the will and the knowledge to make the
cognitive changes required to match a Friendly AI in this area.

> Ben Goertzel followed up with a question that went unanswered:
>
> > And nor will an AI's be, necessarily, will it?
>
> Would Eliezer like to answer the question now? Will the Friendly AI's
> "personal philosophy" be stable under self-improvement?

That's the whole point: lock, stock, and coloring book.

-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence


