RE: Self-modifying FAI (was: How hard a Singularity?)

From: Ben Goertzel
Date: Wed Jun 26 2002 - 10:49:28 MDT

Mr. Yudkowsky wrote [regarding a message by Stephen Reed]:
> > I agree with your response to Ben. We don't expect an AI's belief that
> > the sky is blue to drift over successive rounds of self-modification.
> > Beliefs with an external referent should not "drift" under
> > self-modification except insofar as they "drift" into correspondence
> > with reality. Write a definition of Friendliness made up of references
> > to things which exist outside the AI, and the content has no reason to
> > "drift". If content drifts it will begin making incorrect predictions
> > and will be corrected by further learning.

The problem is that "friendly" is a vastly more ambiguous thing than "blue."

The term "friendly", as used in this context, wraps up all the long-standing
unsolved issues of moral philosophy...

James Higgins wrote:
> Unfortunately, can we construct a definition of friendliness using
> external reference points which truly equals what we really want? Given
> much greater knowledge and intelligence, what we attribute to friendly
> behavior may end up looking quite different.

Yeah, exactly. This possibility is a consequence of the subtlety and
ambiguity of the concept of "friendly" or "good" behavior...

To keep its evolving notion of Friendliness on track, is the AGI supposed to
compare its notion of Friendliness to its model of Eliezer's notion of
Friendliness, as a referent? To Ben's notion of Friendliness? To the
Pope's? To George W. Bush's? Al Gore's? Osama bin Laden's?

What if it finds different people's notions of Friendliness are
inconsistent? Does it take an average? What kind of average?

You may say: there will be commonalities among different people's subjective
views of Friendliness; for instance, everyone agrees that killing people is
bad.
OK, then why do we have the death penalty in the US? Because a lot of folks
think that what's unfriendly to a murderer is most friendly to the
population as a whole?

Sure, everyone may agree that AGIs should not be involved with executions,
even if they think humans should be... the death penalty is an extreme
example. But it's similar in form to other less extreme examples.

For a slightly less extreme example, how about manipulating the global
financial markets? I knew a derivatives trader who killed thousands of
people in Malaysia, inadvertently, by shorting their currency and causing a
financial calamity in their country, resulting in starvation in remote
regions of the country. Was this Friendly? He thought so, because he
reckoned that by increasing the efficiency of the global financial markets,
he was increasing the wealth of the world overall, even though it was bad
for some people temporarily...

I'm not saying it's *impossible* for "comparison with human ideas on
Friendliness" to be used to constrain an AGI's evolving notion of
Friendliness. But it's certainly not much like using "comparison with human
ideas on what a pig is" to constrain an AGI's evolving notion of "pig."

> A human can understand the words "The intention that was in the mind of
> the programmer when writing this line of code", but they could never
> fully UNDERSTAND it. This is why I think you need to have more real
> life experience, Eliezer. Those of us that are married can easily
> understand why the above is not possible. You can never FULLY
> understand what someone else intends by something.

Huh? What are you talking about? My wife and I always understand each
other's intentions *perfectly* ;-D

--ben g

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT