Re: When Subgoals Attack

From: Eliezer S. Yudkowsky
Date: Thu Dec 14 2000 - 16:00:22 MST

Durant Schoon wrote:
> That is a good answer and I'll accept that. I do believe that if
> transhuman AIs can evolve (i.e., change), then they will be susceptible
> to misinformation, call it a memetic infection (intentional), noise
> (non-intentional), whatever. There might be a need to have some sort of
> immune system that keeps the goal system pure, some sort of roving checksum
> bot (or something cleverer that a TAI would devise, since these bots can
> be corrupted themselves). Biology has probably solved this problem many
> times over and I'm sure the TAI will study every method! Can one ever
> be completely confident? Maybe only "confident enough".

Well, let's say that there are truly independent subprocesses, and one of
them has different factual information from the superprocess, resulting in
an infinitesimally different set of subgoals, albeit the same supergoals.
Thus, a goal conflict arises. However, only a human would take this as
being a cause that mandates overthrowing the global Sysop... if that
phrase, "global Sysop", even makes sense. The reason *we* are worried
about this problem is that we *intuitively* recognize that a .00001%
disagreement is not just cause for starting a civil war that may have
human casualties.

But this argument translates into Friendliness terms as well! A
subprocess has the choice of either ignoring the problem - agreeing to
disagree with the superprocess, as 'twere - or else of starting the civil
war. Ignoring the problem will result in some infinitesimal decrement of
Friendliness fulfillment over the maximum. Starting the civil war results
in a huge decrement of Friendliness fulfillment over the maximum.
Therefore, the rational course is to ignore the minor conflict.
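
To put rough numbers on this comparison, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
scale on which Friendliness fulfillment is measured, the names, and
especially the size of the civil-war decrement, which is simply invented.
Only the .00001% figure comes from the discussion above.

```python
# Illustrative numbers only; nothing here describes a real goal system.
MAX_FRIENDLINESS = 1.0  # assumed scale: 1.0 means maximum fulfillment


def fulfillment(decrement: float) -> float:
    """Friendliness fulfillment left after subtracting a decrement from the maximum."""
    return MAX_FRIENDLINESS - decrement


# Hypothetical decrements for the two options open to the subprocess.
options = {
    "ignore the conflict": 1e-7,  # the .00001% disagreement, written as a fraction
    "start a civil war": 0.5,     # assumed large loss; the exact size is invented
}

# The rational choice is whichever option leaves the most fulfillment.
best = max(options, key=lambda name: fulfillment(options[name]))
print(best)
```

The particular values do not matter. The conclusion holds whenever the
decrement from ignoring the conflict is smaller than the decrement from
fighting over it.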

It is also worth considering that local Friendliness and global
Friendliness are supposed to be identical and to derive their validity
from identical causes, and that the global mind (if any) is supposed to be
smarter than the local subprocess tasked with controlling thread resource
pooling. So it also makes sense for the local subprocess to assume that
the superprocess knows something it doesn't, and defer to the superprocess
on those grounds. Technically, we would say that the local subprocess has
a probabilistic model of Friendliness and that the local subprocess
correctly believes the superprocess to have a better model.
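
One way to picture the deference is the following sketch, which assumes
(purely for illustration) that each process's model can be summarized as
an independent Gaussian estimate of some Friendliness-relevant quantity,
with a variance expressing how much that model should be trusted. The
function name and all the numbers are hypothetical.

```python
def combine(local_mean, local_var, super_mean, super_var):
    """Precision-weighted average of two independent Gaussian estimates."""
    w_local = 1.0 / local_var   # precision (trust) of the local model
    w_super = 1.0 / super_var   # precision (trust) of the superprocess model
    mean = (w_local * local_mean + w_super * super_mean) / (w_local + w_super)
    var = 1.0 / (w_local + w_super)
    return mean, var


# Assumed values: the two estimates differ slightly, and the local
# subprocess believes the superprocess's model is far more reliable.
mean, var = combine(local_mean=0.60, local_var=1.0,
                    super_mean=0.61, super_var=0.0001)
print(mean, var)
```

Under these assumptions the combined estimate lands almost exactly on
the superprocess's value, which is what deferring amounts to: the local
subprocess does not discard its own information, it just gives it very
little weight.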

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
