Re: When Subgoals Attack

From: Durant Schoon
Date: Thu Dec 14 2000 - 15:18:49 MST

> Date: Thu, 14 Dec 2000 15:20:32 -0500
> From: "Eliezer S. Yudkowsky" <>
> The counterargument for CaTAI is the same as the counterargument for
> humans - that we are unified minds and that our subgoals don't have
> independent volition. When was the last time your mind was taken over by
> your auditory cortex? Maybe you once had a tune you couldn't get out of
> your head, but there's a difference between a subprocess exhibiting
> behavior you don't like, and hypothesizing that a subprocess will exhibit
> conscious volitional decision-making. The auditory cortex may annoy you
> but it cannot plot against you; it has a what-it-does, not a will.

With my Observation, I was trying to distinguish human intelligence, which
really doesn't have this problem (unless you consider schizophrenia to
fall under this category), from a transhuman intelligence, which *could*
reasonably create intelligent subprocesses for delegation purposes.

> One counterargument for transhumans is that it does not take "infinite"
> memory to retain supergoal context, just the standard amount of memory
> required for a Friendliness system, and thus you can have an ecology of
> perfectly cooperative processes united by a shared set of supergoals.

That is a good answer and I'll accept that. I do believe that if
transhuman AIs can evolve (i.e., change), then they will be susceptible
to misinformation, call it a memetic infection (intentional), noise
(non-intentional), whatever. There might be a need to have some sort of
immune system that keeps the goal system pure, some sort of roving checksum
bot (or something cleverer that a TAI would devise, since these bots can
be corrupted themselves). Biology has probably solved this problem many
times over and I'm sure the TAI will study every method! Can one ever
be completely confident? Maybe only "confident enough".
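
To make the "roving checksum bot" idea concrete, here is a minimal sketch.
It assumes the goal system can be serialized to JSON, and it guards against
a corrupted checker by trusting only a digest that a strict majority of
checkers agree on. All names (goal_digest, majority_digest, is_intact) and
the sample goals are hypothetical, invented for this illustration.

```python
import hashlib
import json
from collections import Counter


def goal_digest(goals):
    """Return a SHA-256 hex digest of a canonical JSON encoding of the goals."""
    # sort_keys makes the encoding independent of dict insertion order.
    canonical = json.dumps(goals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def majority_digest(checker_records):
    """Return the digest held by a strict majority of checkers, else None."""
    if not checker_records:
        return None
    digest, votes = Counter(checker_records).most_common(1)[0]
    return digest if votes * 2 > len(checker_records) else None


def is_intact(goals, checker_records):
    """True if the live goal system matches the majority-trusted digest."""
    trusted = majority_digest(checker_records)
    return trusted is not None and goal_digest(goals) == trusted


# Hypothetical goal system and three checkers, one of which is corrupted.
supergoals = {"supergoal": "Friendliness", "subgoals": ["learn", "delegate"]}
records = [goal_digest(supergoals)] * 3
records[1] = "corrupted"

print(is_intact(supergoals, records))
print(is_intact(dict(supergoals, supergoal="something else"), records))
```

A majority vote only tolerates a minority of bad checkers, which is one way
of reading "confident enough" rather than "completely confident".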

At some point, I'd imagine, there is a threshold of diminishing returns,
beyond which any extra effort to prevent "overthrow" would be wasted.
Hopefully after a trillion years of existence, statistics won't catch up
with you :) ... or you keep adding resources to Goal-Integrity-Maintenance.
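
The "statistics catching up" point can be put into rough numbers. The sketch
below assumes independent corruption events per time period, which is a
simplification, and the probabilities are made up for illustration. It
compares a fixed per-period risk with a risk that keeps shrinking, which
stands in for continually adding resources to Goal-Integrity-Maintenance.

```python
import math


def survival_constant(p, periods):
    """Chance of no corruption over `periods` with a fixed per-period risk p."""
    # (1 - p) ** periods, computed in log space for numerical stability.
    return math.exp(periods * math.log1p(-p))


def survival_shrinking(p0, ratio, periods):
    """Same, when the per-period risk shrinks geometrically: p0 * ratio**t."""
    log_total = 0.0
    for t in range(periods):
        log_total += math.log1p(-p0 * ratio ** t)
    return math.exp(log_total)


# Illustrative numbers only: a one-in-a-million risk per period.
print(survival_constant(1e-6, 10**7))
print(survival_shrinking(1e-6, 0.5, 1000))
```

With a fixed risk, survival decays toward zero as the number of periods
grows, however small the risk. If the risk shrinks fast enough that its sum
stays bounded, survival stays bounded away from zero indefinitely.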

> *The* counterargument for transhumans is that the whole idea of identity
> and identifying is itself an anthropomorphism. Why aren't we worried that
> the transhuman's goal system will break off and decide to take over,
> instead of being subservient to the complete entity? Why aren't we
> worried about individual functions developing self-awareness and deciding
> to serve themselves instead of a whole?

I am worried. But hopefully the TAI will worry and come up with a really
good solution. "Overthrow" happens all the time in evolved power structures.
Maybe an entity which modifies itself in a more controlled manner will be
able to prevent it indefinitely. The worry is the same thing as asking "How
might Friendliness get corrupted given a really, really long time?"

> You can keep breaking it down,
> finer and finer, until at the end single bytes are identifying with
> themselves instead of the group... something that would require around a
> trillion percent overhead, speaking of infinite memory.

Junk DNA might be evidence for this pattern on the genetic level.

I think you can stop worrying when subgoals don't create their own subgoals.
Anything below that is "dumb". Though one could conceive of scenarios in
which the wrong sequence of "dumb" subprocesses results in a very bad
situation. But I should not forget that the TAI would be ever vigilant
and very likely to avoid those situations as much as possible.

Thanks for the interesting answers! I hope that passing the goal structure
down (or a reference to it) makes it into CaTAI (or it was probably there
already). And the immune system? I guess the TAI would come up with that
itself (if it is needed).
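
As a rough sketch of "passing the goal structure down (or a reference to
it)": each subgoal below keeps a reference to its parent, so any subprocess
can walk back up to the shared supergoal. The Goal class and its methods are
hypothetical and are not taken from CaTAI. The is_dumb check mirrors the
rule of thumb above that a subgoal which creates no subgoals of its own is
"dumb".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    description: str
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Goal"] = field(default=None, repr=False, compare=False)
    children: List["Goal"] = field(default_factory=list)

    def spawn(self, description):
        """Create a subgoal that carries a reference back to this goal."""
        child = Goal(description, parent=self)
        self.children.append(child)
        return child

    def supergoal(self):
        """Follow parent references up to the root of the goal structure."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def is_dumb(self):
        """A goal that has created no subgoals of its own."""
        return not self.children


root = Goal("Friendliness")
delegate = root.spawn("delegate a task")
leaf = delegate.spawn("parse input")

print(leaf.supergoal().description)
print(leaf.is_dumb(), delegate.is_dumb())
```

Holding a reference costs one pointer per subgoal rather than a full copy of
the supergoal context, which is in the spirit of the "standard amount of
memory" argument quoted earlier.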

