drives ABC > XYZ

From: Michael Vassar
Date: Tue Aug 30 2005 - 12:54:47 MDT

This concern has been discussed under the topic "subgoal stomp" for a long
time here. To some degree the phenomenon of subgoal stomp arises from
shallow analysis on the AI's part, and to some degree from
the lack of a clear preference ordering. Humans analyze extremely shallowly,
and have disturbingly unclear preference ordering, so they can imagine
making such transitions. However, even in the case of humans, it is not
clear that they ever actually do make such transitions after reaching
adulthood. I suspect that any time we come close to such
transitions, our doing so is driven by the influence of a well evolved meme
complex or of another person engineering our preferences as a subgoal for the
satisfaction of their own preferences. None of these circumstances resemble
the likely environment of a transparently and rationally designed AI. Even
in the abstract, such scenarios appear implausible. Drives in a human mind
may be agent-like and "conspire". Criteria in an optimization do not do
this. Three discrete top-level goals might easily interact in such a manner
as to alter or remove one another, but they should not ever generate novel
top-level goals, only sub-goals. They should only produce sub-goals which
would contribute to supergoal fulfillment, not arbitrary subgoals.

I would not be very concerned by such issues unless I were shown a
non-abstract example. An existence proof that the specific type of
intelligence displayed by humans might undergo such a goal transition would
count. Furthermore, the example should not involve the manipulation of
comparable intelligences, penetration of feeble defenses by meme complexes,
disordered preferences or gross failures of foresight.

Human preferences for fertility correlates are not tightly integrated with
other preferences. Furthermore, such preferences are almost completely
disconnected from the verbal manipulations within which preferences are
formalized. The agent "Phil Goetz" which behaves most like a utility
function is the relatively serial and verbal meme-complex which has
essentially taken over a messy evolved adaptation executor. The verbal
structure of the statement "I was thinking how much easier my life would be"
suggests that this meme complex values "ease", though possibly only as a low
level goal. The meme complex in question contains certain preferences which
refer to the evolved preferences of the adaptation executor, such as a
preference for breast and waist size, but these preferences are relatively
low priority sub-goals. In so far as the meme complex is a semantic web, it
cannot even directly interface with the perceptions which it is referring
to, and in so far as it lacks external reference semantics it doesn't care
about the referent, only about its input stream.

The unsurprising and correct conclusion to draw from your unrealized
meta-preferences is that humans are not Friendly. Evolution did not create
adaptation executors with pre-adapted mental facilities for strong self
modification during hard take-off scenarios. For this reason, efforts to
construct sentient software modeled on the human mind are extremely unwise,
though not quite as certainly suicidal as some other proposals. This topic
and others are covered rather well in Creating Friendly AI. There have been
subsequent advances in Friendliness theory, but this document is the basic
introductory text. It is extremely unlikely that someone who doesn't
understand it will be prepared to contribute insights into AI safety.
