Re: Paperclip monster, demise of.

From: Michael Wilson (
Date: Thu Aug 18 2005 - 00:09:21 MDT

Richard Loosemore wrote:
> Your comment above, about my not understanding "how reasoning about
> goals works when the goals are all open to reflective examination and
> modification" and the other comments about "goal systems" that appear in
> your reply, all come from a very particular, narrow conception of how an
> AI should be structured.

Correct. They come from an examination of how a normative AI would be
structured; an AI that contains no unnecessary opacity or counterproductive
internal inconsistencies, and that reasons using probabilistic logic or
appropriate approximations thereof.

> How can I characterize it? It is a symbolic-AI, goal-hierarchical,
> planning-system approach to AI that is a direct descendant of good
> old Winograd et al.

Symbolic, no. The fatal flaws of 'Good Old Fashioned AI' are often
mentioned by the SIAI (from 'Creating a Transhuman AI' onwards) and
indeed most of the serious AGI researchers on this list. There is some
disagreement as to what they are and how to correct them, but I don't
think anyone is advocating SOAR and its brethren as a model for an AGI.

Goal-hierarchical, yes. Goal-hierarchical systems behave in relatively
predictable ways, including having relatively predictable forms of goal
system stability. They are also efficient to evaluate and effective as
reality-optimisers. The term 'causally clean' denotes a superset of
AGIs that includes 'goal-hierarchical' in the classic sense of a tree
of goals and implied subgoals. Systems that are not causally clean are
extremely difficult to predict once capable of self-modification,
subject to counterproductive goal interference and 'subgoal stomps', and
as such unsuitable as a basis for Friendly seed AI. It is arguable that
non-'goal-hierarchical' systems will inevitably self-modify into such
systems; humans are more goal-hierarchical than animals and you yourself
pointed out that you'd prefer to self-modify to have a more consistent
goal system. However this is an unproven hypothesis at this time.

'Planning-system', not particularly. Plans are clearly highly useful,
but they are not always required nor are they the be-all and end-all
of inference. Again I don't know of any AGI researchers here who are
working on anything like a classic 'planning system'.

> Just for the record, I know perfectly well what kind of goal system you
> are referring to. (I have written such systems, and taught postgrads
> how to write them).

You've written systems that choose actions based on expected utility,
as evaluated by a utility function fed by probabilistic inference?

> But I have also just spent 6000 words trying to communicate to
> various posters that the world of AI research has moved on a little
> since the early 1980s, and that there are now some very much more
> subtle kinds of motivational systems out there.

Most of which are pointless inefficiency and obfuscation, or simply
don't work at all.

> I guess I made the mistake, from the outset, of assuming that the
> level of sophistication here was not just deep, but also broad.

On the contrary, there are plenty of people with deep knowledge of
the various approaches to AI that are or have been popular in
academia. I personally am a tireless advocate of studying the
mistakes of the past in order to learn from them and avoid being so
foolish in the future. Pointless obfuscation of the goal system is
a classic one.

> Do you know about the difference between (1) quasi-deterministic
> symbol-system that keeps stacks of goals, (2) a Complex assemblage of
> neurons, (3) Complex systems with partly symbolic, partly neuron-like
> properties?

I can't speak for anyone else, but I've critiqued all of these at
length (and the folly of opaque, emergence-based approaches to AI in
general). Symbolic systems may be dysfunctional, but they at least
tend to have a clear idea of how their functional components are
supposed to contribute to intelligence. This property tends to
disappear the further one goes down the connectionist spectrum. The
limited successes of connectionism can largely be attributed to the
fact that having no idea of how intelligence works at a global
level is often better than having an actively wrong idea of how
intelligence works.

> Do you understand the distinction between a set of goals and a set
> of motivations

Only if you're talking about humans. 'Motivation' is not a well-defined
term in AI (hell, 'goal' doesn't have a rigorous consensus definition,
but it's a lot better than 'motivation'). By comparison 'expected
utility' is such a rigorous term.

> About the way that motivational systems can be the result of
> interacting, tangled mechanisms that allow the entire system to be
> sensitive to small influences, rendering it virtually non-deterministic?

Of course they /can/ be; it's obvious that humans use a motivational
system of this type. You were arguing that general intelligence /must/
be like this, which is simply incorrect ('complete nonsense', to borrow
your phrase). The argument that I am making is that such a nondeterministic
system will (almost) inevitably fall into a stable, deterministic attractor
after some time spent traversing the space of possible goal systems via

>> Whether a system will actually 'think about' any given subjunctive goal
>> system depends on whether its existing goal system makes it desirable
>> to do so.
> In general, an AI could use a goal system and motivation system that
> caused it to shoot off and consider all kinds of goals,

Goal systems 'cause' an AI to do desirable things, pretty much by
definition. If it is considering something, either the act of consideration
is desirable or the action was generated by some internal process that is
not strongly linked to the goal system, but was set in motion because doing
so was considered a desirable action as a whole (a common term for such
processes is 'subdeliberative'). The goal or action does not have to be
'desirable' to be eligible for consideration, only the act of considering
it. But it must be 'desirable' to be eligible for implementation. If you
have any causation going on in your AGI that isn't the result of the goal
system, then you have built an unstable, unreliable and inefficient system
for no good reason.

> The AI says "I could insert inside myself ANY motivation module in the
> universe, today. Think I'll toss a coin [tosses coin]: Now I am
> passionately devoted to collecting crimson brocade furniture, and my
> goal for today is to get down to that wonderful antique store over in
> Chelsea."

Where did the desire to toss a coin and create a random goal come from?
Alternately, /why/ did the AI create a random goal? What causal process
selected and initiated this action? If you have an answer to that, please
explain why you would possibly want to build an AI that used such a

> Guess what, it *really* wants to do this, and it genuinely adopted the
> antique-hunting goal, but are you going to say that its previous goal
> system made it desirable to do so?

You could have simply hardcoded a random goal generator into your AGI
independently from the structure that you have labeled the 'goal system'.
I would say that the so-called 'goal system' is now a goal system in
name only, as the effective goal system of the AI (the root sources of
cognitive causation that we would specify if we were modelling your AI
as a generally intelligent agent) now includes the non-explicit 'insert
random goals' goal.

> That this new motivation/goal was actually determined by the previous
> goal, which was something like pure curiosity? The hell it did!

/You/ are specifying the AI, /you/ are specifying what determines what
in the design. Since in this universe events have causes, something
caused the goal to be created. If you have implemented a 'curiosity'
goal that tries to infer what a version of the AI with new goal X
would do by self-modification, then so be it. If you have implemented
a bit of code that randomises the goal system without itself being
explicitly represented as a goal (going against your own 'AGI
understands its own entire codebase' statement earlier), then you
have created an implicit goal rather than an explicit one and gained
nothing but a new source of potential confusion.

>> A goal system is simply a compact function defining a preference
>> order over actions, or universe states, or something else that can
>> be used to rank actions.
> A goal system *is* a compact function defining a preference order over
> actions/universe states? Who says so?

/All/ decision functions generate a sequence of preferred actions. I
challenge you to name any decision function (i.e. any AI) that doesn't.
It is not quite true to say that all decision functions generate a
preference order over all possible actions, because forcing a
single-selection decision function to produce a ranked sequence by
progressive elimination of options may not produce a consistent sequence
due to preference intransitivity (still, you can average). It is true
that all rational systems will produce a single ranking sequence given
indefinite computing power.
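
A minimal sketch of the intransitivity problem described above, in Python.
Everything in it is an illustrative assumption: the three options, the cyclic
pairwise preferences and the tie-breaking rule inside the hypothetical
choose() function are invented for the example and are not taken from any
real system.

```python
# Hypothetical pairwise preferences forming a cycle: A > B, B > C, C > A.
BEATS = {("A", "B"), ("B", "C"), ("C", "A")}


def choose(options):
    """Single-selection decision function (illustrative assumption).

    Picks the option that beats the most rivals in the offered set,
    breaking ties alphabetically.
    """
    def wins(x):
        return sum((x, y) in BEATS for y in options if y != x)
    return min(options, key=lambda x: (-wins(x), x))


def rank_by_elimination(options):
    """Force a ranking by repeatedly selecting and removing the top choice."""
    remaining = set(options)
    ranking = []
    while remaining:
        best = choose(remaining)
        ranking.append(best)
        remaining.remove(best)
    return ranking


ranking = rank_by_elimination({"A", "B", "C"})
# The forced ranking places A ahead of C, yet when offered only A and C the
# same function selects C, so the sequence disagrees with the direct choice.
direct = choose({"A", "C"})
```

With these made-up preferences the forced ranking and the direct two-way
choice disagree about A and C, which is the kind of inconsistency meant above.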

Decision functions are a well-defined concept. 'Goal systems' aren't,
so I'll need to be more specific. The 'goal-hierarchical' systems you
speak of usually have an overall decision function that consists of
a variable 'goal system', which a smaller fixed decision function
combines with the fruits of inference to select a preferred action.
For expected utility, the goal system is the utility function and the
fixed decision function is the expected utility equation instantiated
to evaluate a particular class of entities (classically universe
states or substates, although there are other possibilities). It is
perfectly possible to implement your decision function in a
distributed fashion, mash together goals and goal evaluators, even
mash together desirability and certainty (humans unfortunately suffer
from this; AI designers trying to perpetuate the mistake is inexcusable).
But you are simply obscuring your intent and complicating analysis and
prediction. Ultimately we can always analyse the causal relations
within your AI, isolate (and abstract if necessary) the set of root
selection mechanisms and hence infer the decision function. Whether
there will be a clean separation of fixed and dynamic components, and
indeed whether the whole thing will be tractable for humans, depends
on how mangled your architecture is. For Friendly AI design, the
researcher must endeavour to keep things as straightforward as possible
for there to be any hope of predictability at all. Fortunately on
analysis there don't seem to be any good reasons for such obfuscation.
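
As a rough illustration of the split described above between a variable goal
system and a fixed decision function, here is a small Python sketch. The
names (expected_utility, decide, beliefs, utility), the two actions and all
the numbers are hypothetical; the beliefs table merely stands in for the
output of probabilistic inference.

```python
from typing import Callable, Dict

Outcome = str
Action = str


def expected_utility(outcome_probs: Dict[Outcome, float],
                     utility: Callable[[Outcome], float]) -> float:
    """Sum over outcomes of P(outcome | action) * U(outcome)."""
    return sum(p * utility(o) for o, p in outcome_probs.items())


def decide(beliefs: Dict[Action, Dict[Outcome, float]],
           utility: Callable[[Outcome], float]) -> Action:
    """Fixed decision function: choose the action with the highest expected utility."""
    return max(beliefs, key=lambda a: expected_utility(beliefs[a], utility))


# Stand-in for inference results: P(outcome | action). Values are made up.
beliefs = {
    "safe": {"small_win": 1.0},
    "risky": {"big_win": 0.5, "loss": 0.5},
}


def utility(outcome: Outcome) -> float:
    """The swappable 'goal system'; conditional logic is permitted here."""
    if outcome == "loss":
        return -10.0
    return {"small_win": 1.0, "big_win": 5.0}[outcome]


best = decide(beliefs, utility)
```

Swapping in a different utility function changes which action is selected
without touching decide(), which is the clean separation of fixed and dynamic
components referred to above.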

> Many people would say it is not: this is just a particular way of
> construing a goal system. It is possible to construct (Complex, as
> in Complex System) goal systems that work quite well, but which
> implicitly define a nondeterministic function over the space of
> possible actions, where that function is deeply nonlinear, non-analytic
> and probably noncomputable.

Non-linear, certainly; we don't call thermostats AIs. Nondeterministic,
only if you incorporate a source of quantum noise into the system and
make its actions dependent on that noise. There may or may not be good
reasons for doing this; generally I subscribe to the philosophy that
injecting randomness into an AI is a last resort indicative of the
designer's failure to think of anything better, but there are specific
scenarios where it would be desirable to be nondeterministic. That
said, AFAIK no current AGI projects make use of a truly nondeterministic
RNG. Noncomputable, certainly not, since the AI is running on a computer!
You could argue that human cognition is noncomputable, but that's a
separate argument.
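
To illustrate the point about determinism, a short Python sketch, assuming a
hypothetical noisy_choice() helper and made-up option names: a seeded
pseudo-random generator from the standard library yields exactly the same
'random' choice on every run.

```python
import random


def noisy_choice(options, seed):
    """Pick an option 'at random' using a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return rng.choice(options)


options = ["explore", "exploit", "wait"]
first = noisy_choice(options, seed=2005)
second = noisy_choice(options, seed=2005)
# Same seed and same options: the 'random' choice repeats exactly.
repeatable = first == second
```

The behaviour only stops being reproducible if the seed or the generator
itself draws on an external noise source.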

> And, yes, it may be unstable and it may spend the entire lifetime
> of the universe heading towards a nice stable attractor *and never
> get there*....

Now this is an interesting question, how fast arbitrary (or randomly
selected) unstable AI goal systems will stabilise. Right now, the
rigorous theory doesn't exist to answer this question for anything but
a few trivial cases. My personal guess, based on my own research, is
'pretty fast', but I don't make any claim of reliability about that
guess. If you can give a formal description of such a system, please
do so, otherwise your statement is similarly pure speculation.

> What? Did nobody here ever read Hofstadter? I'd bet good money that
> every one of you did,

You win. For one thing, Eliezer was recommending GEB as the one book
everyone must read for years.

> so what is so difficult about remembering his discussion of tangled
> hierarchies,

They seemed like a good (nigh revolutionary) idea at the time. They
continued to seem like a good idea for inference for some time after
it became apparent that having a distributed representation of the
goal system was foolish for predictability, transparency (to the AI
and the programmers), internal stability and efficiency reasons.
Finally it became apparent that getting active symbol networks to
actually do anything relied on a loose set of weak inductive
mechanisms that sounded broadly reasonable to the implementers.
But 'tangled hierarchy' representational structure does not have
to imply a similarly tangled causal structure, and indeed once the
latter is removed the former becomes a lot more useful.

> and about how global system behavior can be determined by
> self-referential or recursive feedback loops within the system, in
> such a way that ascribing global behavior to particular local
> causes is nonsense?

This is not news. Human goals aren't dependent on specific neurons;
goals and decision functions can have distributed representations.
The fact that human goals can still be concisely described underscores
the point that distributed representations are an implementation
strategy forced by sucky human hardware, not a cognitive necessity
nor indeed a good idea. Feel free to list perceived real-world
benefits of having distributed goal representations (or indeed,
distributed decision functions with no clear goal/evaluator
separation, e.g. 'motivational mechanisms' that influence behaviour
in a brain-global way), and I will tell you how those benefits are
achievable without the attendant drawbacks by competent design of
compact, explicit goals and decision functions.

> Why, in all of this discussion, are so many people implying that
> all goal systems must be one-level, quasi-deterministic
> mechanisms, with no feedback loops and no nonlinearity?

They aren't. Self-modification is a feedback loop, albeit a specific
class of one in the narrow definition (I think people are using a
more general definition covering any drift in optimisation target(s)
here). Deterministic yes because nondeterministic (by which I assume
you mean 'generate non-transitive preference orderings' or 'affected
by seemingly irrelevant AI internal state') goals are a bad thing,
as you yourself seem to claim when explaining your own self-modification
desires. Nonlinearity, no, see above. Utility functions are often
mentioned, but they have no prohibition against including complex
conditional logic and/or nonlinear functions.

> And why are the same people, rather patronizingly, I must say,
> treating me as if I am too dumb to understand what a goal system is?

A lot of AI researchers are mired in irrelevant fluff which they
believe to be vitally important. To be honest, it looks like you may
be in this situation, because you are stressing things that many here
see as irrelevancies (since they can be compressed/abstracted away
with little loss of predictive power) that a seed AI would rapidly
discard. Feel free to try to prove otherwise.

> I keep trying to talk about motivational systems and goal hierarchies
> with tangled loops in them [what tangled loop? the one that occurs
> when the system realises that it can swap motivational modules in
> and out, and that this swapping process could have profound
> consequences for the entire observable universe]

Yes, this is self-modification. We have all been talking about the
consequences of self-modification for years. You will note that 'loop'
implies goals modifying goals, or more generally decision functions
modifying decision functions, possibly through some variable number
of intermediate layers. Your argument is quite the opposite; you
claim that new goals can come from somewhere other than existing
goals. What this other mechanism is is something you have not
clearly stated.

> If you really insist on characterizing it as "my" type of AGI vs
> everyone else's type of AGI, that is fine: but I am talking about a
> more general type of AGI, as I have been [ranting] on about in this
> message.

...which the people 'patronising' you consider to be the result of
layering some pointless obfuscation on a corresponding rational,
transparent AGI design.
 * Michael Wilson