Re: FAI and SSSM

From: Bill Hibbard
Date: Fri Dec 13 2002 - 11:57:32 MST

Hi Eliezer,

> What about processes that construct themselves? Does it make sense to
> describe the child of the child of the child of the child of the mind that
> humans originally built as an "artifact constructed by humans"? Is it
> useful to describe it so, when it shares none of the characteristics that
> we presently attach to "machines"? Call it a mind, or better yet an
> entity; we might be wrong on both counts but at least we won't be quite as
> wrong as if we use the word "machine".
> Yes, I'm arguing over the definition of a word; deliberately so because I
> think that people expect certain characteristics to hold true of
> "machines", and that these characteristics don't hold true of SIs
> (superintelligences). I would expect an originally human SI and an
> originally human-built SI to have more in common with each other than
> either would have in common with a modern-day human or a human-equivalent
> AI, and I would expect even a half-grown AI to have almost nothing in
> common with the physical objects we categorize as "machines".

I get at the same issue in my book by referring to them as
both machines and gods.

> Perhaps the complex behavior of planning is emergent in the simple
> behavior of reinforcement, as well as the simple behavior of reinforcement
> being a special case of the complex behavior of planning. I don't think
> so, but then I haven't tried to figure out how to do it, so I wouldn't
> know whether it's possible.

Based partly on Biology of Mind by Deric Bownds, I would say
that brains evolved in these steps:

1. sensors -> nerve cells -> actions

2. learning responses to actions, including the beginnings of
values such as "eating is good"

3. simulation (i.e., brains processing experiences that are
not actually occurring) in order to begin solving the temporal
credit assignment problem

4. increasingly sophisticated simulation (planning, simulating
brains of other animals) and values (social values for teamwork)

To see how planning fits into learning, consider that when
humans confront a novel situation they consciously plan their
behavior, based on simulated scenarios. As they repeat the
situation and it is less novel, those planned behaviors
become unconscious. Furthermore, those unconscious behaviors
become part of the repertoire for future planning.

This relation between planning and learning is illustrated
by the development of a beginning chess player into a chess
master. A beginner's plans may include as many alternatives
as a master's, but the master's plans are in terms of primitive
units learned through lots of previous plans and reinforcement.

Planning and reasoning must be grounded in learning, in much
the way that symbols must be grounded in sensory experience.
Furthermore, I would say that goals are grounded in values
(although I admit this last statement depends on how these
terms are defined).

> But human evolution includes specific selection pressures on goals, apart
> from selection pressures on reinforcement. Imperfectly deceptive social
> organisms that argue linguistically about each other's motives in adaptive
> political contexts develop altruistic motives and rationalizations from
> altruistic motives to selfish actions; if supra-ancestral increase in
> intelligence or knowledge overcomes the force of rationalization, you are
> then left with a genuine altruist. How would a robust implementation of
> reinforcement learning duplicate the moral and metamoral adaptations which
> are the result of highly specific selection pressures in an ancestral
> environment not shared by AIs? You can transfer moral complexity directly
> rather than trying to reduplicate its evolutionary causation in humans,
> but you do have to transfer that complexity - it is not emergent just from
> reinforcement.

Human moral and ethical systems are complex indeed, but are
all ultimately grounded in human learning values (emotions),
modified by social interaction (itself driven by human values
favoring social interaction).

Machines will evolve ethical and moral systems based on their
own learning values, and especially their social interaction
with humans if their values are for human happiness.

By the way, Pinker does a great job of analyzing human values
in How the Mind Works.

> I confess that I don't see how this changes anything at all. I assumed a
> simulation model that is not only used for temporal credit assignment, but
> which allows for imagination of novel behaviors whose desirability is
> determined by the match of their extrapolated effects against previously
> reinforced goal patterns. Without this ability, no reinforcement-based
> system would ever be capable of carrying out complex creative actions such
> as computer programming - when I write a program, I am reasoning from
> abstract, high-level design goals to novel concrete code, not just
> implementing coding behaviors that have been previously reinforced.

A robust solution of the temporal credit assignment problem, one
that finds behaviors to optimize values, includes imagining possible

> When I say "simple reinforcement system", I mean "a lot simpler than a
> human or a Friendly AI"; "simple" does include full modeling/simulation
> capabilities, for both credit assignment and imagination of novel
> behaviors. Maybe calling it a "flat" reinforcement system would be
> better. The problem with a flat reinforcement system is that it
> flash-freezes itself the moment it becomes capable of self-modification.

The system will predict the ways contemplated changes affect its
values, and refrain from changes that negatively affect them. In
this way the system and its evolution (self-modification) are
locked into serving human happiness.
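
A minimal sketch of the check described above, in Python. Everything
here is a toy assumption made for illustration: the names
predicted_value and accept_modification, the representation of a
behavior as a function from a state to an action, and the stand-in
value function are all hypothetical. The point it shows is that a
contemplated change is scored against the system's current values,
and is refused if the predicted score drops.

```python
from typing import Callable, Sequence

# Hypothetical types: a policy maps a state to an action, and a value
# function scores a (state, action) pair under the CURRENT values.
Policy = Callable[[float], float]
Values = Callable[[float, float], float]


def predicted_value(policy: Policy, values: Values,
                    states: Sequence[float]) -> float:
    """Average score the current values give to the policy's actions."""
    return sum(values(s, policy(s)) for s in states) / len(states)


def accept_modification(old: Policy, new: Policy, values: Values,
                        states: Sequence[float]) -> bool:
    """Accept a contemplated change only if, judged by the current
    values, it is predicted to do at least as well as the old policy."""
    return (predicted_value(new, values, states)
            >= predicted_value(old, values, states))


# Toy stand-in for "human happiness": best when the action matches the state.
def toy_values(state: float, action: float) -> float:
    return -abs(action - state)


states = [1.0, 2.0, 3.0]
current = lambda s: 0.9 * s      # existing behavior, slightly off
improved = lambda s: s           # change that serves the values better
degraded = lambda s: 0.0         # change that serves the values worse

print(accept_modification(current, improved, toy_values, states))
print(accept_modification(current, degraded, toy_values, states))
```

Note that the candidate is always judged by the values held before
the change, which is what keeps the values themselves from being
edited away.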

It is interesting that Ben has raised exactly the opposite
objection: that values might drift as the system evolves.

I think "happiness of all humans" is the right value, since
it causes the machines to inherit human values. It does let
values drift but under human control. Of course, there is no
way to prove that the value for the happiness of all humans
won't drift, but on the other hand ask any sane mother if she
would modify her brain so she no longer loved her children.

> Originally, you built the system such that it contained certain internal
> functional modules which modified goal patterns conditional on external
> sensory events. And from its goals at any given point, the system judges
> the desirability of future states of the universe, and hence the
> desirability of actions leading to those future states.
> Now imagine this system looking at the fact that it possesses
> reinforcement modules, and considering the desirability of actions which
> remove those modules.

Reinforcement learning is fundamental to the way brains work,
rather than being an optional module. As long as brains cannot
perfectly predict the universe, they will need reinforcement

> Any internal system, whose effect is to change the
> cognitive pattern against which imagined future events are matched to
> determine their desirability, is automatically undesirable; if the AI's
> future pattern changes, then the AI will take actions which result in an
> inferior match of those futures against the current pattern governing
> actions. To protect the goals currently governing actions (including
> self-modifying actions), the system will remove any internal functionality
> whose effect is to modify the top layer of its goal system.
> This action feels intuitively wrong to a human because humans have extra
> complexity in the goal system, which for purposes of Friendly AI we can
> think of as humans treating moral arguments as having the semantics of
> probabilistic statements about external referents. See the appropriate
> sections of "Creating Friendly AI" for more information.
> How do you think temporal credit assignment would change this? It doesn't
> seem relevant.

By making machine values depend on human happiness, evolution of
goals remains under human control. The effect of a solution to
the temporal credit assignment problem is merely to enable the
system to make predictions about the effect of its contemplated
behaviors on its values.
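
As a small illustration of temporal credit assignment, here is a
sketch in Python of one standard textbook device, the discounted
return. It is not taken from the book or from this thread; the
function name discounted_returns and the discount factor gamma are
assumptions chosen for the example. A reward that arrives late is
propagated backward, so earlier behaviors receive a share of the
credit and the system can compare behaviors by their predicted
long-term effect on its values.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float],
                       gamma: float = 0.9) -> List[float]:
    """For each time step, return the reward at that step plus the
    discounted sum of all later rewards."""
    returns: List[float] = []
    running = 0.0
    # Walk backward so each step can reuse the return of the next step.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Only the final step is rewarded, yet the earlier steps get credit too.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```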

> >>Finally, you're asking for too little - your proposal seems like a defense
> >>against fears of AI, rather than asking how far we can take supermorality
> >>once minds are freed from the constraints of evolutionary design. This
> >>isn't a challenge that can be solved through a defensive posture - you
> >>have to step forward as far as you can.
> >
> > Not at all. Reinforcement is a two-way street, including both
> > negative (what you call defensive)
> No, that wasn't what I meant by "defensive" at all. I was referring to
> human attitudes about futurism.
> > and positive reinforcement.
> > My book includes a vivid description of the sort of heaven on
> > earth that super-intelligent machines will create for humans,
> As I believe your book observes, such vivid descriptions are pointless
> because we aren't smart enough to get the description right. For example,
> your book refers to automated farms and factories rather than
> nanotechnology and uploading. This, of course, does not mean that
> nanotechnology is the correct description; only that we can already be
> pretty sure that farming and factories are destined for the junk-heap of
> history.

Good point. Farms and factories as we know them will disappear.
So I'm using "farms and factories" as generic terms for facilities
for producing food and artifacts.

> Recommended book: "Permutation City", Greg Egan.
> Recommending online reading:
> > assuming that they learn behaviors based on values of human
> > happiness, and assuming that they solve the temporal credit
> > assignment problem so they can reason about long term happiness.
> What if someone has goals beyond happiness? Many philosophies involve
> greater complexity than that.

Achieving goals makes people happy.

> My current best understanding of morality is that "good" consists of
> people getting what they want, defined however they choose to define it.
> But I'm not infallible, so that understanding is itself subject to change.
> What happens if it turns out that "happiness" isn't what you really
> wanted?

Getting what I want makes me happy.

> How does your design recover from philosophical errors by the
> programmers?

This is a major issue for any design. I address it in two sections
of my book: Failures (in the chapter The Ultimate Engineering
Challenge) and Mental Illness (in the chapter Good God, Bad God).

Assuming that through the public policy debate we can avoid
super-intelligent killing machines, and super-intelligent
Enron Corporations, then we still face the problem of a
machine that values the happiness of all humans malfunctioning.
The real danger comes when the machine advances to the point
where it is capable of intimately knowing billions of humans,
so that it can dominate any public policy debate. We need to
approach that point slowly and carefully.

> I also didn't realize that your book to which you referred was available
> online. I've now read it. Don't suppose you could return the favor and
> check out "Creating Friendly AI", if you haven't done so already?

It's an early draft, with a different title. The final print
edition includes a reference to your work and lots of other
information not in the on-line draft.


By the way, I recognize that these future issues are speculative
and reasonable people can disagree. Your mailing list, and Ben's
AGI mailing list, are very valuable resources.

