Re: FAI and SSSM

From: Bill Hibbard
Date: Thu Dec 12 2002 - 05:19:44 MST

Hi Eliezer,

> Bill Hibbard wrote:
> >
> > With super-intelligent machines, the key to human safety is
> > in controlling the values that reinforce learning of
> > intelligent behaviors. In machines,
> Machines? Something millions of times smarter than a human cannot be
> thought of as a "machine". Such entities, even if they are incarnated as
> physical processes, will not be physical processes that share the
> stereotypical characteristics of those physical processes we now cluster
> as "biological" or "mechanical".

You're just arguing over the definition of words. I am
using "machines" to mean artifacts constructed by humans.
If you read my book you'll see I imagine super-intelligent
machines as quite different from any other machines humans
have built.

> > we can design them so
> > their behaviors are positively reinforced by human happiness
> > and negatively reinforced by human unhappiness.
> A Friendly seed AI design, a la:
> doesn't have positive reinforcement or negative reinforcement, not the way
> you're describing them, at any rate. This makes the implementation of
> your proposal somewhat difficult.

I am well aware of the relation between your approach based
on planning behavior from goals, and my approach based on
values for reinforcement learning.

A robust implementation of reinforcement learning must solve
the temporal credit assignment problem, which requires a
simulation model of the world. This simulation model is the
basis of reasoning based on goals. Planning and goal-based
reasoning are emergent behaviors of a robust implementation
of reinforcement learning.
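
As a rough illustration of this claim, here is a minimal Python sketch
under stated assumptions. The message names no algorithm, so the sketch
swaps in a common one: a tabular model learned from random experience,
followed by value iteration over that model. The six-state chain, the
discount factor, and every function name below are hypothetical. Reward
arrives only at the far end of the chain, so earlier steps can be
credited only by looking ahead through the learned model.

```python
import random

N_STATES = 6         # hypothetical chain of states 0..5
GOAL = N_STATES - 1  # reward arrives only on entering the last state
ACTIONS = (-1, +1)   # step left or step right
GAMMA = 0.9          # discount factor (an assumed value)


def step(state, action):
    """Toy environment: move along the chain, clamped at both ends."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def learn_model(episodes=200, horizon=20, seed=0):
    """Record observed (state, action) -> (next_state, reward) from random play."""
    rng = random.Random(seed)
    model = {}
    for _ in range(episodes):
        state = rng.randrange(GOAL)  # start anywhere except the goal
        for _ in range(horizon):
            action = rng.choice(ACTIONS)
            nxt, reward = step(state, action)
            model[(state, action)] = (nxt, reward)
            if nxt == GOAL:
                break
            state = nxt
    return model


def q_value(model, values, state, action):
    """One-step lookahead through the learned model; unseen pairs count as 0."""
    if (state, action) not in model:
        return 0.0
    nxt, reward = model[(state, action)]
    return reward + GAMMA * values[nxt]


def plan(model, sweeps=50):
    """Value iteration on the learned model: credit flows back from the goal."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for state in range(GOAL):
            values[state] = max(q_value(model, values, state, a) for a in ACTIONS)
    return values


def greedy_policy(model, values):
    """Pick, in each non-goal state, the action with the best lookahead value."""
    return {s: max(ACTIONS, key=lambda a: q_value(model, values, s, a))
            for s in range(GOAL)}


model = learn_model()
values = plan(model)
policy = greedy_policy(model, values)
print([round(v, 3) for v in values])
print(policy)
```

Once random play has covered the chain, the planned values fall off
geometrically with distance from the goal, and the greedy policy steps
toward it from every state. In this toy setting, at least, goal-directed
behavior comes out of reward plus a learned model, with no goal
represented explicitly.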

> Positive reinforcement and negative reinforcement are cognitive systems
> that evolved in the absence of deliberative intelligence, via an
> evolutionary incremental crawl up adaptive pathways rather than high-level
> design. A simple predictive goal system with Bayesian learning and
> Bayesian decisions emergently exhibits most of the functionality that in
> evolved organisms is implemented by separate subsystems for pain and
> pleasure. See:
> A simple goal system that runs on positive reinforcement and negative
> reinforcement would almost instantly short out once it had the ability to
> modify itself. The systems that implement positive and negative
> reinforcement of goals would automatically be regarded as undesirable,
> since the only possible effect of their functioning is to make *current*
> goals less likely to be achieved, and the current goals at any given point
> are what would determine the perceived desirability of self-modifying
> actions such as "delete the reinforcement system". A Friendly AI design
> needs to be stable even given full self-modification.

I think you may be assuming a non-robust implementation of
reinforcement learning that does not use a simulation model
to solve the temporal credit assignment problem.

> Finally, you're asking for too little - your proposal seems like a defense
> against fears of AI, rather than asking how far we can take supermorality
> once minds are freed from the constraints of evolutionary design. This
> isn't a challenge that can be solved through a defensive posture - you
> have to step forward as far as you can.

Not at all. Reinforcement is a two-way street, including both
negative (what you call defensive) and positive reinforcement.
My book includes a vivid description of the sort of heaven on
earth that super-intelligent machines will create for humans,
assuming that they learn behaviors based on values of human
happiness, and assuming that they solve the temporal credit
assignment problem so they can reason about long term happiness.

> > Behaviors are reinforced by much different values in human
> > brains. Human values are mostly self-interest. As social
> > animals humans have some more altruistic values, but these
> > mostly depend on social pressure. Very powerful humans can
> > transcend social pressure and revert to their selfish values,
> > hence the maxim that power corrupts and absolute power
> > corrupts absolutely.
> I strongly recommend that you read Steven Pinker's "The Blank Slate".
> You're arguing from a model of psychology which has today become known as
> the "Standard Social Sciences Model", and which has since been disproven
> and discarded. Human cognition, including human altruism, is far more
> complex and includes far more innate complexity than the behaviorists
> believed.

I am quite familiar with Pinker's ideas. He gave a great talk
on "The Blank Slate" here in Wisconsin last year (I was lucky
to get a seat; the room was packed). In fact, my ideas about
human selfishness and altruism are largely based on Pinker's
"How the Mind Works".

I think you are assuming I am a Skinner behaviorist because you
are thinking of reinforcement learning without a solution of
the temporal credit assignment problem.

> If you can't spare the effort for "The Blank Slate", . . .

That's kind of a cheap shot, Eliezer. I read voraciously. And
I code voraciously, which you'll see if you visit the URL in
my sig.

Bill Hibbard, SSEC, 1225 W. Dayton St., Madison, WI 53706 608-263-4427 fax: 608-263-6738
