Re: Robot that thinks like a human

From: Michael Wilson
Date: Thu May 19 2005 - 07:24:38 MDT

Ben Goertzel wrote:
> I have a great deal of doubt that it's possible for anyone to
> achieve a good understanding of AGI Friendliness prior to building
> and experimenting with some AGI's.

You don't appear to believe that it's possible to achieve a good
understanding of how AGIs work, full stop, prior to actually building
them. This is the fundamental reason why I don't think any of your
designs will succeed; you're trying things that sound as if they
might work, but the design space is too large and the zone of success
too small for even educated guessing to work. And at five years or so
per 'project generation' (global architecture), human-driven
incremental trial and error isn't going to work any time soon.

The SIAI approach relies on achieving complete understanding of how
the design works before building it. This means specifying global
behaviour first and breaking that down into progressively more local
behaviour until you have code. My own project has skipped ahead a bit
and is applying that approach to a limited AI domain which is

> So far none of the ideas published online by the SIAI staff have done
> anything to assuage this doubt.

True. We say that it looks like it is possible, you say that it looks
like it isn't possible, and neither of us has published any formal
reasoning to support our position. We think you're rationalising, you
think we're indulging in wishful thinking. For now we can only keep
working towards proof that resolves the issue one way or the other.

> Sure, you can argue that it's better to spend 10-20 years trying to
> construct theoretical foundations of Friendly AGI in the absence of
> any AGI systems to play with. But the risk then is that in the interim
> someone else who's less conservative is going to build a nasty AI to
> ensure their own world domination.

Frankly that's highly unlikely. Reliable world domination is of the same
structural difficulty as Friendliness; it's perhaps a little easier to
specify what you want, but no easier to get an AGI to do it. Even the
people who think that AGIs will automatically have self-centered,
human-like goal systems should agree with this. Anyone foolish enough to try
and take over the world using AGI, and who manages to beat the very
harsh negative prior for AGI project success, will still almost certainly
fail (we'd say by destroying the world, people with anthropomorphic views
of AI would say because the AGI revolts and rules the world itself or
discovers objective morality and becomes nice, but still failure).

> IMO a more productive direction is think about how to design an AGI
> that will teach us a lot about AGI and Friendly AGI, but won't have
> much potential of hard takeoff.

You don't need to build a whole AGI for that. Any algorithms or dynamics
of interest can be investigated by a limited prototype. The results of
these experiments can be fed back into your overall model of how the
design will perform. AGI is hard to modularise, but if your design
requires a random-access interaction pattern over every single functional
component before it displays recognisable behaviour then you are on a
wild goose chase.

> I think this is much more promising than trying to make a powerful
> theory of Friendly AI based on a purely theoretical rather than
> empirical approach.

Well, let's face it, experimenting is more fun, less frustrating and
potentially money-spinning. I've previously detailed the reasons why
experimenting with proto-AGIs (particularly those lacking takeoff
protection) is a bad idea at some length, so I won't do so again now.

> The Novamente project seeks to build a benevolent, superhuman AGI

Ben, you started off trying to build an AGI with the assumption that it
would automatically be Friendly, or that at most it would take a good
'upbringing' to make it Friendly. So did Eliezer and by extension the
SIAI. Eliezer realised some time around 2001 that Friendliness is not
automatic; it's a very specific class of behaviours which will only be
achievable and stable in a very specific class of cognitive
architectures. The SIAI essentially threw everything away and started
from scratch, because 'AGI that must be Friendly' is a very different
spec from 'self-improving AGI'. This required a different design
approach, which we initially adopted with trepidation and resignation
because formal methods had a pretty bad track record in GOFAI. As it
turned out the problem wasn't formal methods, the problem was GOFAI
foolishness giving them a bad name, and that design approach was
actually far preferable even without the Friendliness constraint.

The fundamental problem with Novamente is that you didn't reboot the
project when you realised that Friendliness was both hard and
essential. You're still thinking in terms of 'we're going to build
an AGI, which we will find a way to make Friendly'. I think it may
actually take that radical statement of 'we /must/ find a way to
reliably predict how the AGI will behave before building it' to force
people to abandon probabilistic, emergentist and other methods that
are essentially guesswork.

> (I'm not using the word Friendly because in fact I'm not entirely
> sure what Eli means by that defined term these days).

Regardless of whether you agree with Eliezer's rather controversial
ideas about Friendliness content, the problem of maintaining stable
optimisation targets, or more generally how to translate desired
behaviour into decision functions that are stable under reflection,
is one all designs that claim to be Friendly must solve.

> We are committed not to create an AGI that appears likely capable
> of hard takeoff unless

Note use of the word 'appears'. Without a predictive model of the
system's dynamics you are making a personal intuitive judgement, with
no representative experience to calibrate against and under strong
pressure to believe that you don't need to delay your schedule and
that you're not an existential risk. 'Appears' is utterly unreliable.
I don't think your AGI design has significant takeoff risk either,
but I still think you should implement every practical safety
mechanism (though again without the predictive model, you can't be
sure how effective they will be). As Eliezer says, if nothing else
it will get you in the habit.

> it seems highly clear that this AGI will be benevolent.

I could dissect that 'seems' as well, but that would be beating a
dead horse. It seems harsh to criticise you so much, Ben, when you're
way ahead of almost all of your contemporaries in realising that
Friendliness is important and difficult, but unfortunately you're
still only a small fraction of the way towards the elements needed
for a realistic chance of success.

> We are not committed to avoid building *any* AGI until we have a
> comprehensive theory of Friendliness/benevolence, because
> a) we think such a theory will come only from experimenting with
> appropriately constructed AGI's

I don't think you can actually get such a theory from experimenting
with AGIs unless you know exactly what you're looking for. Inventing
a theory to explain the behaviour shown in some set of simple
experiments will probably be easier, but will yield a
theory with a lot of cruft compared to a proper theory of the
dynamics of causally clean goal systems. If your AGI doesn't have
a causally clean goal system then it's pretty much a write off in
terms of our ability to predict the results of self-modification.

> So anyway, it is just not true that the SIAI is the only group
> seeking to build a demonstrably/arguably benevolent AGI system.

True, but that's not what I said:

>> The only realistic way for humanity to win is for the AGI race
>> to be won by a project that explicitly sets out to build an
>> AGI that can be /proven/ to be Friendly (to a high degree of
>> confidence, prior to actually building it).

Predicting system behaviour via formal probabilistic reasoning from
the spec is /not/ the same as being able to 'argue' that it will
behave in certain ways, or simply claiming that your demo version's
behaviour is bound to remain stable under self-modification.

Virtually no-one wants to destroy the world on purpose, and most AI
researchers want to make the world a better place. The problem
isn't one of desires, but of vision (to see the scope of possible
consequences) and technique.

 * Michael Wilson

