AGI motivations

From: Michael Wilson (
Date: Mon Oct 24 2005 - 12:13:58 MDT

Sorry about the delay, slight technical problem with the SL4 server's
post filtering.

Michael Vassar wrote:
>> Yes, uploads occupy a small region of cognitive architecture space
>> within a larger region of 'human-like AGI designs'... We cannot hit
>> the larger safer region reliably by creating an AGI from scratch.
> That assertion appears plausible but unsubstantiated to me. The
> understanding of human brain function required to build a relatively
> safe human-like AGI might be only trivially greater than that
> required to create an upload,

In the beginning, all things seemed possible, though few things seemed
certain. If you don't take a strong position on how the human brain
is organised, what the minimal functional components of an AGI are,
the basic principles of self-modifying goal systems etc etc then the
answer to all of these questions is 'maybe' rather than 'yes' or 'no'.
However you can't actually /do/ anything with this level of knowledge,
and if you simply pick something that looks desirable and start
working on a way to implement it you may well learn things that
invalidate the original desirability or even plausibility assessment.
This is what happened to the SIAI; Eliezer started researching general
AI, in working on that discovered that actually you needed to build an
FAI, and then in working on /that/ discovered that you can't do so in
an ad-hoc fashion.

You keep saying 'maybe we can do this, maybe we can do that'. I'm
using the knowledge of the domain I've acquired so far to predict
(with a reasonable degree of confidence) that some of these things are
practically impossible and others are just a really bad idea. I'll
keep trying, but to be honest I don't think I'm going to have a lot
of success explaining my reasoning; there's just too much critical,
non-intuitive cognitive content in the model to squeeze into a few
emails when the reader doesn't share the background and premises. My
actual arguments are almost beside the point though; the real point
is that you can't reasonably say what the best course of action is
until you've seriously tried to solve the problem. Without a detailed
model of the problem space and some candidate solutions, it's simply
not possible to conclude anything useful.

> It may be much simpler to make a mind that will reliably not attempt
> to transcend than to build one that can transcend safely.

With no other constraints, I'd agree. But as you pointed out, the more
you try to make an AGI /useful/, particularly to FAI projects, the
harder this gets.

> One way to make such a mind is to upload the right person.

Which I'd still be nervous about, since we have no idea how a human
mind would react to the situation and if you get it wrong you've
given that human a good shot at imposing their will on everyone else.
Uploading is a better idea than trying to build 'human-like' AIs,
but brain-computer interfacing is /probably/ a safer option still. I
don't know enough to say which is harder; uploading is more of an
engineering challenge while BCI is more of a software challenge.

> It may be that building a large number of moderately different
> neuromorphic AIs (possibly based on medium res scans of particular
> brains, scans inadequate for uploading, followed by repair via
> software clean-up) in AI boxes and testing them under a limited
> range of conditions similar to what they will actually face in the
> world is easier than uploading a particular person.

Obviously that has moral implications, possibly mitigated by the fact
that you can restore the saved states after you've built the FAI, but
that's not a reason to reject it. The usual problems with AI boxes
apply; the risk may be reduced by the 'neuromorphic' architecture,
but if Eliezer can break out of a box (albeit simulated) it seems
unwise to bet on even a 'human-level' AGI staying in a box. My primary
objection to this though is that it seems like a massive waste of time
and effort; you could spend decades and billions of dollars on this
for a pretty minimal advantage in solving the real FAI problem.

> We know some ways for reliably hitting them, such as "don't
> implement transhuman capabilities".

To me this statement seems roughly equivalent to 'please implement
a complete clone of the Unix operating system, but make sure that
dictators won't be able to run nuclear weapon simulations on it'.
You're trying to implement a very high-level constraint directly
in code. Either you'll build token adversarial mechanisms that will
trivially be bypassed with a few days of hacking effort, or you're
crippling the system so badly it will be useless for benign
purposes as well. The normal solution would be 'make sure that the
AGI doesn't /want/ to acquire transhuman capabilities', but unless
you use an upload (in which case you're trusting a human thrust
into a completely novel psychological situation, and that's if you
get everything right) your 'neuromorphic' specification rules out
being able to impose strong (in the formal sense) constraints on
the goal system.

> Another may be to build an AGI that is such a person. How do you
> know what they want? Ask them. Detecting that a simulation is
> lying with high reliability, and detecting its emotions, should
> not be difficult.

Ok, I can't see a way to build a 'neuromorphic AGI' where you
can reliably detect lying. /Humans/ seem to be able to learn to
fool lie detectors pretty well (I don't know if this extends to
fMRI scanning, but it wouldn't surprise me, as the best
techniques amount to deliberately laying down false memories).
If you disagree, then specify the details of your neuromorphic
AGI architecture and explain how your lie detector works.

>> Indeed, if someone did manage this, my existing model of AI
>> development would be shown to be seriously broken.
> I suspect that it may be. Since you haven't shared your model I
> have no way to evaluate it, but a priori, given that most models
> of AI development, including most models held by certified geniuses,
> are broken, I assume yours is too.

That's a fair assumption if you're talking about my best guess on
how to build an AGI; as I often say myself the prior for anyone
getting it right is minuscule. However that wasn't what I was
referring to; I meant my predictive model of /AGI projects/, in
the sense of understanding past failures and working out what
pitfalls current projects are most likely to get stuck in. This
is what I'm using when I say that attempts to closely mimic a
human brain which are nonetheless too impatient to develop full
uploading will at best fail and at worst create a UFAI.

> I'd be happy to work with you on improving it, and that seems
> to me to be the sort of thing this site is for,

As you may be aware, the Singularity Institute dropped the 'open
source' model of AGI development in mid-2000, and much as I'm in
favour of collaboration (and wish the SL4 list could discuss AGI
design more) I must regretfully agree with the reasoning behind
this decision.

> but destiny-star may be more urgent.

If you're referring to my former start-up company, it ceased
operations in January 2004, though I kept it around to support
my full-time AI research. Six months back I started a new
company 'Bitphase AI Ltd' (who doesn't have one these days? :) )
as a commercialisation vehicle for my current work.

> It's best to predict well enough to create, then stop predicting
> and create. Trouble is, it's hard to know when your predictions
> are good enough.

True. In this domain the human tendency to proceed on the basis
of inadequate prediction is a serious existential threat, which is
why Eliezer should be excused for any apparent excess of zeal in
insisting on having the very best FAI theory we can construct
before proceeding (subject to the fact that we're rapidly running
out of time).

>>> There should be ways of confidently predicting that a given machine
>>> does not have any transhuman capabilities other than a small set of
>>> specified ones which are not sufficient for transhuman persuasion or
>>> transhuman engineering.
>> Why 'should' there be an easy way to do this? In my experience predicting
>> what capabilities a usefully general design will actually have is pretty
>> hard, whether you're trying to prove positives or negatives.
> We do it all the time with actual humans.

Erm, humans can't have transhuman capabilities /by definition/, so I'm
not sure what your point is. We've certainly already gone over the
foolishness of generalising from humans to a de novo AI with a design
that looks kinda like how you think the brain might work.

> For a chunk of AI design space larger than "uploads" AGIs are just
> humans.

Again, this relies on the team doing a massive and probably infeasible
amount of work to create a perfectly accurate simulation, and if they
did what would be the point? We already have plenty of humans.

> Whatever advantages they have will only be those you have given them,
> probably including speed and moderately fine-grained self-awareness
> (or you having moderately fine-grained awareness of them).

/You cannot predict the high-level consequences of giving people new
low-level capabilities/. It's impossible to fully predict this with a
transparent design, never mind an opaque 'neuromorphic' one. This is
the same kind of problem as trying to predict the social consequences
of a radical new technology, but ten times harder (at least) because
your built in human-modelling intuitions have gone from making helpful
suggestions to telling you subtle lies.

> Predicting approximately what new capabilities a human will have
> when you make a small change to their neurological hardware can be
> difficult or easy depending on how well you understand what you are
> doing,

This is another of those statements that sounds fine in isolation.
The key qualifier determining the practicality of the idea is how
difficult it is for humans to acquire the 'understanding of what
they are doing' (and from a Singularity strategy point of view how
likely it is that they will do so before actually building the AGI;
project deadlines have a way of trampling on abstract concerns).
What understanding I have of human brain simulation suggests that
acquiring the desired level of understanding will be really, really
hard, and probably intractable without a lot of sophisticated
tools that might need infrahuman AGI themselves. You can't gloss
over comprehension challenges like this the way you can gloss over
engineering challenges by saying 'well, the laws of physics permit it'.

> but small changes, that is, changes of magnitude comparable to the
> range of variation among the human baseline population will never
> create large and novel transhuman abilities, but lots

This only holds if you vary the simulation parameters along dimensions
that mimic human genetic variability. There are lots of ways to
slightly alter a brain simulator that are very hard or even literally
impossible for natural selection to replicate, most of which will
probably produce very-hard-to-predict results and any of which could
produce transhuman abilities (possibly coupled with motivation drift)
that take us back to an Unfriendly seed AI situation. This is one of
the few areas where the 'Complex Systems' crowd actually have a
point, and that point is that it's just not safe to play with this.

>>> It should also be possible to ensure a human-like enough goal system
>>> that you can understand its motivation prior to recursive
>>> self-improvement.
>> Where is this 'should' coming from?
> The "should" comes from the fact that the whole venture of attempting to
> build an organization to create a Friendly AI presupposed the solution to
> the problem of friendly human assistant has been solved

This statement is simply wrong. It is perfectly possible to produce
a solution to the FAI problem specifying an arbitrary non-human-like
seed AI architecture without ever having the ability to specify a
'friendly neuromorphic AI'; the latter requires the solution of
numerous hard challenges that aren't required for the former. This
is the strategy that the SIAI is taking, for all the reasons stated
above and more.

Aside from that, the fact that we want to solve the problem and are
trying hard to do so does not mean that it 'should' be solvable. The
universe is not required to accommodate our notions of desirability.

> What is the difference between trusting a human derived AI and
> trusting a human. Either can be understood equally well by reading
> what they say, talking to them, etc.

Humans deceive humans on a massive scale every day, sometimes hiding
the most malign intentions and actions for decades, and that's with
our innate ability to model other (human) minds working at peak
effectiveness, something which drops off sharply as the cognitive
architecture of the target begins to deviate from our own. We have
a hard enough time predicting how humans from other cultures will
behave, never mind minds with a different physical substrate,
sensorium and cognitive architecture (even if their motivational
mechanisms and broad capabilities are human-like). Regardless, I for
one would like a solution more reliable than the baseline human
ability to detect deceit.

>> Trying to prevent an AGI from self-modifying is the classic 'adversarial
>> swamp' situation that CFAI correctly characterises as hopeless.
> You shouldn't try to prevent a non-human AGI from self-modifying, but
> adversarial swamps involving humans are NOT intractable, not to mention
> that you can still influence the human's preferences a LOT with simulated
> chemistry and total control of their environment.

I rather doubt that messing with an uploaded human's simulated brain
chemistry is going to engender trust; in fact it seems more likely
to me that you'd introduce subtle psychosis (though the actual answer
to this requires a detailed model of human personality fragility under
fine brain modification, which we don't have and won't have without a
lot of research).

> I don't think this is as severe as the adversarial situation currently
> existing between SIAI and other AI development teams from whom SIAI
> withholds information.

Firstly, you've jumped into an entirely different argument. Information
sharing between teams of human developers has little or nothing to do
with a programmer trying to frustrate an AGI's attempts to self-modify.
Secondly, the majority of AGI projects are withholding key information.
Many have published a lot less information than the SIAI has. I don't
see how you can reasonably criticise the SIAI for this without
criticising virtually every AGI researcher on this mailing list.

>> Any single point of failure in your technical isolation or human
>> factors will probably lead to seed AI. The task is more or less
>> impossible even given perfect understanding, and perfect
>> understanding is pretty unlikely to be present.
> Huh? That's like saying that workplace drug policies are fundamentally
> impossible even given an unlimited selection of workers, perfect worker
> monitoring, and drugs of unknown effect some of which end the world when
> one person takes them.

My point is that we can't (or at least, probably won't) have 'perfect
worker monitoring', nor do we have a free selection of workers given
the effort needed to make an upload or new AI and the need for them
to be useful. The actual situation would be much more like /real
world/ workplace drug policy enforcement, but still with your criteria
that some drugs end the world when only one person takes them.

>> A 'non-transparent' i.e. 'opaque' AGI is one that you /can't/ see
>> the thoughts of, only at best high level and fakeable abstractions.
> You *can* see the non-transparent thoughts of a human to a
> significant degree, especially with neural scanning. With perfect
> scanning you could do much better.

Ok, right now the best we can do is classify the rough kind of mental
activity and, possibly, under some circumstances, the kinds of thing a
person might be thinking about. We have neither repeatability, detail
nor context, and this is with subjects who are actively trying to
help rather than actively trying to deceive. I admit that the degree to
which this can improve given better scanning alone is not well
established. I am simply inclined to believe various domain experts
who say that it is /really hard/ and note that this tallies with my
expectations about trying to data-mine extremely complex, noisy and
distributed causal networks without a decent model of the high-level
processing implemented.

> I strongly disagree. Reading thoughts is very difficult. Reading
> emotions is not nearly so difficult,

Firstly, this assumes perfect uploading with no small yet significant
factors that throw off our human-derived models. Secondly, we have
no experience trying to do this under adversarial conditions, and
it's unreliable even with co-operative subjects. Thirdly, normal
humans can already be pretty damn good at manipulating their own
emotions. Finally, there's no reason to assume that an Unfriendly
neuromorphic AGI is going to get highly emotional about it; this
list has already seen quite a bit of argument for the notion that
unemotionally selfish sociopathic humans are widespread in society.

>> I'll grant you that, but how is it going to be useful for FAI
>> design if it doesn't know about these things?
> I can think of several ways. Admittedly the use would be more
> limited.

I'd certainly be interested to hear them, given that I'm currently
researching the area of 'tools to help with FAI design' myself.

>> How do you propose to stop anyone from teaching a 'commercially
>> available human-like AGI' these skills?
> Probably not possible at that point, but at that point you have
> to really really hurry anyway so some safety needs may be
> sacrificed. Actually, probably possible but at immense cost,
> and probably not worth preparing for.

See previous discussion on why government regulation of AI (a)
isn't going to happen before it's too late and (b) would be worse
than useless even if it did.

 * Michael Wilson


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:52 MDT