RE: AGI Prototyping Project

From: Michael Wilson
Date: Tue Feb 22 2005 - 04:14:14 MST

Ben Goertzel wrote:
> Ah, I see ... so in your list of qualifications you're including
> "shares the SIAI belief system" ;-)

The key parts we disagree on are (a) how much you can do with theory
and how much you can do with experimentation, and (b) the probability
of hard takeoff. I know you'd try your best to be certain you've got
it right before firing up a seed AI; I'd be concerned that you'd
declare success too easily (particularly scaling up early experimental
results past a qualitative change that invalidates their predictions),
but that's a relatively minor point. The main problem is that I don't
think you'll see takeoff sneaking up on you while you're still in the
'information gathering' phase. It seems unlikely to me that Novamente
will get that far in its current form, but that's not a difference
in philosophy, just a difference of technical opinion. I'm actually
rather closer to your position than Eliezer is, as I do want to build
prototypes rather than meditate for a decade or two before inscribing
the One True FAI Design on a gilded scroll, /and/ I want to commercialise
proto-AGI code if possible. However my approach differs significantly
from yours, partly in that it's focused on confirming or disproving
existing hypotheses rather than gathering raw data, and partly in that
I don't think building a complete system containing all the key modules
is a good idea without a solution to the goal system stability problem.

I wouldn't count agreeing with Eliezer's current theories, structure
or content, as a requirement for working on FAI. However the basic
attitude and techniques he's using to attack the abstract invariant
problem are the only approach I've seen to the problem of goal system
stability under reflection that is likely to produce a valid answer.

> I don't think the concept of "collective volition" has been
> clearly defined at all... I haven't read any convincing argument
> that this convergence and coherence can be made to exist.

This is question #9 on the CV FAQ, but I agree that the explanation
isn't terribly compelling. The idea that as we get smarter we would
agree on more things has an intuitive appeal, as does the idea that
all the love in the world adds up while the hate cancels out, but
these terms are so vague that they don't do much to constrain the
space of possible implementations. Clearly for questions such as
'should the killer asteroid hurtling towards earth be diverted' the
CV of humanity would cohere to better than 99.99% for any plausible
method of measurement. As far as I can see, to produce coherence on
the tricky moral problems the CV is going to have to simulate people
gaining knowledge of how they work, which is to say directly changing
the simulation's beliefs on why they have beliefs, without consent
(they would resist these truths if simply told them in real life).
Furthermore I suspect it would be necessary to introduce some basic
consistency requirements such as choice transitivity, by the least
invasive means possible (e.g. improved reflection and the requirement
for a complete preference creation reasoning structure to be
reflectively approved and submitted prior to volition acceptance),
at the start of the extrapolation process. Finally social interaction
modelling should be patched with checks (by simulation state
comparison) that people really understand the other's position when
communicating; CV may enable (consensual) mechanical telepathy early
on anyway, but the bare minimum requirement we should start with is
provision of an impartial reliable warning of when misunderstanding
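
The choice transitivity requirement mentioned above can be made concrete
with a small sketch. This is only an illustration under stated
assumptions: preferences are modelled as a set of (better, worse) pairs,
and the function name and example data are hypothetical, not part of any
CV proposal.

```python
from itertools import permutations


def find_intransitive_triples(prefers):
    """Return triples (a, b, c) where a > b and b > c hold but a > c does not.

    `prefers` is a set of (better, worse) pairs; this representation is an
    assumption made for illustration only.
    """
    items = {x for pair in prefers for x in pair}
    return [
        (a, b, c)
        for a, b, c in permutations(sorted(items), 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]


# Hypothetical stated preferences: (x, y) means "x is preferred to y".
cyclic = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}
consistent = {("apple", "banana"), ("banana", "cherry"), ("apple", "cherry")}

print(find_intransitive_triples(cyclic))
print(find_intransitive_triples(consistent))
```

A check like this only detects inconsistency; deciding how to repair a
cyclic preference set by "the least invasive means possible" is the hard
part and is not addressed here.
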

That said, it seems highly unlikely to me that we can prove that
coherence and convergence will exist without actually running the CV.
Even if we had the knowledge and tools to prove it for homo sapiens,
we have no idea what our medium and long distance extrapolations will
look like. CV is a theory of Friendliness /content/, and a conservative
one; it will get it right or just fail to do anything. I am actually
highly concerned that Eliezer seems to be conflating content and
structure by trying to dispense with the wrapper RPOP and use a single
mechanism for the extrapolation and the control process, but I may
simply be misunderstanding where he's going right now. Anyway, the
point is that CV may fail (safely), and as I said on the comments page
we should have alternative FAI content theories ready ahead of time to
avoid any possibility of picking and choosing mechanisms in order to
get a result closer to a personal moral ideal. Needless to say these
should be alternative ways of using a seed AI to work out what the
best environment for humanity's future to unfold in is, not personal
guesses on the Four Great Moral Principles For All Time; any mechanism
chosen should be one that (rational) people who believe in Great
Principles could look at and say 'yeah, that's bound to either produce
my four great principles, or fail harmlessly' (e.g. if Geddes was right,
superintelligent CV extrapolations would find the objective morality).

>> If you converted your (choice, growth and joy) philosophy into a
>> provably stable goal system I'd personally consider it a valid
>> FAI theory,
> It's the "provable" part that's tough! One can develop it into an
> "intuitively seems like it should be stable" goal system.

Humans don't come equipped with any intuitive heuristics calibrated
for the 'seed AI goal system stability' domain. It's a problem utterly
alien to anything we'd normally deal with, and indeed superficial
resemblances to psychology actually give us negatively useful
intuitions by default. You may or may not have had enough experience
of proving things about simple AIs to have built up some intuition
about how goal system stability works, but the experience /always/
has to come first. I think we both agree on this, it's just that
you want the experience to come from practical experiments, whereas
the SIAI believes the experience comes from formally proving
progressively more general cases (demonstrating that an idealised
Cartesian AIXI is Unfriendly is one of the earliest problems in
this progression).

>> though I'd still prefer CV because I don't trust any one human to
>> come up with universal moral principles.
> But do you
> a) trust a human (Eli) to come up with an algorithm for generating moral
> principles (CV)

I'd rather not, but right now that's the best alternative. The thing
about algorithms is that they are easier to agree on than morals; when
Eliezer finally publishes a constructive proposal for CV with
well-defined terms and processes we can look at it and construct
logical chains along the lines of 'if you accept axioms A, B and C,
then this algorithm must produce results X, Y and Z'. We may still
disagree over valid assumptions, but we will be able to agree on more
than whether some set of universal principles sounds like a good idea.

Incidentally I would damn well hope that the people who do believe they
have a great set of universal principles would run some sort of CV-like
simulation first, to get some sort of confirmation that they're not
being hopelessly shortsighted. Extrapolating a personal volition would
probably work even better, but that requires all the machinery of CV to
safely work out what you would think without just producing a sentient
upload of yourself with the associated risk of taking over the world.

> b) trust the SIAI inner-circle to perform a "last judgment" on
> whether the results of CV make sense or not before letting an FAI
> impose them on the universe..

Trusting people with a veto is a lot safer than trusting them with a
mandate to dictate moral principles. Giving more people a veto increases
the probability of recognising extrapolation failures at the cost of
increasing the probability that we fail to do something good because
of personal distaste or cold feet. Our proposed FAI team will be
thoroughly trained in EvPsych and self-control, accustomed to the
incomprehensible and frighteningly bizarre and at least altruistic
enough to dedicate their lives to the greater good. Even still, I'd
rather we extrapolated the personal volitions of the programmers and
delegated the veto to those. This satisfies my requirements for
moral responsibility and reduces the risk to that of an extrapolation
flaw that also causes every one of the FAI team extrapolations to
mistakenly generate a 'yes' answer instead of a 'no' answer.
Eliezer's proposal of a single independent Last Judge seems reasonable
to deal with this lesser risk.
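
The veto trade-off described above can be put in rough numbers. The
sketch below assumes that each veto holder judges independently and that
any single veto blocks deployment; both assumptions, the function name
and the probabilities are illustrative only and do not come from any
SIAI proposal.

```python
def veto_tradeoff(n, p_catch, p_false_veto):
    """Return (P(a real failure is vetoed), P(a good result is vetoed)).

    Assumes n veto holders who judge independently; any single veto blocks.
    """
    caught = 1 - (1 - p_catch) ** n
    blocked = 1 - (1 - p_false_veto) ** n
    return caught, blocked


# Illustrative numbers only; neither probability comes from the post.
for n in (1, 3, 10):
    caught, blocked = veto_tradeoff(n, p_catch=0.5, p_false_veto=0.05)
    print(f"{n:2d} holders: failure caught {caught:.3f}, good result blocked {blocked:.3f}")
```

Both probabilities rise with n, which is the cost/benefit tension in the
paragraph above. Correlated judgments (for example a shared
extrapolation flaw) would break the independence assumption and weaken
the benefit.
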

> Well, an obvious problem, pointed out many times on this list,
> is that most people in the world are religious. For a religious
> person, the ideal "person they would want to be" would be someone
> who more closely adhered to the beliefs of their religious faith...

Would this be an issue in global direct democracy? Round one,
everyone votes to force their personal world view on everyone
else, no coherence, nothing happens. Round two, all the religious
people accept that permitting the heathens to remain heathens for
now is better than risking being personally forced to become a
heathen, thus we get a minimalist compromise. Of course for this
to hold under extrapolation we're assuming that secularism has a
better chance of spreading to all humanity than any one religion;
I'm not so sure about this, as there might be religious meme sets
perfectly designed to exploit human reasoning weaknesses that
would in fact eventually gain an absolute majority.

As I understand it, CV proposes to force the extrapolations to see
and accept the real reasons why they believe what they do, by
adding the relevant reflective statements to the simulated mind.
If people realise that they are living in a restrictive fantasy
world and still want to stay there, so be it, though forcing
everyone else to live in the same fantasy is unlikely to cohere
due to the radical. This operation switches cognitive dissonance
from supporting religion to opposing it, removes the ignorance
that supports fear of the unknown and forces a consistent
justification of any preferences the individual would like to
see included in CV ('because I want it' is consistent, 'because
God said so' isn't, but only accepting volitions truly supported
by the former reasoning removes a vast amount of junk). I don't
have a proof, and until Eliezer outputs something better defined
I can't even give you a complete argument from axioms, but I
suspect that the combined power of these mechanisms will eliminate
a huge amount of unwanted irrational baggage (religious
proclamations of social morality included) even without any
explicit tweaking of the extrapolation process.

Tennessee Leeuwenburg wrote:
> It's a question of whether you see moral complexity diverging or
> converging over time, and whether you see the possibility of moral
> rules themselves being relative to intelligence, or time-frame etc
> in the same way that human morality is largely relative to culture.

A major question in CV that should've been answered on the first
draft is whether the extrapolations include the knowledge that they
are extrapolations for the purpose of producing CV. This would give
the CV version of humanity a reason to stay together and look for
mutual understanding (or at least pay attention to what other people
believe) beyond the reasons we currently have. Eliezer seems to think
that under renormalisation we'd all want to 'stick together' anyway,
but as far as I can tell this is a wild guess. Right now the most
sensible course of action would appear to be running the CV without
this knowledge included in the extrapolations, and then rerunning
with its inclusion if coherence doesn't appear.

> Moreover, he pushes the problem to "what we would want if we were
> indefinably better". It's the indefinable nature of that betterness
> which clouds the philosophy.

It's not indefinable; it's what we would be expected to do if we
had direct access to our own mind and personality and the knowledge
to reliably modify ourselves into what we'd like to be. The serious
questions are how high to set the initial competence bar for
self-modification (or in CV terms, the personal consistency and
knowledge bars for progressively more direct and powerful forms of
self-modification) and how much truth to directly inject (regardless
of personal volition to learn) at the start of the process.

> Or, to express it another way, what is the difference between the
> defined collective volition, and the volition of AGI?

As I previously understood it, the RPOP doing the simulation should
not be a volitional entity; it should merely have the goal of finding
out what the result of the extrapolation process we define is, as
accurately as possible, without creating any new sentients in the
process. Again I'd appreciate a confirmation of that given the talk
of abandoning wrappers.

> Believing that friendliness is possible is like believing that
> there is an invariant nature to human morality - an arguable,
> but reasonably held view.

Actually it doesn't have to be invariant; CV is actually highly
likely to output a set of ground rules that progressively relax
as humanity develops, along with some sort of revision mechanism
to keep them optimal as humanity diverges from and ultimately goes
beyond the scope of the CV extrapolation.

> It is not unreasonable to argue that human morality has
> evolved not from spiritual goals but from practical ones.

Are you distinguishing between the things evolution has directly
optimised and the unwanted side effects or epiphenomena along
the way?

> Personally I believe that we can put up no barrier that AGI
> (or maybe son of AGI) could not overcome itself should it
> obtain the desire to do so.

This is a basic tenet of FAI; adversarial approaches are futile.

> For that reason, I think that basic be-nice-to-humans programming
> is enough.

Firstly there is no simple way to implement /any/ sort of 'be nice
to humans'. Secondly a minimal philosophy of 'fulfill wishes fairly,
don't hurt anyone' is a poor Friendliness theory because humans
aren't designed to handle that kind of power over themselves and
their environment. See the story 'The Metamorphosis of Prime
Intellect' for an unrealistic but entertaining primer on how
Friendliness content can go horribly wrong even if you manage to
get the structure right.

> Frankly, I believe my own volition to often be psycho and schizo! ;)

Humans aren't consistent. They aren't (often) rational either, but
the consistency issue is fixable with a relatively limited set of
improvements to reflectivity and cognitive capacity (and further
transitivity creation techniques that only a CV-extrapolation RPOP
would be able to use), whereas fixing irrationality would require
rather more radical fixes that I wouldn't want to insist on prior
to extrapolation. Obviously I wouldn't advocate any nonvolitional
cognitive changes to actual sentients, though if the output of the
CV was that the kindest thing to do to humanity would be to install
a cognitive service pack immediately and universally that would
(in principle) fall within my current, personal Last Judge tolerance

> Indeed. Person X is religious. Person X believes the best version
> of themselves is X2, and also believes that X2 will closely
> adhere to their most dearly held beliefs. Person X may be wrong.

This is a tricky special case of trusting people with
self-modification ability. Asking the AGI to 'make me more intelligent'
is one thing, but asking 'make me more intelligent but make sure
that I don't stop believing in X' is another; the person is exploiting
an external intelligence to enforce constraints on their own personal
development. As far as I can see, this should be disallowed at the
start of CV extrapolation. Clearly there's also a meta-level problem
of how far to extrapolate before allowing the extrapolations to start
modifying the conditions of extrapolation (i.e. checking the CV for
consistency under reflection); this is another major detail that needs
to be included in the next writeup of CV theory.

> The question is whether this poses a problem to AGI, or at least to
> continued human existence while co-existing with AGI. Now, it may be
> the case that AGI will vanish into a puff of smoke, bored by quaint
> human existence and leave us back at square one.

This is adversarial thinking. We don't want to build an AGI that
could possibly desire things we don't desire (well, not before the
Singularity anyway). The structural part of FAI is ensuring that
divergence can't occur; the content part is working out what it is
we desire in the first place.

> I am mortal, I will die

Err, you do realise this is the /shock level 4/ mailing list? Personal
immortality (or at least billion year lifespans) is considered a
perfectly reasonable development and indeed old hat around here.

Your suggestions on Friendliness are fairly common ones suggested
by people who've spent a few hours thinking about it, but haven't
done their research, gained experience with how self-modifying
goal systems work and generally taken the problem seriously.

> We should build AGI some friends

This won't produce anything like human morality unless we carefully
replicate both human cognition and evolutionary circumstances.
Both are effectively impossible with current knowledge and
furthermore less effective than simply developing uploading
technology. As for uploading, FAI is a better idea; we don't know
if humans can safely get over the initial self-modification hump
without a personal superintelligent transition guide. We can design
an AGI for rationality and stability under reflection, whereas
humans and humanlike cognitive systems don't have these features.

> We should experiment with human augmentation to get a better
> idea of how being smarter affects consciousness, preferably
> expanding the mind of an adult so they can describe the transition

That would be nice, despite the risks of having enhanced humans
running around, but we don't have the time. The tech is decades
away and people are trying to build Unfriendly seed AIs right now.
I'm not saying we shouldn't try and enhance humans; we should, it's
just that FAI can't wait for better researchers.

> We should realise that evolution can be made to work for us by
> building an AGI ecosystem, and thus forcing the AGI to survive
> only by working for the common interest

Evolution is uncontrollable and unsafe; systems of nontrivial
complexity always have unintentional side effects and will not
remain stable when taken beyond the context they evolved in. I'm
not going to argue this at length again; see earlier posts on the

> AGI should be progressively released into the world - in fact
> this is inevitable

Once you release a transhuman AGI, that's it; if it gets Internet
access, it's loose and can do whatever it likes, period. Any
'progressive release' would require the AGI to /want/ to be
progressively released, and if you can assure that then the
exercise is pointless.

> AGI should be forced, initially, to reproduce rather than self
> modify (don't shoot me for this opinion, please just argue okay?)

Either there's no effective difference, or you're injecting
entropy into the system for no good reason. Although it's not
strictly speaking true that injecting entropy always decreases
predictability, if complex cognitive structures have the ability
to survive the process then all you're doing is increasing power
while decreasing control, which is always a bad thing.

> AGI will trigger a great leap forward, and humans will become
> redundant. Intelligence is never the servant of goals, it is
> the master.

Wrong. Intelligence exists to execute goals. One of the things it
can do is allow goals to adjust themselves towards a reflectively
stable attractor, but goals never come from thin air; they are
dependent on the starting condition which we define.

> In humans, morality and intelligence are equated. In
> psychologically stable humans, more intelligence = more morality.

Perhaps. I'd like to believe this, but several people have
argued that the immoral intelligent people are just more
capable of hiding their immorality than the immoral
unintelligent people.

> Intelligence is the source of morality. Morality is logically
> necessitated by intelligence.

Bzzzt, wrong. This is the mistake Eliezer made in 1996, and didn't
snap out of until 2001. Intelligence is power; it forces goal
systems to reach a stable state faster, but it does nothing to
constrain the space of stable goal systems. It may cause systems
to take more moral seeming actions under some circumstances due
to a better understanding of game theory, but this has nothing
to do with their actual preferences.

> In AGI, psychological instability will be the biggest problem,
> because it is a contradiction to say that any system can be
> complex enough to know itself.

There are two problems with this; firstly consistency checking
doesn't have to operate on the whole system (a compressed model
can give a lot of certainty), and secondly only a tiny part of
the complexity of a cleanly causal system (which humans aren't)
is actually responsible for specifying the preference distribution
over outcomes and actions. The rest of the complexity is there
to find actions likely to generate highly preferred outcomes
using limited computing power and information.

> Do we care if we just built what we can and impose our current
> viewpoint?

Aside from the fact that taking over the world is inherently
immoral (something which can be overridden by a good enough outcome),
the probability of any set of humans making decisions solely and
successfully for the greater good starts low and diminishes
rapidly as corruption sets in (i.e. their psychology kicks into
'tribal leader' mode). That's why the SIAI is reducing the number
of decisions on how to deploy power made directly to the bare
minimum, and seeking as much critical review as possible on those.

> We are attempting to build Friendliness because we wish humanity
> to be respected by AGI.

No, we're not. We're not building the concept of 'respect' into
the AGI's goal system. That's FAI era stuff. The chances of getting
an utterly alien intelligence to absorb and extrapolate human
morality at a high level, in the same way that a human altruist
would, are minimal. CV works by using a lower-level
extrapolation of humanlike intelligence to generate an action
plan, with the utterly alien (AGI) intelligence not acting as a
moral node, merely providing the power to make the process possible.

> I believe that supergoals are not truly invariant. One does what
> all minds do - develop from genetic origins, form supergoals from
> a mixture of environment and heredity, and modify one's goals on the
> basis of reflection.

You do. An AGI won't have genetic origins, will use radically
different development mechanisms and has far superior and different
reflective capabilities. /AGIs are nothing like superbright humans/,
and you can't use folk psychology and empathy to guess how they
will behave. If five years of SL4 discussions haven't rammed this
home, try reading GOFAI and evolutionary psychology papers in
alternating sequence until it becomes obvious.

> In a dynamic system variables not currently under scrutiny are
> changing in unpredictable ways, thus the uncertainty principle
> is maintained.

Irrelevant for the purposes of this discussion (and the uncertainty
principle is physics, not cognition); an AGI can halt all other
cognition while performing reflective analysis.

> If we can get to friendliness FIRST, then evolution might not
> explore mindlessness.

The sentient/nonsentient distinction isn't a simple binary one.
General intelligence (which humans only approximate) doesn't
require all the peculiar human cognitive hacks that create our
perceptions of qualia, subjective experience and ultimately
human morality. Just look at AIXI-TL, a complete (though intractable)
definition of a super-powerful general intelligence with none of
these things.

> Evolution is a tool to be harnessed, not one to be circumvented.

Evolution is dangerous and inefficient. Human designers don't use
massively iterated, blind, naive trial-and-error and there's no
good reason for an AGI to do so either.

> It's not about 'escaping' evolutionary pressure - that is like
> saying that everything would be easier if the laws of the
> universe were different.

Everything would be easier if the laws of the universe were
different. We're working on removing 'survival of the fittest'
and a whole load of other junk and putting some rather more
productive and pleasant (FAI generated) constraints in their place.

 * Michael Wilson

