Re: Extrapolated volition

From: Michael Wilson
Date: Tue Nov 15 2005 - 02:54:25 MST

Russell Wallace wrote:
>> I would not write (domain protection) off as useless; some of the
>> problems may actually be solvable, and it is conceivable that it
>> could form an important component of a workable hybrid strategy.
> Speaking of hybrid strategies, I'm starting to think DP and CEV
> may not be as far apart as they appear at first glance.

They are quite different. As I see it, there are five basic
approaches to designing the goal system for a Friendly AI:

1. Lay down immutable pronouncements intended to shape the rest
   of human history. Domain protection and any attempt at 'objective
   morality' are in this category.
2. Lay down ground rules plus metagoals that will modify the basic
   goals as the FAI considers the problem and interacts with the world.
   Depending on the design, this may or may not settle rapidly to a
   static consistent-under-reflection goal system. Eliezer's model in
   'Creating a Friendly AI' is in this category, and AFAIK so is
   Ben's current model.
3. Lay down ground rules plus specific rules about how humans can
   change the FAI's basic goal system content (e.g. the 'AGI implements
   any majority vote of the UN' failure scenario, assuming that
   includes obeying directives to stop obeying the UN in future). This
   is a special case of (2).
4. Specify the absolute minimum possible yourself; get the AGI to
   generate goal system content by analysing some other source, e.g.
   scanning the brain states of all extant humans and extrapolating
   their future desires. This differs from (2) in that effectively
   there are no 'ground rules', just metagoals; the AGI has to go
   through the goal system generation process before it starts
   affecting the world. CV is almost in this category; it retains a
   couple of basic rules such as 'don't/try not to create any
   sentient beings in the extrapolation process'.
5. Instruct the AGI to do something specific and then shut down;
   this means specifying a strong bound on how long the AGI is going
   to be around. In principle you could add a fixed temporal bound
   to scenarios 2, 3 and 4, but usually this is proposed as an extra
   constraint on 1, i.e. having the AGI take some specific actions
   to mitigate existential risks and then cease activity.

> For the EV part, while I think I'd want it toned down some from
> the way Eliezer seems to see it, _some_ form of intelligent
> interpretation of volition is needed to avoid the murder by genie
> bottle problem;

Scenario (1) is just generally a bad idea. I wouldn't rule it out
utterly and completely, but we want to avoid inflicting badly thought
out rules (and there's a sharp limit to how well thought out anything
designed by humans, particularly a single human, is going to be) on
the entirety of future history if at all possible.

> the real stumbling block is the C part.

There is a strong tendency to have a knee-jerk reaction against the
term 'collective' just because we're all such devoted individualists
on SL4. Now it is arguable that Eliezer is in fact excessively
concerned about inclusiveness-above-all-else, but CEV is not some
dastardly communist plot. I encourage everyone to think about the
kinds of things in a society that should be dependent on a
non-unanimous consensus, and the most effective, least harmful ways
to generate that consensus. Then consider how you might push as
many parts of /that/ problem onto transhuman intelligences as
possible, and how it might need fixing if you got it wrong. Finally
you have to do a risk analysis on how the course of events could
deviate from the general constraints you're trying to impose, should
you make mistakes (and you probably will). This is (I think) how
Eliezer came up with the concept of CV.

> If an escape clause were added - the right for people to say "I
> don't want anything to do with path X that that lot are following,
> my volition is to go down path Y instead" - then I'd have far
> fewer problems with it.

That sounds like domain protection with everyone in the 'CV' domain
to start with. In theory, if this were a good idea then the CV would
implement such an escape clause for you. In practice, I think CV is
unavoidably dependent on initial conditions and structure to a far
greater extent than Eliezer might like. I remain open to the
possibility of some dazzling insight on the 'right way to do it',
but I am not holding my breath. Regardless, I am more amenable to
engineering the structural properties and base assumptions of a
CV process to change the ease with which some conclusions can be
generated, than to start specifying rules directly.

> (I know Eliezer's afraid of letting people start adding
> preconditions - but slippery slope arguments aren't always valid.
> If 100 Xs are bad, that doesn't always mean the right number of
> Xs is 0.)

I certainly agree there. Eliezer appears to be obsessively concerned
with finding a 'single unique solution' to the FAI problem, where
the uniqueness criterion will effectively be 'meets some personal
and probably highly abstract criterion of perfection'. While this is
not the field I have been focusing on, and it is possible that I
have a mistaken impression, this seems like suboptimal Singularity
strategy coloured by ill-chosen axioms to me. I would rather that
humanity makes it into the future with a few problems to deal with,
than we let the world be destroyed by UFAI/etc because we spent so
long trying to be 'perfect'. This isn't a contradiction of the
standard Yudkowsky rhetoric about accepting less than your best
effort being fatal and rushing things leading to disaster (which
still applies, for the most part); it's about avoiding the folly of
chasing pointless and illusory standards of perfection.

That said, clearly all current FAI theory still has a /long/ way to go.

 * Michael Wilson


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:53 MDT