Re: Question about CEV

From: Nick Hay
Date: Mon Oct 29 2007 - 17:18:23 MDT

On 10/29/07, Matt Mahoney <> wrote:
> --- Thomas McCabe <> wrote:
> > CEV is a metamorality system. It doesn't say XYZ is good or bad: it
> > defines a procedure for how to determine if XYZ is good or bad. Apples
> > and oranges.
> From the paper: "...our coherent extrapolated volition is our wish if we knew
> more, thought faster, were more the people we wished we were, had grown up
> farther together; where the extrapolation converges rather than diverges,
> where our wishes cohere rather than interfere; extrapolated as we wish that
> extrapolated, interpreted as we wish that interpreted." (I hope this isn't
> taken out of context).
> My objection is that "were more the people we wished we were" makes CEV
> undefined if we allow the AI to reprogram our motivational systems.

This is not necessarily undefined. Take an unmodified human (or model
thereof), make some specific changes (e.g. increasing the accuracy of
the human's expectations), then extract an image of who that modified
human wishes to be. Take that image, make some changes, extract an
image of who it wants to be. Repeat N times.

I'm not saying this is how it would work, just demonstrating that it
needn't be undefined. I haven't specified how to model a human, how
exactly to change the model (e.g. to increase expectation accuracy), or
how to extract an image of who that human wishes to be, but those are
technical questions I don't know the answers to.
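The iterated procedure above can be sketched abstractly. This is only a
toy illustration of the loop's shape, not an implementation of CEV: the
model of a human and the `improve` and `extract` steps are exactly the
unspecified technical questions mentioned, so here they are bare
placeholder functions.

```python
# Hypothetical sketch of the iterated extrapolation loop described above.
# `model` stands in for a model of a human; `improve` for a specific
# change (e.g. increasing expectation accuracy); `extract` for taking an
# image of who the modified model wishes to be. None of these are real.

def extrapolate(model, improve, extract, n):
    """Apply n rounds of (improve, then extract) to a model."""
    for _ in range(n):
        model = improve(model)   # make some specific changes
        model = extract(model)   # image of who this model wishes to be
    return model
```

Whether such a loop converges, and to what, depends entirely on the
unspecified `improve` and `extract` steps.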

> The AI
> could make us want to be the kind of person that allows the AI to tell us what
> we want.

Why would it? A CEV does not have its own agenda. An optimizing
compiler could take a program that computes prime numbers and make
it output only even numbers. But it doesn't, unless we designed it to
do so. Likewise, a CEV could manipulate human motivational systems so
that they are trivially satisfied, or fail in some other way that is
obvious to us. But it doesn't, unless we designed it to do so.

> But if we disallow reprogramming the motivational system then we
> could not treat many mental illnesses. I gave examples where neither the
> person's wish before or after the change would be a reliable indicator of what
> a rational person would wish for. The only other alternative would be a
> complicated rule that we would probably get wrong.

Or we could write a process that extracts how we detect whether our
extrapolation procedure is wrong, and use that to correct our
imperfect implementation. Or something else. There is always a third
alternative, especially in Friendly AI theory.
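That third alternative also has a loop-like shape, sketched below under
heavy assumptions: `detect_error` stands in for the extracted notion of
"our extrapolation went wrong", and `repair` for using that detection to
correct the imperfect procedure. All three names are placeholders, not
anything proposed in the CEV paper.

```python
# Hypothetical sketch: run the extrapolation, apply an extracted
# error-detector to its result, and repair the procedure when the
# detector fires. `detect_error` and `repair` are placeholders.

def corrected_extrapolation(procedure, detect_error, repair, model,
                            max_rounds=100):
    """Re-run a self-correcting extrapolation until it is accepted."""
    for _ in range(max_rounds):
        result = procedure(model)
        if not detect_error(result):
            return result
        procedure = repair(procedure, result)  # correct the procedure
    raise RuntimeError("no accepted extrapolation within max_rounds")
```

The point is only that "fixed rules" and "no rules" are not the sole
options; the correction criterion can itself be extracted rather than
hand-written.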

-- Nick

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:58 MDT