Re: Recipe for CEV (was Re: Morality simulator)

From: Matt Mahoney (matmahoney@yahoo.com)
Date: Sun Nov 25 2007 - 16:46:16 MST


--- Nick Tarleton <nickptar@gmail.com> wrote:

> On Nov 24, 2007 8:40 PM, Matt Mahoney <matmahoney@yahoo.com> wrote:
>
> > The model P will distinguish between descriptions (in words or pictures)
> > of friendly and unfriendly behavior by assigning higher probabilities to
> > the friendly descriptions. This is different than distinguishing between
> > friendly and unfriendly behavior. I don't claim that such a thing is
> > possible.
>
>
> If this worked at all (that is, if a detailed model of a human mind is
> actually the best way to compress a human's output AND your search algorithm
> can find its way out of all of the only-slightly-worse local minima), why
> would the model predict Friendly descriptions rather than human-typical
> ones?

I should have clarified that P predicts human-typical responses to the
question "is this friendly?" for various values of "this". P can address the
problem of knowing what people actually want, even when they don't ask for it
explicitly. For example, if you ask an AI to solve the Riemann hypothesis, it
knows that you want an explanation, not just a "yes" or "no", and it knows you
don't want it to turn the Earth into computronium, even if that is the only
way to get an answer. P "knows" this in the sense that it assigns higher
probabilities to the answers that humans would give to such questions.
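
To make the selection rule concrete, here is a toy sketch. A tiny naive
Bayes word model and a few made-up examples stand in for P, which would
really be a general compression model of human text; the point is only how
the answer is read off, by comparing the probability the model assigns to
each candidate answer.

# Toy stand-in for P: a naive Bayes word model instead of a
# compression-grade model of human text. The training examples are
# made up for illustration.

from collections import Counter
import math

examples = [
    ("returning a lost wallet to its owner", "friendly"),
    ("helping a stranger carry groceries", "friendly"),
    ("comforting a crying child", "friendly"),
    ("destroying a neighbor's property", "unfriendly"),
    ("ignoring a call for help", "unfriendly"),
    ("insulting a stranger for no reason", "unfriendly"),
]

# Count how often each word appears with each judgment.
word_counts = {"friendly": Counter(), "unfriendly": Counter()}
class_counts = Counter()
for scenario, answer in examples:
    class_counts[answer] += 1
    word_counts[answer].update(scenario.lower().split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_prob(answer, scenario):
    """log P(answer) + sum of log P(word | answer), add-one smoothed."""
    lp = math.log(class_counts[answer] / sum(class_counts.values()))
    total = sum(word_counts[answer].values())
    for w in scenario.lower().split():
        lp += math.log((word_counts[answer][w] + 1) /
                       (total + len(vocab) + 1))
    return lp

def human_typical_answer(scenario):
    """Return whichever answer the model finds more probable."""
    return max(("friendly", "unfriendly"),
               key=lambda a: log_prob(a, scenario))

print(human_typical_answer("helping a lost child"))   # friendly
print(human_typical_answer("insulting a neighbor"))   # unfriendly

The toy just echoes word statistics, and the examples are invented for
illustration, but the way the answer is picked (take whichever one the
model finds more probable) is the same.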

Again, I must emphasize that P does not solve the friendliness problem. As
Eliezer points out, you can't just use "yes" answers to "is this friendly?" as
your utility function. A text compressor models human language, not human
motivation.

-- Matt Mahoney, matmahoney@yahoo.com


