Re: UCaRtMaAI paper (was Re: Building a friendly AI from a "just do what I tell you" AI)

From: Wei Dai (weidai@weidai.com)
Date: Thu Nov 22 2007 - 12:43:41 MST


Tim Freeman wrote:
> But you understood it well enough to see the issues. Maybe it's more
> comprehensible than I thought. Thanks for taking a look.

It helps to already have the necessary background knowledge, for example
knowing what the Speed prior is. (Actually I am not a big fan of the Speed prior.
See
http://groups.google.com/group/everything-list/browse_frm/thread/411eedecc7af80d8/d818a1f516f5a368
for a discussion about it between Juergen Schmidhuber and me.)
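
(For anyone reading along without the background: roughly speaking, the
Speed prior discounts each program not just by its length but also by its
running time, so a program p of length l(p) that takes t steps to produce a
string x contributes something on the order of 2^-l(p) / t to x's prior
probability, rather than the 2^-l(p) of the ordinary universal prior. That
is only an informal gloss; Schmidhuber's formal definition goes through his
FAST search procedure.)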

> The answer to that question depends on the AI's simplest explanations of
> the human's goals. This depends on the AI's past observations of the
> human's previous behaviors (and the behaviors of the other humans,
> since the AI explains them all at once). You didn't specify the past
> behaviors, and even if you did I don't have enough brainpower to run
> the AI's algorithm, so I don't know the answer to your question.

OK, you don't know exactly what the AI will do, but do you have reason to
believe that the scheme will work out, in the sense that the AI will be able
to form accurate estimates of people's utility functions? Let me give
another example that may be clearer.

First suppose I don't know that the AI exists. If I reach for an apple in a
store, that is a good indication that I consider the benefit of owning the
apple to be higher than its cost. If the AI observes me reaching for the
apple while I am unaware of its existence, it can reasonably deduce that
fact about my utility function, and pay for the apple for me out of its own
pocket. But what happens if I do know that the AI exists? In that case I
might reach for the apple even if its benefit to me is quite small, because
I expect the AI to pay for it. So how can the AI figure out how much I
really value the apple?
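
To put the problem in concrete terms, here is a toy model of my decision
rule as the AI might represent it (the Python, the price, and the benefit
figures are all mine, just for illustration):

PRICE = 1.00  # what the apple costs at the store (made-up number)

def reaches_for_apple(benefit, knows_ai_will_pay):
    # My decision rule, as the AI might model it: reach if the benefit to
    # me exceeds whatever I expect to actually pay.
    effective_cost = 0.0 if knows_ai_will_pay else PRICE
    return benefit > effective_cost

for benefit in (0.10, 0.50, 2.00):
    print(benefit,
          reaches_for_apple(benefit, knows_ai_will_pay=False),
          reaches_for_apple(benefit, knows_ai_will_pay=True))

# Only benefit=2.00 produces a reach when I don't know about the AI, so the
# action reveals benefit > price. Once I know the AI will pay, every
# positive benefit produces a reach, so the same action no longer pins down
# how much I value the apple.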

> That's a reasonable reaction to the paper because I didn't emphasize
> in the best place how that issue is dealt with. The utility is an
> integer with a bounded number of bits, so many of the constants you
> might want to multiply by will cause overflow and break the
> explanation. Similarly, there's only a finite number of constants you
> can add, depending on the range of values. The broken explanations
> don't contribute to the final result.
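
If I understand the mechanism, it works roughly like the sketch below (the
8-bit range and the particular numbers are mine, not the paper's):

UTIL_MIN, UTIL_MAX = 0, 255   # pretend utilities are 8-bit unsigned integers

def rescaling_survives(utilities, scale, shift):
    # An affine rescaling "breaks" the explanation if any rescaled value
    # falls outside the representable range.
    return all(UTIL_MIN <= u * scale + shift <= UTIL_MAX for u in utilities)

my_utilities = [10, 40, 200]  # hypothetical utilities attributed to one person

print(rescaling_survives(my_utilities, scale=1, shift=0))    # True
print(rescaling_survives(my_utilities, scale=2, shift=0))    # False: 2*200 = 400 overflows
print(rescaling_survives(my_utilities, scale=1, shift=100))  # False: 200+100 = 300 overflows

# Only finitely many (scale, shift) pairs survive, which rules out inflating
# one person's utilities arbitrarily relative to another's.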

Do you think this will actually result in a *fair* comparison of
interpersonal utilities? If so, why?

> Humans do interpersonal comparison of utilities routinely. I want to
> stay alive tomorrow, and you probably want to eat dinner tomorrow. I
> think that in the simplest circumstances consistent with what has been
> said so far, we'll all agree that my desire to stay alive tomorrow is
> greater than your desire to eat dinner tomorrow.

What about my desire for greater support of classical music, versus my
neighbor's desire for more research into mind-altering drugs? It's not
always so clear...

> Since humans can do
> it, if you buy into Church-Turing thesis you have to conclude that
> there is an algorithm for it. If you don't buy into the
> Church-Turing thesis, you're reading the wrong mailing list. :-).

When I said there was no "obvious" way, I meant that there isn't an obvious
algorithm that is fair. There are actually infinitely many algorithms that
could be used, and choosing among them is the real problem.
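
Here is a small illustration of what I mean, with two candidate
normalizations and some made-up utilities (neither rule is taken from the
paper; they are just two standard-looking choices):

alice = [0, 10, 1]   # Alice's utilities for outcomes A, B, C (made up)
bob   = [5, 0, 10]   # Bob's utilities for the same outcomes (made up)

def zero_one(u):
    # Rescale so the worst outcome maps to 0 and the best to 1.
    lo, hi = min(u), max(u)
    return [(x - lo) / (hi - lo) for x in u]

def unit_sum(u):
    # Rescale so each person's utilities sum to 1.
    s = sum(u)
    return [x / s for x in u]

for normalize in (zero_one, unit_sum):
    a, b = normalize(alice), normalize(bob)
    totals = [x + y for x, y in zip(a, b)]
    print(normalize.__name__, "picks outcome", "ABC"[totals.index(max(totals))])

# zero_one picks outcome C while unit_sum picks outcome B: two perfectly
# reasonable-looking normalizations, two different answers.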

> If someone has a better idea, please speak up. We really do have to
> compare utilities between people, since the FAI will routinely have to
> choose between helping one person and helping another. (For a
> suitably twisted definition of "Friendly", one can argue that the
> previous sentence is false. I'd be interested in seeing details if
> anyone can fill them in in a way they honestly think makes sense.)

You can find plenty of papers if you search for "interpersonal comparison
of utilities", but I'm not sure any of them offer good constructive
solutions.
 


