Re: UCaRtMaAI paper

From: Tim Freeman (tim@fungible.com)
Date: Fri Nov 23 2007 - 09:20:36 MST


From: "Wei Dai" <weidai@weidai.com>
>Tim, have you read my recent posts titled "answers I'd like from an SI" and
>"answers I'd like, part 2"?

No, I hadn't, but now I have. Your issue there seems to be that you
want a scheme with verbal communication close enough to its core that
answering a question is treated as something more fundamental than
making noises to satisfy the stupid humans. You feel frustrated
that none of the existing schemes based on optimizing some utility
function do that. I agree that it would be nice to have. I don't
see it as essential, in the sense that I can imagine a useful and
friendly AI that doesn't treat verbal communication as special in any
way.

There is a laundry list of questions in your 13 Nov 2007 "answers I'd
like, part 2" post, and some concerns about what might go wrong if
those questions are answered wrong. However, I still don't see
question-answering as essential, in the sense that making the errors
you described seems to me similar in kind to incorrectly guessing that
everyone wants to die and then helping them out. There are all sorts
of incorrect guesses that could be made, and the ones you list don't
seem to me any more likely than other mistakes that don't revolve
around philosophical questions.

>I don't think that helps too much, because if everyone knows that the AI
>will stop helping only temporarily, they will take that into account and
>still not act in ways that reveal their true utilities.

It would give some useful information. If I'm hungry for an apple
now, and the AI says it won't help me until tomorrow, thoughts about
the AI's help tomorrow won't much affect my present attempts to get
the apple.

So far as I can tell, you are arguing that my scheme isn't perfect. I
agree with that statement, but in the absence of a perfect or even a
conjectured-to-be-better scheme, I'm not very interested.

>...helping whichever child you think needs your help most (which
>encourages them to exaggerate how much help they need)

I agree that having the AI help those who appear to need the most help
would encourage people to exaggerate how much help they need.
Guessing that "Joe is doing X because he thinks it will make the AI
give him Y" isn't a special case in the UCaRtMaAI algorithm, so maybe
things will get back into equilibrium if the AI becomes better at
detecting exaggeration than Joe is at exaggerating.

Human motivation doesn't vary all that much from human to human, and
in the UCaRtMaAI algorithm there is one explanation of human
motivation that takes an agent-id as input. (Specifically, in the
diagram at http://www.fungible.com/respect/paper.html#beliefs,
Compute-Utility is one-per-explanation, and it takes A as input.) The
simplest explanations have Joe wanting apples about the same amount
that everyone else wants apples, so it shouldn't be too hard to figure
out about how much Joe wants an apple.
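
Here is a minimal sketch of that point -- my own illustration, not the
paper's code, with made-up names and numbers. The idea is one shared
Compute-Utility function that takes an agent id as input, plus a
complexity measure under which explanations giving Joe special tastes
cost more than explanations where his tastes match everyone else's.

def compute_utility(agent_id, world, shared, per_agent):
    """Utility of `world` to `agent_id` under one explanation.

    `shared` holds motivation common to all humans; `per_agent` holds
    each agent's deviations from it (ideally few and small).
    """
    apples = world.get("apples", {}).get(agent_id, 0)
    hunger = shared["apple_hunger"] + \
        per_agent.get(agent_id, {}).get("apple_hunger", 0.0)
    return hunger * apples

def explanation_complexity(shared, per_agent):
    """Crude complexity measure: per-agent deviations cost extra, so
    'Joe wants apples about as much as everyone else' wins unless the
    evidence says otherwise."""
    return len(shared) + sum(len(deltas) for deltas in per_agent.values())

shared = {"apple_hunger": 1.0}
per_agent = {}  # simplest explanation: nobody is special
world = {"apples": {"Joe": 2, "Sue": 2}}
print(compute_utility("Joe", world, shared, per_agent))  # 2.0, same as Sue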

Do you have any ideas about how to help people without interfering
with the ability to infer what they want from their behavior?

>First, you haven't showed that the AI will actually draw the obvious-to-us
>conclusions correctly.

There is a collection of unit tests at
http://www.fungible.com/respect/code/#tests. They are all gross
simplifications, of course. The most complex ones are baby catching
at http://www.fungible.com/respect/code/baby_catch_test.py.txt and
grocery shopping at
http://www.fungible.com/respect/code/grocery_test.py.txt. Do you have
an obvious-to-us conclusion that you'd like to see that's about as
complex as these two?

>Second, if we eventually discover a moral philosophy
>that is a big improvement over what is hard coded into the AI, we are
>screwed because we won't be able to reason with it and get it to change.

That's true in some scenarios but not others. The AI has a time
horizon, and any influence it exerts past that horizon happens only
because the humans, before the horizon, want to be able to reasonably
expect those consequences afterward.
So if the time horizon is one hour from now, and the AI is able to
figure out that right now we want a new AI, starting one hour from
now, that implements some shiny new moral philosophy, it could arrange
for that. We could lose if the present AI has bugs that prevent it
from seeing that we want the new moral philosophy to be in effect past
the time horizon.

So we wouldn't reason with the AI; we'd just make it clear what we
want, wait for the time horizon to come, and hope it functions well
enough to give us what we said we wanted.
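
To make that concrete, here is a toy sketch of how a time horizon can
enter the scoring of a plan. This is my own illustration under assumed
names, not the paper's algorithm: states after the horizon get no
direct weight, but a pre-horizon state in which people expect a good
post-horizon future scores well through their present utilities.

HORIZON = 3600.0  # hypothetical one-hour horizon, in seconds

def plan_score(predicted_states, inferred_utilities, now):
    """Score a plan by inferred human utility of the states it
    produces before the horizon.  `predicted_states` is a list of
    (time, state) pairs; `inferred_utilities` is a list of per-person
    utility functions inferred from behavior."""
    total = 0.0
    for t, state in predicted_states:
        if t >= now + HORIZON:
            break  # no direct credit for post-horizon states...
        # ...but a pre-horizon state in which people reasonably expect
        # the shiny new moral philosophy to take over after the
        # horizon scores well, because their utilities value that
        # expectation.
        total += sum(u(state) for u in inferred_utilities)
    return total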

This seems to be a problem with any AI that controls the world --
there is no guarantee that if you have some future inspiration about
how you want it to work, it will let you implement the change. This
can be a good thing, for example if your future inspiration is genocidal.

>But if arbitrariness is not a problem, then why not just pick the utility
>function of an arbitrary person instead of trying to average them?

I feel comfortable making an arbitrary choice among the best known
alternatives. Having one guy be the dictator is not one of the best
known alternatives. If I'm the dictator, then your dinner tonight will
be whatever I want your dinner tonight to be, and your tastes don't
matter. If we average, then because I don't care much what you have
for dinner, and you probably do care, you will probably get the dinner
you want. So averaging is better than taking just one person's utility
function, at least because it avoids some conflict about whose we'd
take.
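
A toy numerical illustration of that last point (the numbers are made
up): if I barely care what you have for dinner and you care a lot, the
average tracks your preference, while a dictator's utility function
need not.

tims_utility = {"pizza": 0.1, "curry": 0.0}   # Tim barely cares
weis_utility = {"pizza": 0.0, "curry": 10.0}  # Wei cares a lot

def averaged_utility(option):
    return (tims_utility[option] + weis_utility[option]) / 2.0

dinner = max(["pizza", "curry"], key=averaged_utility)
print(dinner)  # curry -- the person who cares more gets their way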

-- 
Tim Freeman               http://www.fungible.com           tim@fungible.com