Re: UCaRtMaAI paper

From: Wei Dai
Date: Fri Nov 23 2007 - 18:45:15 MST

Tim Freeman wrote:
> So far as I can tell, you are arguing that my scheme isn't perfect. I
> agree with that statement, but in the absence of a perfect or even a
> conjectured-to-be-better scheme, I'm not very interested.

My conjectured-to-be-better scheme is to not build an AGI until we're more
sure that we know what we are doing. My real point with the "answers I'd
like" is that we are really quite far from knowing what we are doing with
regard to AGI. It's not just that an optimizing process can't answer those
questions for us, but it also can't answer those questions for itself. Since
we don't know what the right answers are yet, we don't want to hard code
answers into an AGI in a way that won't allow them to be changed later.
(Perhaps this doesn't apply to your scheme, but see below.)

> Do you have any ideas about how to help people without interfering
> with the ability to infer what they want from their behavior?

I made the suggestion of giving each person a fixed quota of resources. Is
that something you've considered already?

> There is a collection of unit tests at
> They are all gross
> simplifications, of course. The most complex ones are baby catching
> at and
> grocery shopping at
> Do you have
> an obvious-to-us conclusion that you'd like to see that's about as
> complex as these two?

I think you need to write an explanation of what those unit tests are doing.
I'm not able to figure it out from the code. You might want to walk through
an example step by step. What are the agents' utility functions and beliefs,
what do the agents do, what does the AI infer, etc. I might be able to
suggest more scenarios for you to test once I have a better intuitive
understanding of how the AI works.

> That's true in some scenarios but not others. The AI has a time
> horizon and all influences it has past that time horizon are done
> because the humans have a desire, before the time horizon, to
> reasonably expect those consequences after the time horizon.
> So if the time horizon is one hour from now, and the AI is able to figure
> out that right now we want a new AI after one hour from now that
> implements some shiny new moral philosophy, it could arrange for that.
> We could lose if the present AI has bugs that prevent it from seeing
> that we want the new moral philosophy to be in effect past the time
> horizon.

Ok, I didn't realize that one implication of a limited planning horizon is
that the AI will allow itself to be replaced by another AI. That brings up
another set of problems, though: how do we make sure there are no
loopholes that would allow the AI to be replaced by an unfriendly one
that serves the self-interest of some person or group?

To take the simplest example, suppose I get a group of friends together and
we all tell the AI, "at the end of this planning period please replace
yourself with an AI that serves only us." The rest of humanity does not know
about this, so they don't do anything that would let the AI infer that they
would assign this outcome a low utility. I don't understand your design well
enough to claim that this exploit would definitely work, but neither do I
see an argument for why such loopholes do not exist.

> I feel comfortable making an arbitrary choice among the best known
> alternatives.

Among the infinite number of algorithms for averaging people's utility
functions, you've somehow picked one. How did you pick it? Given that the
vast majority of those algorithms are not among the best known alternatives,
what makes you think that the algorithm you picked *is* among the best known
alternatives?

For example, consider explicit calibration as an alternative. Design a
standard basket of goods and services, and calibrate each person's utility
function so that his utility of obtaining one standard basket is 1, and his
utility of obtaining two standard baskets is 2. To me, this seems a lot more
likely to be at least somewhat fair than an algorithm that relies on the
side effects of integer overflow.
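To make the calibration idea concrete, here is a minimal sketch of it as an affine rescaling of a raw utility function. This is purely illustrative and not part of Tim's design; the function names, and the simplification of treating "one standard basket" as a single numeric amount, are my own assumptions:

```python
def calibrate(u, basket):
    """Affinely rescale a raw utility function u so that the calibrated
    utility of one standard basket is 1 and of two baskets is 2.

    u: a person's raw utility function over resource amounts (hypothetical)
    basket: the resource amount in one standard basket (a simplification)
    """
    u1 = u(basket)
    u2 = u(2 * basket)
    if u2 == u1:
        raise ValueError("utility must distinguish one basket from two")
    # Solve v(x) = a*u(x) + b for v(basket) = 1 and v(2*basket) = 2.
    a = 1.0 / (u2 - u1)   # scale
    b = 1.0 - a * u1      # shift
    return lambda x: a * u(x) + b
```

By construction every person's calibrated utility agrees at the two calibration points, regardless of the scale or offset of their raw utility function, which is what makes interpersonal averaging meaningful without relying on arithmetic accidents.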

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:01 MDT