Re: Building a friendly AI from a "just do what I tell you" AI

From: Wei Dai
Date: Wed Nov 21 2007 - 16:45:37 MST

Tim Freeman wrote:
> I have some ideas about how to do this in the paper at
> Unfortunately the paper
> needs revision and hasn't yet made sense to someone who I didn't
> explain it to personally. Maybe I'll be able to make it readable over
> Thanksgiving.
> In that paper I specify a machine that will infer the goals of a given
> group of people and pursue a weighted average of those goals, given
> sufficient training data about the perceptions and voluntary actions
> of those people. Your voluntary actions are the contractions of your
> voluntary muscles, so the problem of providing the training data is
> conceptually simpler than the problem we started with.

I think this paper is quite valuable as an illustration of the non-trivial
nature of the problem. However, I see two important issues that are not
mentioned in the paper.

1. Suppose a human says to the AI, "please get an apple for me." In your
scheme, how does the AI know what he really wants the AI to do? (Buy or
pick, which store, etc.) What utility function the human is trying to
maximize by saying that sentence depends on the human's expectation of the
consequences of saying that sentence, which depends on what he thinks the AI
will do upon hearing that sentence, which in turn depends on the AI's
beliefs about the human's expectations of the consequences of saying that
sentence. How do you break this cycle?

2. If you take an EU-maximizing agent's utility function and add a constant
to it or multiply it by a positive constant, you wouldn't change the agent's
behavior at all, because whatever choices maximized EU for the old utility
function would also maximize EU for the new utility function. So from
someone's behavior, you can at best only obtain a family of equivalent
utility functions that are positive affine transformations of each other.
There is no obvious way to combine these families together into an average
social utility function, since the result depends on which member of each
family you pick. This is a well-known problem called "interpersonal
comparison of utilities".
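
To make point 2 concrete, here is a toy sketch in Python. Everything in it is
a made-up assumption for illustration (the two actions, their outcome
probabilities, the agents "alice" and "bob", their utility numbers, and the
equal-weight sum used as the "average"); none of it comes from Tim's paper.
It shows that rescaling one agent's utility function leaves that agent's own
choice unchanged, yet flips the choice that maximizes the combined utility.

```python
# Toy model: all names and numbers below are illustrative assumptions.
# Each action is a lottery mapping outcomes to probabilities.
ACTIONS = {
    "buy":  {"apple": 0.9, "nothing": 0.1},
    "pick": {"apple": 0.5, "pear": 0.5},
}


def expected_utility(utility, lottery):
    """Expected utility of a lottery under a given utility function."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def best_action(utility):
    """The action an EU-maximizer with this utility function would choose."""
    return max(ACTIONS, key=lambda a: expected_utility(utility, ACTIONS[a]))


def rescale(utility, a, b):
    """Positive affine transformation a*u + b; requires a > 0."""
    if a <= 0:
        raise ValueError("scale factor must be positive")
    return {outcome: a * u + b for outcome, u in utility.items()}


def social_choice(utilities):
    """Action maximizing the equal-weight sum of expected utilities."""
    return max(
        ACTIONS,
        key=lambda act: sum(expected_utility(u, ACTIONS[act]) for u in utilities),
    )


alice = {"apple": 10.0, "pear": 0.0, "nothing": 0.0}
bob = {"apple": 1.0, "pear": 4.0, "nothing": 0.0}
bob_rescaled = rescale(bob, 10.0, 0.0)  # same preferences, different scale

# Individual behavior is unchanged by a positive affine transformation.
print(best_action(alice), best_action(rescale(alice, 3.0, 7.0)))  # buy buy
print(best_action(bob), best_action(bob_rescaled))                # pick pick

# The combined choice depends on which representation of Bob we picked.
print(social_choice([alice, bob]), social_choice([alice, bob_rescaled]))  # buy pick
```

Since bob and bob_rescaled are indistinguishable from behavior alone, nothing
in the observed data tells the AI which of the two "averages" is the right
one.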
