Compassion vs Respect; Exponential discounting of utility (was Re: Building a friendly AI from a "just do what I tell you" AI)

From: Tim Freeman (tim@fungible.com)
Date: Sun Dec 09 2007 - 07:48:00 MST

In reply to: Joshua Fox: "Re: Building a friendly AI from a "just do what I tell you" AI"

2007/11/21, Tim Freeman <tim@fungible.com>:
> ... http://www.fungible.com/respect/index.html. Unfortunately the paper
> needs revision and hasn't yet made sense to someone who I didn't
> explain it to personally.

From: "Joshua Fox" <joshua@joshuafox.com>

(Tim skips some good points that he doesn't yet have an adequate response to.)

>- I [Joshua] don't get why compassion and respect have to be
>separated. Both mean that the AI needs a utility function which
>matches another agent's. In the case of compassion, positive utility,
>and in the case of respect, negative. Since the AI can give
>different weightings to different agents' utilities, it seems that we
>can cover "compassion" and "respect" with a single concept.

To review: compassion coefficients define how much the AI values
helping people. Roughly speaking, when someone is helped, to compute
the contribution to the AI's utility we multiply the beneficiary's
compassion coefficient by the AI's estimate of how much utility the
beneficiary got from the action.

Similarly, respect coefficients define how much the AI wants to avoid
hurting people. When someone is harmed, to compute the contribution
of that to the AI's utility (a negative number) we multiply the
victim's respect coefficient (a positive number) by the AI's
estimate of how much the victim was harmed (a negative number).
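A minimal sketch of the two contributions just described, assuming each person carries a hypothetical `compassion` and `respect` coefficient and that the AI's estimate of each person's change in utility is a single signed number (positive when the person is helped, negative when harmed). The class and function names are illustrative only, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical per-person coefficients; both should be non-negative.
    compassion: float
    respect: float


def contribution(person: Person, estimated_delta: float) -> float:
    """AI utility contributed by one person's estimated change in utility.

    A positive delta (the person is helped) is weighted by compassion.
    A negative delta (the person is harmed) is weighted by respect, so the
    contribution stays negative.
    """
    if estimated_delta >= 0:
        return person.compassion * estimated_delta
    return person.respect * estimated_delta


def action_utility(effects: list[tuple[Person, float]]) -> float:
    # Total AI utility of an action: sum over everyone the action affects.
    return sum(contribution(p, d) for p, d in effects)
```

For a person with compassion 1 and respect 10, helping them by 2 units contributes 2, while harming them by 2 units contributes -20.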

We have several cases:

Having negative coefficients for either is not Friendly.

Having respect less than compassion leads to difficult-to-understand
consequences and just does not seem useful.

Having respect much greater than compassion leads to an AI that tends
to do nothing. Most actions can be interpreted as causing minor harm
to someone else. For example, my metabolism right now consumes oxygen
that would otherwise be available to you and contributes slightly to
global warming, so if the AI were stuck with a similar metabolism and
it had respect without compassion, it would probably kill itself to
avoid this minor ecological damage.

Having respect equal to compassion (and therefore not having to
distinguish between them) is the alternative Joshua is talking about.
An AI with these settings would tend to do "Robin Hood" type behavior,
taking from one person to give to someone else who needs the resources
a little bit more. These involuntary transfers could be money,
internal organs, or anything else of value. Well-informed people who
value having higher status than their neighbors, and who are winning
that game at the moment, would want to get rid of the AI.
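To make the cases above concrete, here is a toy sketch of a single decision: should the AI move resources from one person to another? It assumes, hypothetically, that compassion is fixed at 1, that the only alternative is doing nothing (which scores 0), and that the gain and loss are the AI's utility estimates; the function name and numbers are illustrative.

```python
def ai_takes_action(gain: float, loss: float, ratio: float) -> bool:
    """Would the AI take resources from a donor to give to a recipient?

    gain:  estimated utility gained by the recipient (positive).
    loss:  estimated utility lost by the donor (positive magnitude).
    ratio: respect / compassion, with compassion fixed at 1.0 (assumption).
    Doing nothing scores 0, so the AI acts only if the action scores higher.
    """
    compassion = 1.0
    respect = ratio * compassion
    score = compassion * gain - respect * loss
    return score > 0


# Recipient gains slightly more than the donor loses.
for ratio in (1.0, 10.0):
    print(ratio, ai_takes_action(gain=1.2, loss=1.0, ratio=ratio))
```

With a ratio of 1 the AI makes the "Robin Hood" transfer; with a ratio of 10 it does not. With an extreme ratio such as 1e6, even a trade that helps someone by 5 units while costing someone else 0.01 units is refused, which is the "tends to do nothing" case.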

I don't want that conflict, so I want the respect-to-compassion ratio
to be large enough that the vast majority of people would not be worse
off if the AI were built, deployed, and worked as designed. I don't
like violent crime, so if it were up to me I'd set the ratio of
respect to compassion as high as possible while still leaving the AI
motivated to try to prevent most violent crime. In the paper I guessed
that this ratio might be around 10, but to get the right number you'd
have to feed the AI some videos of violent crime and compute the ratio
of the estimated dysutility for the victims to the estimated utility
for the perpetrators.
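One way to read that calibration step, as a sketch under stated assumptions: preventing a crime helps the victim (weighted by compassion) and harms the perpetrator (weighted by respect), so the AI intervenes only when respect/compassion is below victim dysutility divided by perpetrator utility. Given hypothetical per-crime estimates, the largest ratio that still prevents a chosen fraction of crimes is a quantile of those per-crime thresholds. The function name, the `fraction` parameter, and the sample numbers are all illustrative, and the sketch ignores any harm to the perpetrator beyond the lost gain.

```python
import math


def max_respect_ratio(crimes: list[tuple[float, float]], fraction: float) -> float:
    """Upper bound on respect/compassion that still prevents `fraction` of crimes.

    crimes: (victim_dysutility, perpetrator_utility) pairs, both positive,
            as the AI might estimate them from observed crimes (hypothetical).
    The AI intervenes in crime i only when
        respect / compassion < victim_dysutility_i / perpetrator_utility_i.
    Any ratio strictly below the returned value prevents at least
    ceil(fraction * len(crimes)) of the crimes.
    """
    if not crimes or not 0 < fraction <= 1:
        raise ValueError("need at least one crime and 0 < fraction <= 1")
    # Per-crime thresholds, largest first.
    thresholds = sorted((v / p for v, p in crimes), reverse=True)
    k = math.ceil(fraction * len(thresholds))
    # The k-th largest threshold: every ratio below it beats the top k thresholds.
    return thresholds[k - 1]


# Made-up estimates for four crimes: per-crime thresholds 20, 6, 20, 3.
crimes = [(100.0, 5.0), (60.0, 10.0), (40.0, 2.0), (30.0, 10.0)]
print(max_respect_ratio(crimes, 0.75))
```

With these made-up numbers, any ratio below 6 keeps the AI willing to stop three of the four crimes.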

>- Might issues of horizons, time periods, and transaction demarcation be
>handled by introducing time into the utility function -- e.g., with
>exponential damping/discounting?

Exponential discounting fixes the odd behaviors you list, but it adds
others. If the AI discounts its utility at 10% per year, the economy
measured in dollars is growing at 20% per year, and the dollar cost of
utility is constant, then each year of waiting grows the AI's
resources faster than discounting shrinks their value, so the AI will
defer all gratification until circumstances change. The people the AI
is nominally serving might not like that.
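The deferral argument can be checked numerically. This sketch assumes, as in the example, constant utility per dollar, resources growing at 20% per year, and "discounting at 10% per year" read as dividing by 1.1 each year; the function names are hypothetical. One unit of resources spent at year t is then worth (1.2/1.1)^t today, which keeps growing, so whatever finite horizon the AI is given, it prefers to spend at the very end.

```python
def discounted_value_of_waiting(years: int, discount: float = 0.10,
                                growth: float = 0.20) -> float:
    """Present value to the AI of spending one unit of resources after `years`.

    Assumes resources grow at `growth` per year, utility per dollar is
    constant, and utility is discounted by dividing by (1 + discount) per year.
    """
    return ((1 + growth) / (1 + discount)) ** years


def best_year_to_spend(horizon: int, discount: float = 0.10,
                       growth: float = 0.20) -> int:
    # The year in [0, horizon] with the highest discounted value
    # (the earliest such year if there is a tie).
    return max(range(horizon + 1),
               key=lambda t: discounted_value_of_waiting(t, discount, growth))


print(best_year_to_spend(50))                 # waits until the horizon
print(best_year_to_spend(50, discount=0.30))  # spends immediately
```

Only when the discount rate exceeds the growth rate does the AI prefer to act now.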

There's also a technical problem with exponential discounting: I
don't know how to bound the search if we don't have a finite time
horizon.
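For comparison, the textbook way to cut an infinite discounted sum down to a finite search is to stop once the remaining tail can no longer change the total by more than some tolerance. That only works if per-year utility is bounded by some `r_max`, which is exactly the assumption that fails in the growing-economy example above, so this sketch illustrates the difficulty rather than resolving it. The names `gamma`, `r_max`, and `epsilon` are hypothetical parameters.

```python
import math


def effective_horizon(gamma: float, r_max: float, epsilon: float) -> int:
    """Smallest H such that utility from year H onward changes the total by <= epsilon.

    Assumes per-year utility lies in [-r_max, r_max] and is multiplied by
    gamma each year (0 < gamma < 1). The discounted tail from year H onward
    is then at most gamma**H * r_max / (1 - gamma) in absolute value.
    """
    if not (0 < gamma < 1 and r_max > 0 and epsilon > 0):
        raise ValueError("need 0 < gamma < 1, r_max > 0, epsilon > 0")
    bound_at_zero = r_max / (1 - gamma)
    if bound_at_zero <= epsilon:
        return 0
    # Solve gamma**H * bound_at_zero <= epsilon for the smallest integer H.
    return math.ceil(math.log(epsilon / bound_at_zero) / math.log(gamma))


# 10% per year read as gamma = 1/1.1, utility bounded by 1 per year.
print(effective_horizon(1 / 1.1, 1.0, 0.01))
```

Under those assumptions the search could stop after 74 years; if utility can grow without bound, no such cutoff exists.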

There is probably some reasonable solution to this that has the AI
guessing how to do time-discounting at the same time it's guessing
utilities. The horizon-free scheme described on pages 4-5 of
http://www.vetta.org/documents/ui_benelearn.pdf might be part of that
solution.

-- 
Tim Freeman               http://www.fungible.com           tim@fungible.com
