Re: Why extrapolate? (was Re: [sl4] to-do list for strong, nice AI)

From: Frank Adamek (f.adamek@yahoo.com)
Date: Sun Oct 25 2009 - 19:02:24 MDT


--- On Sun, 10/25/09, Matt Mahoney <matmahoney@yahoo.com> wrote:

>I know these topics have been discussed, but as far as I know they have
>not been answered in any way that settles the question of "what is
>friendly?"

>And this raises the question "what is happiness?" If happiness can be
>modeled by utility, then the AI can compute your utility for any mental
>state. It does a search, finds the state of maximum utility, and if your
>brain has been replaced with a computer, puts you directly into this
>state. This state is fixed. How does it differ from death?

>Or if utility is not a good model of happiness, then what is?

I think about this often, and so far my best answer is preference utilitarianism. To a large extent I'm talking about not using the concept of "utils" at all (an approach that seems common on LW, though not under that name), except perhaps to compare how much different people's preferences are satisfied by an action, without (primarily) taking into account how happy that action would make them. (Or we can use "them" if we're trying to talk about uploading.)

It's almost a cop-out, but in other words, my favorite solution is to do what people want you to do. I say "almost a cop-out" because I'm talking about just asking them what they want, giving them the best available information about the outcome (keeping in mind the potential ethical costs of the computation itself), and then doing it.
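To make "compare how much different people's preferences are satisfied" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the 0-to-1 satisfaction scale, the equal weighting of people, and hypothetical names like choose_action. The point is just that the inputs are people's own informed answers about how well each candidate action satisfies what they asked for, not estimates of how happy it would make them.

# Minimal sketch: pick the action that best satisfies people's own
# stated, informed preferences. The 0-1 scale and equal weighting of
# people are assumptions for illustration, not part of any real proposal.

def choose_action(actions, people, satisfaction):
    """Return the candidate action with the highest total stated-preference
    satisfaction, where satisfaction(person, action) is the person's own
    answer (0 = not at all, 1 = completely) after being given the best
    available information about the outcome."""
    def total(action):
        return sum(satisfaction(person, action) for person in people)
    return max(actions, key=total)

# Toy usage: two people, two candidate actions.
stated = {
    ("alice", "terraform_mars"): 0.9,
    ("alice", "build_parks"): 0.4,
    ("bob", "terraform_mars"): 0.2,
    ("bob", "build_parks"): 0.8,
}
print(choose_action(
    actions=["terraform_mars", "build_parks"],
    people=["alice", "bob"],
    satisfaction=lambda person, action: stated[(person, action)],
))  # -> "build_parks" (total 1.2 vs 1.1)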

Even though I think this is the best principle to center things around, there are obviously things to be patched up, and here a utilitarianism based on positive experience/sensation does seem to be handy.

It gets tricky when one person's desires conflict with another's. Perhaps we could say that sufferings below some threshold of size or duration can be ignored or outweighed, such as not getting your favorite dessert or lover, or having to wait around an airport for a day. Beyond a point we might say that nothing justifies involuntary death or certain sufferings, like the equivalent of having your fingernails pulled out, no matter how happy it would make someone else. Maybe creating new moral entities in one way or another would be disallowed to prevent overcrowding, with some exceptions. Simply weighing one person's unhappiness against another's happiness would be vulnerable to an entity self-modifying so that some outcome becomes radically more pleasurable to it. Hopefully in a post-scarcity society, with the possible exception of computing resources, these conflicts would be less common than today.
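As a rough illustration of that two-tier idea (the numeric scale, the threshold, and the field names are all invented for the example): hard lines are checked first and can't be bought off by anyone else's happiness, and only the sufferings above the smallness/temporariness threshold get weighed at all.

# Rough sketch of conflict handling: hard constraints first, then
# weighing, with sufferings below a smallness/temporariness threshold
# ignored outright. Scale, threshold, and field names are illustrative.

MINOR_SUFFERING_THRESHOLD = 0.05   # e.g. missing dessert, a day stuck in an airport

def permissible(action):
    """Hard lines that no amount of someone else's happiness outweighs."""
    return not (action["causes_involuntary_death"]
                or action["causes_torture_level_suffering"])

def net_score(action):
    """Weigh preference satisfaction against the sufferings that are
    big or lasting enough to count."""
    counted = [s for s in action["sufferings"] if s > MINOR_SUFFERING_THRESHOLD]
    return sum(action["preference_satisfactions"]) - sum(counted)

def resolve_conflict(actions):
    allowed = [a for a in actions if permissible(a)]
    return max(allowed, key=net_score) if allowed else None

The self-modification loophole mentioned above would show up here as someone inflating their own satisfaction entries, which is why the weighing step alone isn't enough.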

It gets tricky when people are self-modifying and their preferences aren't constant. Even with the best information, existing preferences may lead to a being with extremely low wellbeing, and even though that's not the core of this ethical framework, it seems unfortunate. It could also happen because of the necessarily finite information available about outcomes, where people accidentally become something they wouldn't have chosen. Altering someone against their will into a happier being that values different things isn't attractive, but perhaps beings could enter into self-modification insurance policies with a stable and powerful group or entity (singleton?). Their policy might be that they are never allowed to make a choice that results in lower wellbeing, or one that substantially pursues specified actions they currently dislike; if this happened by mistake, they could then be forcibly changed back. Or the policy might be that their wellbeing never drops below a certain constant level, or that if it remains there too long they are reset to some earlier point. They might specify that memories gained are retained and only their "utility functions" altered, or also that after a few decades they are allowed to alter their policy with the information they've gained since then. While it might be very rare with all these safeguards (could we make some kind of self-modification insurance mandatory?), there could be cases where someone gets stuck being miserable, or commits suicide, when earlier they would never have done so. If there were such cases, I don't yet have a solution. Maybe that's an acceptable loss.
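The insurance policy itself could be pictured as a small amount of state held by the stable party. The sketch below is only one reading of it, and the floor, the grace period, and the checkpoint model are invented for illustration: modifications predicted to violate the agreed terms are refused, and if wellbeing stays below the floor for too long anyway, the person's values are restored from the checkpoint while the memories gained since are kept.

# Sketch of a self-modification insurance policy held by a stable,
# powerful third party. The floor, the grace period, and the idea of
# a stored checkpoint are all assumptions for illustration.

from dataclasses import dataclass, field

@dataclass
class Policy:
    wellbeing_floor: float = 0.3      # agreed minimum acceptable wellbeing
    max_ticks_below_floor: int = 10   # how long it may stay below the floor

@dataclass
class PersonState:
    wellbeing: float
    values: dict                      # stand-in for the person's "utility function"
    memories: list = field(default_factory=list)

class Insurer:
    def __init__(self, initial: PersonState, policy: Policy):
        # Checkpoint taken when the policy is signed.
        self.checkpoint = PersonState(initial.wellbeing,
                                      dict(initial.values),
                                      list(initial.memories))
        self.policy = policy
        self.ticks_below_floor = 0

    def approve_modification(self, predicted_wellbeing: float) -> bool:
        # One possible term: refuse any choice predicted to leave the
        # person below the agreed floor.
        return predicted_wellbeing >= self.policy.wellbeing_floor

    def monitor(self, person: PersonState) -> None:
        # If the prediction was wrong and wellbeing stays low too long,
        # revert the person's values to the checkpoint but keep the
        # memories gained since then.
        if person.wellbeing < self.policy.wellbeing_floor:
            self.ticks_below_floor += 1
        else:
            self.ticks_below_floor = 0
        if self.ticks_below_floor > self.policy.max_ticks_below_floor:
            person.values = dict(self.checkpoint.values)
            self.ticks_below_floor = 0

Reverting only the values and not the memories is meant to capture the "memories are retained, only the utility functions altered" term.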

It gets tricky when dealing with entities who can't communicate, like animals. Maybe you try to guess what they want, what makes them happy, what they care about. Maybe you weigh their preferences less than those of articulate beings, perhaps based on some notion of intelligence. Maybe you just leave their wellbeing up to the fact that speaking entities have preferences about the lives of animals, which in a way is what we do already.

It gets tricky when you consider children, who sometimes make choices based only on the most near-term results, even when informed about the long-term effects. Without an insurance policy they could become very unhappy, but at the same time we want them to be able to grow. Perhaps for those without insurance policies that would cover this, we could restrict choices that trade away too much long-term wellbeing for too little short-term wellbeing. Maybe we restrict choices like this only until people reach some standard of intelligence, like a human adult's. Maybe a sufficiently clever human adult's, and if it's possible to nudge intelligence up a little without affecting utility functions, maybe we modify people up to that level if they don't seem to be getting there naturally.
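One way to picture that restriction (the competence cutoff and the required ratio below are made-up numbers) is as a guardian check applied only to people below the adult-level standard, blocking choices whose predicted long-term cost is far out of proportion to the short-term gain:

# Sketch of the guardian rule for people below the adult-level standard:
# block a choice whose predicted long-term wellbeing cost is out of
# proportion to its short-term gain. Cutoff and ratio are illustrative.

ADULT_STANDARD = 100.0        # stand-in for "a sufficiently clever human adult"
REQUIRED_GAIN_RATIO = 0.5     # short-term gain must be at least half the long-term cost

def choice_allowed(short_term_gain, long_term_cost, competence):
    if competence >= ADULT_STANDARD:
        return True           # past the standard, this restriction no longer applies
    if long_term_cost <= 0:
        return True           # no predicted long-term harm to protect against
    return short_term_gain >= REQUIRED_GAIN_RATIO * long_term_cost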

---------------

If I understand CEV as well as I think I do, you'll notice this is pretty similar and might end up largely the same: it basically avoids things we dislike while letting us progress at a safe speed toward things we like. I mention all of this as an idea for how that AI might determine what people want: by asking them. I've tried to suggest reasonable solutions for some of the more obvious problems I can see with using this precept alone. The part about providing the best possible information about outcomes is trickier, but perhaps an AI can just try to simulate what we'd ask for in the future. Hopefully self-modification insurance can help protect against failure here as well.

I wouldn't tend to call this solution "perfect", but it seems to me the safest way I've yet seen to maintain value. We may not arrive at a society of bliss very quickly, individually or as a civilization, but hopefully we can keep from sliding backwards or losing important bits as we move forward, and not have too bad a time as we journey there.

Frank Adamek

P.S. I've only been reading LW and OB for a few months, and I couldn't find anything about this with a search there or in the sl4 archives, but please let me know if/where this has been discussed before.



