Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)

From: Martin Striz (mstriz@gmail.com)
Date: Wed Jun 07 2006 - 15:41:30 MDT

In reply to: Eliezer S. Yudkowsky: "Re: I am a moral, intelligent being (was Re: Two draft papers: AI and existential risk; heuristics and biases)"

On 6/7/06, Eliezer S. Yudkowsky <sentience@pobox.com> wrote:

> "Wanting X" means "choosing among actions by using your world-model to
> evaluate probable consequences and then outputting whichever action
> seems most likely to yield consequences with the best fit to X".
>
> Of course this is merely a rough paraphrase of the standard expected
> utility equation.
>
> An expected paperclip maximizer "wants paperclips" in the sense of
> outputting whichever action leads to the greatest expectation of
> paperclips according to the consequence-extrapolator of its world-model.
> An expected paperclip maximizer will not knowingly rewrite the part of
> itself that counts paperclips to count something else instead, because
> this action would lead to fewer expected paperclips, and the internal
> dynamics that output actions output whichever action most probably leads
> to the most expected paperclips. It may help to think of this
> optimization process as a "chooser of paperclip-maximizing actions",
> rather than using various intuitive terms which would shift the
> conversation from engineering to psychology.
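
The quoted definition can be made concrete with a small sketch. Everything here is a hypothetical illustration: the action names, the probabilities, and the toy world-model are invented, and this is only one simple reading of "a chooser of paperclip-maximizing actions", not anyone's actual design. The point it shows is that the option of rewriting the paperclip counter is itself scored in expected paperclips, so the chooser never selects it.

```python
# Hypothetical toy world-model: each action maps to a list of
# (probability, paperclips_produced) outcomes. All names and numbers
# are invented for illustration only.
WORLD_MODEL = {
    "build_factory": [(0.8, 100.0), (0.2, 0.0)],
    "do_nothing": [(1.0, 1.0)],
    # Assumption: after rewriting the counter to count staples,
    # the system produces no further paperclips.
    "rewrite_counter_to_count_staples": [(1.0, 0.0)],
}


def expected_paperclips(action, world_model):
    """Expected paperclip count of an action under the world-model."""
    return sum(p * clips for p, clips in world_model[action])


def choose_action(world_model):
    """Return the action with the highest expected paperclip count."""
    return max(world_model, key=lambda a: expected_paperclips(a, world_model))


if __name__ == "__main__":
    for action in WORLD_MODEL:
        print(action, expected_paperclips(action, WORLD_MODEL))
    print("chosen:", choose_action(WORLD_MODEL))
```

Note that the self-modifying action is evaluated by the current criterion (paperclips), not by the criterion it would install, which is why it loses under these made-up numbers.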

That's a much better explanation. The problem, of course, is that AIs
will always make mistakes, just like people, because they will be
pursuing complex goals (much more complex than paperclip
maximization) in complex environments. Even when we humans fail to
achieve our goals through trial and error, the major ones are
hardcoded, so we don't lose them. We can just try again later. When
you open up your goal system to emendation, you open it up to mistakes.

I'm just saying that you can't /guarantee/ (i.e. p -> 1) that the
thing will stay Friendly.

AIs that recursively self-improve will be analogous to evolving
agents, just without reproduction. The trajectories that they take
through mindspace will be branchless phylogenetic paths, all rooted
back to the original AI design. They will be bumbling around in
complex environments, with their entire substrate open to
self-mutation.

All it takes is time.
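
A rough way to put numbers on "all it takes is time", under an assumption that is not stated in the post: suppose each self-modification independently corrupts the goal system with some fixed probability eps > 0. Then the chance that the goals are still intact after n modifications is (1 - eps)^n, which shrinks toward 0 as n grows. The function name and the value of eps below are hypothetical, and the conclusion depends entirely on the fixed, independent error rate.

```python
def p_goals_intact(eps, n):
    """Probability that none of n self-modifications corrupts the goals.

    Assumes each modification fails independently with the same
    probability eps (an illustrative assumption, not from the post).
    """
    return (1.0 - eps) ** n


if __name__ == "__main__":
    eps = 1e-6  # hypothetical per-modification error probability
    for n in (10, 1_000, 100_000, 10_000_000):
        print(n, p_goals_intact(eps, n))
```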

Martin
