Friendly AI koans

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Jun 26 2002 - 11:11:11 MDT

Next message: Michael Roy Ames: "Re: Self-modifying FAI (was: How hard a Singularity?)"
Previous message: Eliezer S. Yudkowsky: "Re: Self-modifying FAI (was: How hard a Singularity?)"
Next in thread: James Higgins: "Re: Friendly AI koans"
Reply: James Higgins: "Re: Friendly AI koans"
Maybe reply: Justin Corwin: "Re: Friendly AI koans"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

1. You're designing a Friendship system. You think you know how to
transfer over the contents of your own moral philosophy over, but you can't
for the life of you think of any way to even begin to construct a moral
philosophy that could legitimately be said to belong to "humanity" and not
just you. Others have repeatedly demanded this of you and you think they
are completely justified in doing so. What do you do?

2. You didn't think of the idea of probabilistic supergoals when you were
designing the Friendship system. Instead your AI has a set of "real"
supergoals of priority 10, and one meta-supergoal of priority 1000 that says
to change the "real" supergoals to whatever the programmer says they should
be. At some point you want to tweak the meta-supergoal, but you find that
the AI has deleted the controls which would allow this, because the physical
event of any change whatever to the meta-supergoal is predicted to lead to
suboptimal fulfillment of the AI's current maximum-priority goal. If you
want a case like this to be recoverable by argument with the AI rather than
direct tampering with the goal system, what does the AI need to know - what
arguments does the AI need to perceive as valid - in order to be argued out
of its blind spot?

3. Someone offers a goal system in which sensory feedback at various levels
of control - from "pain" at the physical level to "shame" at the top
"conscience" level - acts as negative and positive feedback on a
hierarchical set of control schema, sculpting them into the form that
minimizes negative and maximizes positive feedback. Given that both systems
involve the stabilization of cognitive content by external feedback, what is
the critical difference between this architecture and the "external
reference semantics" in Friendly AI? How and why will the architecture fail?

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

Next message: Michael Roy Ames: "Re: Self-modifying FAI (was: How hard a Singularity?)"
Previous message: Eliezer S. Yudkowsky: "Re: Self-modifying FAI (was: How hard a Singularity?)"
Next in thread: James Higgins: "Re: Friendly AI koans"
Reply: James Higgins: "Re: Friendly AI koans"
Maybe reply: Justin Corwin: "Re: Friendly AI koans"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:39 MDT