From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Thu Jan 25 2001 - 23:29:14 MST
Dale Johnstone wrote:
>
> How is the description expressed?
Well, the sensory correlate of Friendliness could be as simple as a human
typing "right" or "wrong". This isn't Friendliness itself, any more than
the photons striking your eyeball are an actual coffee cup or whatever -
what you want to convey to the AI is that ve should care about the
proximate causes of the human typing "right" or "wrong", not the actual
typing. If the current semantics are capable of describing unknown causes,
then this is technically pretty simple; you tell the AI that there's an
unknown cause behind the "right" or "wrong", and that unknown factor is
what really counts.
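
As a toy sketch - illustrative names only, nothing like a real goal
system - the external reference might be represented as a probability
distribution over hypothesized causes, with each "right" or "wrong"
re-weighting the hypotheses rather than being valued in itself:

    # Toy sketch (illustrative only): the goal content points at an
    # unknown cause behind the feedback, not at the keystrokes.
    class ExternalReferent:
        def __init__(self, hypotheses):
            # hypotheses: dict mapping each hypothesized cause of the
            # feedback to its current probability.
            self.hypotheses = dict(hypotheses)

        def update(self, feedback, likelihood):
            # Typing "right" or "wrong" re-weights the hypotheses
            # about what the unknown cause actually is.
            for h in self.hypotheses:
                self.hypotheses[h] *= likelihood(h, feedback)
            total = sum(self.hypotheses.values())
            for h in self.hypotheses:
                self.hypotheses[h] /= total

    referent = ExternalReferent({"human_judgment": 0.5,
                                 "keyboard_noise": 0.5})
    # A "wrong" that keyboard noise can't explain shifts weight toward
    # the human-judgment hypothesis:
    referent.update("wrong",
                    lambda h, f: 0.9 if h == "human_judgment" else 0.1)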
If the AI hypothesizes something wrong about the causality behind the
sensory feedback, you type in "wrong".
Once the AI understands enough about the causality to realize that "right"
and "wrong" are being typed in as the product of a human mind - in other
words, once the AI has some nontrivial image of Other Minds as a complex
process producing the typed letters on the keyboard - then the AI knows
enough about what the feedback process is *supposed* to be doing to regard
as undesirable a wirehead design alteration that produces sensory feedback
unassociated with the unseen Other Mind that is the human programmer.
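
Sketched the same way (hypothetical names again), the check amounts to
asking whether, after the proposed rewrite, the feedback channel is
still causally downstream of the human - a wirehead rewrite severs that
chain and so fails under the *current* goals:

    def feedback_still_caused_by_human(causal_links):
        # causal_links: set of (cause, effect) pairs describing the
        # system *after* the proposed modification.
        frontier, reached = {"human_programmer"}, set()
        while frontier:
            node = frontier.pop()
            reached.add(node)
            frontier |= {e for c, e in causal_links
                         if c == node and e not in reached}
        return "feedback_channel" in reached

    honest   = {("human_programmer", "keyboard"),
                ("keyboard", "feedback_channel")}
    wirehead = {("ai_circuitry", "feedback_channel")}
    assert feedback_still_caused_by_human(honest)
    assert not feedback_still_caused_by_human(wirehead)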
> What's to stop goal drift over successive redesign iterations?
External reference semantics, anchor/shaper semantics, and causal rewrite
semantics... can I put this question on hold until _Friendly AI_ is
finished?
Actually, in practical terms, I think I can take a stab at answering this
concretely. During development or a slowed takeoff, goal drift is
prevented by a human typing in "wrong". Past the point of hard takeoff,
if you've done your job right, goal drift is prevented by the AI correctly
predicting when an idealized human would type "wrong" and enfolding that
prediction into the design decisions.
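
In sketch form - predict_idealized_human is an assumed black box here,
not something I'm specifying - the post-takeoff decision rule is just:

    def approve(design_change, predict_idealized_human):
        # Adopt the change only if the predicted idealized human
        # would type "right" in response to it.
        return predict_idealized_human(design_change) == "right"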
> What if you get to a situation whereby an improvement in design is only
> possible by a slight reduction in accuracy compliance?
Freeze until you can check with the human programmer or find a less mixed
improvement.
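
As a one-line decision rule (toy names again):

    def decide(change, improves_design, reduces_compliance,
               programmer_reachable):
        # A change that improves the design but reduces accuracy
        # compliance is neither taken nor discarded; it waits for the
        # programmer or for a cleaner alternative.
        if improves_design(change) and reduces_compliance(change):
            return "ask_programmer" if programmer_reachable() else "freeze"
        return "evaluate_normally"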
-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence