A conversation on Friendliness

From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Thu May 27 2004 - 02:39:59 MDT

I recently had a conversation on FAI theory, which I am posting with
permission for the sake of illustrating that FAI theory doesn't always have
to be painful, if both parties follow something like the same communication
protocol.
  [Eliezer] Incidentally, you say you've been thinking about Friendliness.
  If you have any thoughts that are readily communicable, I'd be interested.
  [Eliezer] I won't be insulted if you tell me that your thoughts are
beyond human comprehension. I've been there.
  [Eliezer] It amazes me how many people expect Friendliness to be a
trivial subject.
  <James> Nothing terribly communicable. I am wondering if a correct
implementation of initial state is even generally decidable.
  [Eliezer] well, not "amazes" in a Bayesian sense, in the sense of human
  [Eliezer] I don't know your criterion of correctness, or what you mean by
decidability. Thus the explanation fails, but it is a noisy failure.
  <James> I'm having a hard time seeing a way that one can make an
implementation that is provably safe.
  <James> At least generally.
  <James> There might be a shortcut for narrow cases.
  [Eliezer] In a general sense, you'd start with a well-specified abstract
invariant, and construct a process that deductively (not probabilistically)
obeys the invariant, including as a special case the property of
constructing further editions of itself that can be deductively proven to
obey the invariant
  <James> right
  <James> but how do you prove that the invariant constrains expression
correctly in all cases?
  [Eliezer] "constrains expression" <--- ?
  <James> sorry. stays friendly
  <James> at runtime
  [Eliezer] to the extent you have to interact with probabilistic external
reality, the effect of your actions in the real world is uncertain
  [Eliezer] the only invariant you can maintain by mathematical proof is a
specification of behaviors in portions of reality that you can control with
near determinism, such as your own transistors
  [Eliezer] there's a generalization to maintaining probable failure-safety
with extremely low probabilities of failure, for redundant unreliable
components with small individual failure rates
  <James> Right
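[The redundancy claim above can be illustrated numerically. This is an editor's toy sketch, not part of the original conversation: it assumes independent component failures and a simple majority-vote arrangement, and computes how redundancy drives the joint failure probability far below any single component's rate.]

```python
from math import comb

def system_failure_prob(n: int, p: float) -> float:
    """Probability that a majority-vote system of n independent
    components fails, i.e. that more than half fail at once.
    Assumes independent failures with identical per-component rate p."""
    k_min = n // 2 + 1  # smallest number of failures that breaks the majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# A single component failing 1% of the time is risky on its own,
# but 9-way redundancy drives the joint failure rate to roughly 1e-8.
print(system_failure_prob(1, 0.01))
print(system_failure_prob(9, 0.01))
```

[The independence assumption is doing real work here; correlated failures erode the benefit, which is part of why "probable failure-safety" is weaker than deductive proof.]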
  [Eliezer] the tough part of Friendly AI theory is describing a
mathematical invariant such that if it holds true, the AI is something we
recognize as Friendly
  <James> precisely.
  <James> that's the problem
  [Eliezer] for example, you can have a mathematical invariant that in a
young AI works to produce smiling humans by doing various humanly
comprehensible things that make humans happy
  [Eliezer] in an RSI AI, the same invariant binds to external reality in a
way that leads the external state corresponding to a tiny little smiley
face to be represented with the same high value in the system
  [Eliezer] the AI tiles the universe with little smiley faces
  [Eliezer] ...well, at least we agree on what the problem is
  <James> heh
  [Eliezer] any humanly comprehensible thoughts on the tough part of the
problem? I'll take humanly incomprehensible thoughts, even
  <James> I've been studying it, from a kind of theoretical implementation
standpoint. Very ugly problem
  <James> No thoughts yet.
  [Eliezer] the problem is that humans aren't mathematically well-specified
  [Eliezer] just ad-hoc things that examine themselves and try to come up
with ill-fitting simplifications
  [Eliezer] we can't transfer our goals into an AI if we don't know what
they are
  <James> Yep. Always have to be aware of that
  [Eliezer] my current thinking tries to cut away at the ill-formedness of
the problem in two ways
  [Eliezer] first, by reducing the problem to an invariant in the AI that
flows through the mathematically poorly specified humans
  [Eliezer] in other words, the invariant specifies a physical dependency
on the contents of the human black boxes that reflects what we would regard
as the goal content of those boxes
  [Eliezer] second, by saying that the optimization process doesn't try to
extrapolate the contents of those black boxes beyond the point where the
chaos in the extrapolation grows too great
  [Eliezer] just wait for the humans to grow up, and make of themselves
what they may
  <James> I've noticed. Seems like a reasonable approach
  <James> Don't know if it is optimal though
  <James> for whatever "optimal" means
  <James> I'm not satisfied that I have a proper grip on the problem yet
  [Eliezer] nor am I
  [Eliezer] there are even parts where I know specifically that my grip is weak
  <James> Too many things I define lazily
  [Eliezer] qualia, or my moral understanding of renormalizing initial
  [Eliezer] er, put "qualia" in sarcastic quote marks

Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:47 MDT