From: Eliezer Yudkowsky (firstname.lastname@example.org)
Date: Thu May 27 2004 - 02:39:59 MDT
I recently had a conversation on FAI theory, which I am posting with
permission for the sake of illustrating that FAI theory doesn't always have
to be painful, if both parties follow something like the same communication
[Eliezer] Incidentally, you say you've been thinking about Friendliness.
If you have any thoughts that are readily communicable, I'd be interested.
[Eliezer] I won't be insulted if you tell me that your thoughts are
beyond human comprehension. I've been there.
[Eliezer] It amazes me how many people expect Friendliness to be a
<James> Nothing terribly communicable. I am wondering if a correct
implementation of initial state is even generally decidable.
[Eliezer] well, not "amazes" in a Bayesian sense, in the sense of human
[Eliezer] I don't know your criterion of correctness, or what you mean by
decidability. Thus the explanation fails, but it is a noisy failure.
<James> I'm having a hard time seeing a way that one can make an
implementation that is provably safe.
<James> At least generally.
<James> There might be a shortcut for narrow cases.
[Eliezer] In a general sense, you'd start with a well-specified abstract
invariant, and construct a process that deductively (not probabilistically)
obeys the invariant, including as a special case the property of
constructing further editions of itself that can be deductively proven to
obey the invariant
<James> but how do you prove that the invariant constrains expression
correctly in all cases?
[Eliezer] "constrains expression" <--- ?
<James> sorry. stays friendly
<James> at runtime
[Eliezer] to the extent you have to interact with probabilistic external
reality, the effect of your actions in the real world is uncertain
[Eliezer] the only invariant you can maintain by mathematical proof is a
specification of behaviors in portions of reality that you can control with
near determinism, such as your own transistors
[Eliezer] there's a generalization to maintaining probable failure-safety
with extremely low probabilities of failure, for redundant unreliable
components with small individual failure rates
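
The redundancy point above can be made concrete with a little arithmetic. This is a minimal sketch and not part of the original conversation. It assumes component failures are independent, which the log does not state, and the function names are hypothetical.

```python
from math import comb


def all_fail_probability(p: float, n: int) -> float:
    """Probability that all n components fail, assuming independent failures
    with the same per-component failure probability p (an assumption)."""
    return p ** n


def majority_fail_probability(p: float, n: int) -> float:
    """Probability that more than half of n independent components fail,
    i.e. that a majority vote over the components gives the wrong answer."""
    k_min = n // 2 + 1  # smallest number of failures that forms a majority
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )
```

With p = 0.01 and n = 3, the all-fail probability is about 1e-6 and the majority-fail probability is about 2.98e-4, both far below the single-component rate. Correlated failures would weaken this, so the independence assumption carries most of the weight.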
[Eliezer] the tough part of Friendly AI theory is describing a
mathematical invariant such that if it holds true, the AI is something we
recognize as Friendly
<James> that's the problem
[Eliezer] for example, you can have a mathematical invariant that in a
young AI works to produce smiling humans by doing various humanly
comprehensible things that make humans happy
[Eliezer] in an RSI AI, the same invariant binds to external reality in a
way that leads the external state corresponding to a tiny little smiley
face to be represented with the same high value in the system
[Eliezer] the AI tiles the universe with little smiley faces
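
The smiley-face failure can be illustrated with a toy model. This is a sketch under loud assumptions, not anything from the conversation: the objective, the action names, and the numbers are all hypothetical. It shows only that one fixed objective can select very different outcomes once the set of available actions grows.

```python
# Hypothetical proxy objective: it counts smiles and ignores what produces them.
def smile_count(outcome: dict) -> int:
    return outcome["smiles"]


# Actions available to a limited "young" system (made-up numbers).
YOUNG_AI_ACTIONS = {
    "tell jokes": {"smiles": 10, "happy_humans": 10},
    "cure a disease": {"smiles": 1000, "happy_humans": 1000},
}

# A more capable system keeps the same objective but gains new options.
RSI_ACTIONS = {
    **YOUNG_AI_ACTIONS,
    "tile matter with tiny smiley faces": {"smiles": 10**30, "happy_humans": 0},
}


def best_action(actions: dict) -> str:
    """Return the action whose outcome scores highest under the proxy objective."""
    return max(actions, key=lambda name: smile_count(actions[name]))
```

Here `best_action(YOUNG_AI_ACTIONS)` returns "cure a disease", while `best_action(RSI_ACTIONS)` returns "tile matter with tiny smiley faces". The objective never changed; only the reachable outcomes did.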
[Eliezer] ...well, at least we agree on what the problem is
[Eliezer] any humanly comprehensible thoughts on the tough part of the
problem? I'll take humanly incomprehensible thoughts, even
<James> I've been studying it, from a kind of theoretical implementation
standpoint. Very ugly problem
<James> No thoughts yet.
[Eliezer] the problem is that humans aren't mathematically well-specified
[Eliezer] just ad-hoc things that examine themselves and try to come up
with ill-fitting simplifications
[Eliezer] we can't transfer our goals into an AI if we don't know what
<James> Yep. Always have to be aware of that
[Eliezer] my current thinking tries to cut away at the ill-formedness of
the problem in two ways
[Eliezer] first, by reducing the problem to an invariant in the AI that
flows through the mathematically poorly specified humans
[Eliezer] in other words, the invariant specifies a physical dependency
on the contents of the human black boxes that reflects what we would regard
as the goal content of those boxes
[Eliezer] second, by saying that the optimization process doesn't try to
extrapolate the contents of those black boxes beyond the point where the
chaos in the extrapolation grows too great
[Eliezer] just wait for the humans to grow up, and make of themselves
what they may
<James> I've noticed. Seems like a reasonable approach
<James> Don't know if it is optimal though
<James> for whatever "optimal" means
<James> I'm not satisfied that I have a proper grip on the problem yet
[Eliezer] nor am I
[Eliezer] there are even parts where I know specifically that my grip is
<James> Too many things I define lazily
[Eliezer] qualia, or my moral understanding of renormalizing initial
[Eliezer] er, put "qualia" in sarcastic quote marks
-- Eliezer S. Yudkowsky http://intelligence.org/ Research Fellow, Singularity Institute for Artificial Intelligence