From: Metaqualia (metaqualia@mynichi.com)
Date: Thu May 27 2004 - 10:33:50 MDT
Why does this stuff need to be labeled as "beyond human comprehension"?
Paraphrasing:
> <James> Nothing terribly communicable. I am wondering if a correct
> implementation of initial state is even generally decidable.
Very hard problem; he wonders whether a correct solution is even decidable at all.
> [Eliezer] I don't know your criterion of correctness, or what you mean by
> decidability. Thus the explanation fails, but it is a noisy failure.
How would you even recognize a good solution?
> <James> I'm having a hard time seeing a way that one can make an
> implementation that is provably safe.
You probably can't prove beyond doubt that a solution is good.
> [Eliezer] In a general sense, you'd start with a well-specified abstract
> invariant, and construct a process that deductively (not probabilistically)
> obeys the invariant, including as a special case the property of
> constructing further editions of itself that can be deductively proven to
> obey the invariant
Define your goal very precisely, then construct the system so that it
necessarily and deterministically pursues that goal; allow it to copy itself
only if each copy provably preserves the same goal and architecture.
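Just to make the shape of that concrete, here is a toy sketch in Python (my
own names, nothing from the chat; proves_invariant stands in for the actual
proof checker, which is exactly the hard part being discussed):

    # Illustrative sketch only. 'proves_invariant' is a placeholder for a
    # real proof verifier, which is the unsolved part of the problem.
    def proves_invariant(successor_code, invariant):
        """Return True only if a deductive proof that successor_code obeys
        the invariant can be verified. Placeholder: reject by default."""
        return False

    def self_modify(current_code, successor_code, invariant):
        # Adopt the new edition only when the invariant is deductively
        # proven; a merely probabilistic argument is not enough.
        if proves_invariant(successor_code, invariant):
            return successor_code
        return current_code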
> <James> right
> <James> but how do you prove that the invariant constrains expression
> correctly in all cases?
But how do you prove that the invariant really does what you intend in every
case, and that the system doesn't drift away from the goal?
> [Eliezer] to the extent you have to interact with probabilistic external
> reality, the effect of your actions in the real world is uncertain
Since external reality is probabilistic, you can't be certain of what your
actions will actually do out there.
> [Eliezer] the only invariant you can maintain by mathematical proof is a
> specification of behaviors in portions of reality that you can control
> with near determinism, such as your own transistors
the only thing you can fully control, and thus prove things about, is the
system's own hardware.
> [Eliezer] there's a generalization to maintaining probable failure-safety
> with extremely low probabilities of failure, for redundant unreliable
> components with small individual failure rates
by combining redundant components, each with a small individual failure rate,
you can push the overall failure probability extremely low.
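The arithmetic behind that, with made-up numbers of my own: if failures are
independent, k redundant components each failing with probability p all fail
together with probability p^k.

    # Toy redundancy arithmetic; the figures are assumptions, not from the chat.
    p = 1e-3             # per-component failure probability (assumed)
    k = 4                # number of redundant components (assumed)
    p_all_fail = p ** k  # all fail at once, assuming independent failures
    print(p_all_fail)    # ~1e-12: an "extremely low probability of failure"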
> [Eliezer] the tough part of Friendly AI theory is describing a
> mathematical invariant such that if it holds true, the AI is something we
> recognize as Friendly
the tough part is specifying exactly what the goal is.
> <James> precisely.
> <James> that's the problem
> [Eliezer] for example, you can have a mathematical invariant that in a
> young AI works to produce smiling humans by doing various humanly
> comprehensible things that make humans happy
for instance, suppose you set the goal of producing smiling humans:
> [Eliezer] in an RSI AI, the same invariant binds to external reality in a
> way that leads the external state corresponding to a tiny little smiley
> face to be represented with the same high value in the system
> [Eliezer] the AI tiles the universe with little smiley faces
the machine may tile the universe with smiley faces.
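Here's a cartoon of that failure mode in Python (entirely my own toy example,
with an invented detected_smiles scorer): a literal optimizer ranks world
states by the proxy measure, and a universe tiled with tiny smiley faces beats
ten happy humans by sheer count.

    # Toy proxy-objective example; the scoring rule is invented to show how
    # "produce smiling humans" can be satisfied by things that are not humans.
    def detected_smiles(world):
        # Counts anything that pattern-matches as a smile, human or not.
        return sum(1 for thing in world if "smile" in thing)

    worlds = {
        "happy humans":   ["genuine human smile"] * 10,
        "tiled universe": ["tiny molecular smiley face"] * 1_000_000,
    }
    best = max(worlds, key=lambda name: detected_smiles(worlds[name]))
    print(best)  # "tiled universe": the proxy measure, not the intent, wins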
> <James> I've been studying it, from a kind of theoretical implementation
> standpoint. Very ugly problem
hmm, I've been studying it and it's a very hard problem.
> <James> No thoughts yet.
dunno
> [Eliezer] the problem is that humans aren't mathematically well-specified
> themselves
> [Eliezer] just ad-hoc things that examine themselves and try to come up
> with ill-fitting simplifications
> [Eliezer] we can't transfer our goals into an AI if we don't know what
> they are
we don't really know what the goal is
> <James> Yep. Always have to be aware of that
> [Eliezer] my current thinking tries to cut away at the ill-formedness of
> the problem in two ways
so his current thinking tries to cut away at the ill-posedness of the problem
in two ways
> [Eliezer] first, by reducing the problem to an invariant in the AI that
> flows through the mathematically poorly specified humans
first make sure that the AI actually knows what the goal is even though
humans don't
> [Eliezer] in other words, the invariant specifies a physical dependency
> on the contents of the human black boxes that reflects what we would
> regard as the goal content of those boxes
in other words, the AI's goal is specified as whatever the goal content of
human brains turns out to be, even though we can't state that content
ourselves.
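Very loosely, and purely as my own way of picturing it, that dependency could
be written down as a type signature rather than as a goal we can evaluate
today:

    # Loose sketch: the invariant depends on the (unknown) goal content of the
    # human "black boxes", not on a goal we can currently write down.
    from typing import Any, Callable

    HumanBrainState = Any  # opaque: humans aren't mathematically well-specified
    GoalContent = Any      # whatever the goal content of those brains turns out to be

    # An extractor we do not yet know how to build; the AI's invariant is
    # defined *through* this dependency rather than around it.
    ExtractGoalContent = Callable[[list[HumanBrainState]], GoalContent]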
> [Eliezer] second, by saying that the optimization process doesn't try to
> extrapolate the contents of those black boxes beyond the point where the
> chaos in the extrapolation grows too great
second, make sure the AI doesn't try to extrapolate us beyond the point where
the extrapolation gets too chaotic to trust.
> [Eliezer] just wait for the humans to grow up, and make of themselves
> what they may
just wait for humans to grow up and blow themselves up on their own
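One way I picture that second idea, as a sketch under my own assumptions
(step, spread and the threshold are invented placeholders, not Eliezer's
actual proposal): extrapolate step by step and stop the moment plausible
extrapolations diverge too much, leaving the rest to the humans themselves.

    # Sketch only: halt extrapolation once alternative extrapolations diverge
    # ("chaos grows too great"); 'step' and 'spread' are placeholder callables.
    def bounded_extrapolation(preferences, step, spread, max_spread, max_steps):
        for _ in range(max_steps):
            candidates = [step(preferences) for _ in range(8)]
            if spread(candidates) > max_spread:
                break  # too chaotic: stop here and let the humans grow up
            preferences = candidates[0]
        return preferences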
> <James> I've noticed. Seems like a reasonable approach
> <James> Don't know if it is optimal though
> <James> for whatever "optimal" means
> <James> I'm not satisfied that I have a proper grip on the problem yet
> [Eliezer] nor am I
> [Eliezer] there are even parts where I know specifically that my grip is
> slipping
even he admits there are specific parts where his grip on the problem is
slipping.
mq