From: Johnicholas Hines (johnicholas.hines@gmail.com)
Date: Sun Feb 22 2009 - 12:49:42 MST
On Fri, Feb 20, 2009 at 9:39 AM, Petter Wingren-Rasmussen
<petterwr@gmail.com> wrote:
> As I mentioned in this thread I think an AGI with hardcoded dogmatic rules
> will have some serious drawbacks in the long run. I will try to show an
> alternative here, that still will remain friendly to humans.
My main interest (relative to this list) is safety protocols for
research and development of AGI. Accordingly, I'm interested in the
risks of your proposed AGI R&D effort.
If I understand correctly, you propose using evolution or something
similar, inside a sequence of synthetic environments. The environments
are designed to simultaneously select for friendly emotions and useful
capabilities. The sequence of environments can be considered one
environment with a sequential structure.
The teaching environment is described as having two properties. First,
it's intended to be easy for the adaptive process (whether that's
evolution or stochastic hillclimbing or whatever) to learn; this is
similar to Melanie Mitchell's Royal Road fitness functions. Second,
its fitness function is implicit, defined by coevolutionary
interactions.
There are a couple of risks that I think I can see:
1. Genetic algorithms (along with brute-force search and other generic
learning strategies) produce holographic output. By "holographic" I
mean every part relates to every other part; it is not modular the way
humans normally design and understand things.
The modular, means-ends designs that humans generate generally mirror
the proofs, or the informal arguments, that humans use to understand
_how_ the design accomplishes the goal. (This kind of design is
explained and advocated in Suh's "Axiomatic Design".)
There are areas (such as digital filters) where humans frequently use
holographic designs, but at least in that case there is a lot of
math around them, explaining which combinations of properties are
impossible to have simultaneously and how to transform a specification
into a design.
Holographic AGI means you can't examine the structure of the AGI and
predict how it will behave. This is risky.
2. Sorry, I've been obsessing about this simple two-dimensional model
of capability increase - please let me ramble for a bit before getting
to the point.
When we apply an optimization process like simulated annealing or a
genetic algorithm and measure its performance, we expect to see the
performance (fitness) curve upward: the first derivative is generally
positive (improvement), and the second is generally negative
(diminishing returns).
If you take something which is very finely tuned and start making
random changes, you expect to see a decrease in performance. And if
you take something which is completely random and start making random
changes, you expect its (bad) performance to stay about the same. So
for the tuned case we have a first derivative that is generally
negative (decay) and a second derivative that is generally positive
(flattening out toward the random baseline).
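(If it helps, here is a throwaway Python sketch of those two curves.
The count-the-"1"-bits measure is just my own stand-in for
"performance", not anything from Petter's proposal.)

    import random

    def onemax(bits):
        # Performance measure: the number of "1" bits in the string.
        return sum(bits)

    def drift(bits, steps=200):
        # Flip one randomly chosen bit per step, with no selection at
        # all, and record the performance trajectory.
        bits = list(bits)
        trajectory = [onemax(bits)]
        for _ in range(steps):
            i = random.randrange(len(bits))
            bits[i] ^= 1
            trajectory.append(onemax(bits))
        return trajectory

    n = 100
    tuned = [1] * n                                   # finely tuned for onemax
    noise = [random.randint(0, 1) for _ in range(n)]  # completely random
    print(drift(tuned)[::50])  # decays toward n/2: negative slope, then flattens
    print(drift(noise)[::50])  # hovers around n/2 from the start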
The question is: if you apply optimization pressure toward one
performance measure, and track the system's performance on a
different performance measure, what dynamics do you expect to see? It
depends on whether the two performance measures are correlated or not.
If you have two performance measures that are very nearly
uncorrelated, for example:
A. Optimizing the number of "1" bits in a bitstring.
B. Optimizing the number of "1" bits in a hash of the bitstring.
then the dynamics on the other measure should look just like the random dynamics.
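To make A and B concrete, here is a rough Python sketch: hillclimbing
on the count of "1" bits while also tracking the count of "1" bits in
a SHA-256 hash of the same string. (The choice of SHA-256, the string
length and the step counts are all mine, purely for illustration.)

    import hashlib
    import random

    def ones(bits):
        # The measure we are actually optimizing (A).
        return sum(bits)

    def hash_ones(bits):
        # The "other" measure (B): "1" bits in a hash of the string.
        digest = hashlib.sha256(bytes(bits)).digest()
        return sum(bin(byte).count("1") for byte in digest)

    random.seed(0)
    bits = [random.randint(0, 1) for _ in range(128)]
    for step in range(2001):
        candidate = list(bits)
        candidate[random.randrange(len(bits))] ^= 1
        if ones(candidate) >= ones(bits):  # select on measure A only
            bits = candidate
        if step % 500 == 0:
            print(step, ones(bits), hash_ones(bits))
    # ones(bits) climbs steadily; hash_ones(bits) just wanders around
    # 128, exactly as if we were making random changes.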
If the two performance measures are actually equivalent, then the
dynamics on the other measure should look just like the original
performance dynamics.
So here is my model:
The derivative with respect to time of the original performance
measure (p) is increased by the selective pressure (1 (hey, it's a
simplistic model!)), and decreased by a term proportional to how
optimized the system already is with respect to p (p * P_DECAY). In
total:
D[p] = 1 - p * P_DECAY
The derivative with respect to time of the other performance measure
(p_prime) is increased by a term proportional to the similarity of
the two performance measures (SIMILARITY), and decreased by a term
proportional to how optimized the system already is with respect to
p_prime (p_prime * P_PRIME_DECAY). In total:
D[p_prime] = SIMILARITY - p_prime * P_PRIME_DECAY
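To see what shapes those two equations produce, here is a quick
forward-Euler integration in Python. The particular values of
P_DECAY, P_PRIME_DECAY and SIMILARITY are arbitrary, chosen only to
make the curves visible.

    # Forward-Euler integration of the toy model above.
    P_DECAY, P_PRIME_DECAY, SIMILARITY = 0.1, 0.1, 0.3
    DT, STEPS = 0.1, 500

    p = p_prime = 0.0
    for step in range(STEPS):
        p += DT * (1 - p * P_DECAY)
        p_prime += DT * (SIMILARITY - p_prime * P_PRIME_DECAY)
        if step % 100 == 0:
            print(round(p, 2), round(p_prime, 2))
    # Both curves rise and flatten: p saturates near 1/P_DECAY = 10,
    # p_prime near SIMILARITY/P_PRIME_DECAY = 3. With SIMILARITY = 0,
    # p_prime never moves at all, which is the narrow-AI intuition below.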
Researchers in narrow AI are not particularly concerned about their AI
becoming wildly successful and taking over the world. I think this is
because they're implicitly using this sort of model: one measure is
whatever they're actually working on (chess or image processing or
whatever), the other measure is "capability of taking over the
world", and they believe the "SIMILARITY" between those two things
is low.
This model is relevant to the risks of evolutionary algorithms,
because we're evolving in a synthetic environment and then
generalizing to behavior in the real world. In order for that
generalization to be correct, the "SIMILARITY" factor needs to be high.
I don't think we understand how to argue that the "SIMILARITY" factor
is high, that is, that scoring well in the simulated environment will
lead to what we really want to accomplish. At the very least, the
research and development program should also study how we can validly
and reasonably predict behavior in the real world, based on success
in the simulated world.
If I understand correctly, this simulated/real gap is what Yudkowsky's
"External Reference Semantics" is intended to address.
Thanks for reading.
Johnicholas