From: Phil Goetz (firstname.lastname@example.org)
Date: Tue Aug 23 2005 - 22:05:26 MDT
--- "Eliezer S. Yudkowsky" <email@example.com> wrote:
> Phil Goetz wrote:
> > CST might say things such as
> > - a plot of the number of goals of the system vs. the importance of
> > those goals would show a power-law distribution
> > - there is some critical number of average possible action
> > transitions above which the behavior of the system leads to an
> > expansion rather than a contraction in state space
> > - there is a ratio of exploration of new hypotheses over exploitation
> > of confirmed hypotheses, and there are two values for this ratio that
> > locate phase shifts between "static", "dynamic", and
> > "unstable/devolving" modes of operation
> Phil, I think those are the first three interesting (falsifiable) things
> I've ever heard anyone say about CST and intelligence. Did you make them
> up on the spot, or would you seriously advocate/support any of them? Are there
I just made them up.
- Plotting number of goals per importance level: There are
numerous examples in the CST literature about systems that
have events of different sizes. Classic examples include
earthquakes, sandpile avalanches, percolation lattices,
and cellular automata (e.g., length of time that an initial
configuration in Conway's Game of Life takes to converge).
For certain systems - which appear to be the systems with
the most computational power in information-theoretic terms
- the number of events of size s is described by the equation
P(size = s) = k / (s^c).
These systems may have three modes of operation: mode 1
("solid"), in which P(size = s) has something like a Poisson
distribution; mode 2 ("liquid"), in which P(size=s) = k/(s^c),
and mode 3 ("gaseous"), in which all events have infinite size
(never stop, or have no gaps in continuity, like an infinite
percolation lattice that is fully-connected). In many cases,
specific numbers can be found that delineate the transition
between these modes. For infinite 2-dimensional percolation
lattices where each point has 8 neighbors, for instance,
the first infinite-size connected group occurs when the
lattice density (probability of a site being occupied) is
I did some analysis which suggests that there is a single
distribution underlying all three phases, which is dominated
by a power-law term within the "liquid" region.
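
As a rough illustration of the "solid" versus "gaseous" regimes in the percolation example, here is a minimal sketch (not the analysis mentioned above). It fills a finite square lattice at random, joins occupied sites through their 8 neighbors, and reports how much of the lattice the largest cluster covers. The lattice size (60) and the two densities (0.2 and 0.7) are illustrative assumptions chosen to sit on either side of the transition, and the function names are hypothetical.

```python
import random
from collections import deque


def cluster_sizes(grid):
    """Return the sizes of all 8-connected clusters of occupied sites."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    sizes = []
    for r in range(n):
        for c in range(n):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the cluster containing (r, c).
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if ((dy or dx) and 0 <= ny < n and 0 <= nx < n
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            sizes.append(size)
    return sizes


def largest_cluster_fraction(n, density, rng):
    """Fraction of all n*n sites that belong to the largest cluster."""
    grid = [[rng.random() < density for _ in range(n)] for _ in range(n)]
    return max(cluster_sizes(grid), default=0) / (n * n)


rng = random.Random(0)  # fixed seed so the run is repeatable
sparse = largest_cluster_fraction(60, 0.2, rng)  # assumed "solid" density
dense = largest_cluster_fraction(60, 0.7, rng)   # assumed "gaseous" density
print(f"largest cluster fraction: sparse={sparse:.3f}, dense={dense:.3f}")
```

At the low density the largest cluster is a tiny part of the lattice; at the high density a single cluster holds nearly every occupied site. Plotting a histogram of `cluster_sizes` at densities in between is one way to look for the power-law regime.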
I have no good reason to think that the importance of goals
would have such a distribution. I would expect that the number
of inferences made to plan for a goal, including dead-end inferences,
could have such a distribution, depending on how many possible
inferences can be made from each new fact. The average number of
possible inferences to make from a just-derived fact plays
the same role as the average number of neighbors of an occupied
point in a percolation lattice, or the probability of turning
a randomly-chosen cell on in the next iteration of a Life game.
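
One way to play with this idea is to treat inference as a branching process: each newly derived fact yields a random number of further inferences. The sketch below is a toy model under stated assumptions, not a claim about any real planner. The Poisson distribution for inferences per fact, the cap on cascade size, and all the names are hypothetical choices made for illustration.

```python
import math
import random

CAP = 2000  # assumed cutoff: cascades this large are treated as "never stopping"


def poisson(mean, rng):
    """Draw a Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cascade_size(mean_inferences, rng, cap=CAP):
    """Total number of facts derived from one starting fact, truncated at cap."""
    frontier = 1  # facts whose consequences have not been explored yet
    total = 1     # facts derived so far, including the starting fact
    while frontier > 0 and total < cap:
        frontier -= 1
        new_facts = poisson(mean_inferences, rng)
        frontier += new_facts
        total += new_facts
    return min(total, cap)


rng = random.Random(0)
sub_sizes = [cascade_size(0.5, rng) for _ in range(2000)]   # fewer than 1 per fact
super_sizes = [cascade_size(1.5, rng) for _ in range(300)]  # more than 1 per fact
subcritical_mean = sum(sub_sizes) / len(sub_sizes)
runaway_fraction = sum(s >= CAP for s in super_sizes) / len(super_sizes)
print(f"mean cascade size at 0.5 inferences/fact: {subcritical_mean:.2f}")
print(f"fraction of runaway cascades at 1.5 inferences/fact: {runaway_fraction:.2f}")
```

With fewer than one inference per fact on average, every cascade dies out quickly; with more than one, a large share of cascades run until the cap. In this toy model the power-law tail shows up when the average is set to exactly 1.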
- there is some critical number of average possible action
transitions: That wasn't stated well. I was thinking of
behavior networks, like Pattie Maes' Do the Right Thing
network, in which each behavior enables some other behaviors,
and of probabilistic finite-state automata. But the notion
of an organism's state space isn't well-defined enough for
real organisms for the statement to make sense. For simple
simulated organisms, the state space is finite, so again it
doesn't make sense.
A better use of the ideas going into it (stuff from
Stu Kauffman's 1993 book The Origins of Order on networks
constructed from random Boolean transition tables)
might be to say:
Suppose a reactive organism observes v variables
at each timestep, and is trying to learn which n of these
v variables it should pay attention to in order to choose
its next action. Let H be the average information content,
in bits, of a proposed set of n variables (the entropy of
the distribution of possible next actions based on them).
There is some value c such that, for H << c,
the organism always takes (uninteresting) short action
sequences; for H >> c, the set of outcomes to explore
will be too large for learning to take place. The number
of variables n to consider should be chosen so as to set H = c.
One might do this by using PCA on your original v variables,
and pulling off the highest-ranked principal components
as your operational variables until their entropy sums to c.
This brings us back to the utility of signal processing.
And information theory. :)
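
The PCA suggestion can be sketched roughly as follows. This is one reading of it, with several assumptions labeled here and in the comments: each principal component is treated as Gaussian, so its entropy is taken to be the differential entropy 0.5 * log2(2 * pi * e * variance) in bits (which depends on the units of the data and can be negative for small variances); the threshold c = 10 bits is arbitrary; and the PCA is done with plain power iteration to keep the example dependency-free. All function names are hypothetical.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, v = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(v)]
    cov = [[0.0] * v for _ in range(v)]
    for row in rows:
        d = [row[j] - means[j] for j in range(v)]
        for i in range(v):
            for j in range(i, v):
                cov[i][j] += d[i] * d[j]
    for i in range(v):
        for j in range(i, v):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]
    return cov


def top_eigenpair(m, rng, iters=300):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    v = len(m)
    vec = [rng.gauss(0.0, 1.0) for _ in range(v)]
    for _ in range(iters):
        nxt = [sum(m[i][j] * vec[j] for j in range(v)) for i in range(v)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    mv = [sum(m[i][j] * vec[j] for j in range(v)) for i in range(v)]
    return sum(vec[i] * mv[i] for i in range(v)), vec


def pca_variances(rows, rng):
    """Variances along the principal components, largest first (via deflation)."""
    m = covariance(rows)
    v = len(m)
    variances = []
    for _ in range(v):
        val, vec = top_eigenpair(m, rng)
        variances.append(val)
        # Remove the component just found so the next-largest one dominates.
        for i in range(v):
            for j in range(v):
                m[i][j] -= val * vec[i] * vec[j]
    return variances


def gaussian_entropy_bits(variance):
    """Differential entropy of a Gaussian, in bits (assumption: Gaussian data)."""
    return 0.5 * math.log2(2 * math.pi * math.e * max(variance, 1e-12))


def select_components(variances, c_bits):
    """Number of top components needed for their summed entropy to reach c_bits."""
    total = 0.0
    for count, var in enumerate(variances, start=1):
        total += gaussian_entropy_bits(var)
        if total >= c_bits:
            return count
    return len(variances)


rng = random.Random(0)
stds = [1.0, 8.0, 0.5, 4.0, 2.0]  # made-up spreads for v = 5 observed variables
rows = [[rng.gauss(0.0, s) for s in stds] for _ in range(2000)]
variances = pca_variances(rows, rng)
n_selected = select_components(variances, c_bits=10.0)  # c = 10 is an assumption
print("component variances:", [round(x, 2) for x in variances])
print("components kept:", n_selected)
```

On this synthetic data the top three components are enough to reach the assumed 10-bit budget. How to pick c for a real learner is exactly the open question in the text, and the sketch does not address it.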
- ratio of exploration of new hypotheses over exploitation
of confirmed hypotheses: The language comes from Holland's
genetic algorithm theory, which shows that the genetic
algorithm (without mutation) leads to an optimal
balance between exploration and exploitation (provided
the scores the evaluation function gives an organism are
normally distributed around their average value).
The idea comes from simulations of evolution, or from
any other optimization method, in which, if you keep
mutation (or, say, the temperature in simulated annealing)
too low, you get too-slow convergence on a good solution,
but if you crank it up too high, you get poor solutions.
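
The mutation-rate trade-off is easy to reproduce in a toy setting. The sketch below is a stand-in, not Holland's genetic algorithm: it uses a (1+1) evolutionary algorithm (one parent, one mutated child per step, keep the child if it is no worse) on the OneMax problem (fitness = number of 1 bits). The problem, the step budget, and the three mutation rates are all illustrative assumptions, and the function names are hypothetical.

```python
import random


def one_plus_one_ea(length, mutation_rate, steps, rng):
    """Run a (1+1) EA on OneMax and return the final number of 1 bits."""
    parent = [rng.random() < 0.5 for _ in range(length)]
    fitness = sum(parent)
    for _ in range(steps):
        # Flip each bit independently with probability mutation_rate.
        child = [(not bit) if rng.random() < mutation_rate else bit
                 for bit in parent]
        child_fitness = sum(child)
        if child_fitness >= fitness:  # keep the child only if it is no worse
            parent, fitness = child, child_fitness
    return fitness


def mean_final_fitness(mutation_rate, runs=5, length=100, steps=2000, seed=0):
    """Average final fitness over several runs with a fixed step budget."""
    rng = random.Random(seed)
    results = [one_plus_one_ea(length, mutation_rate, steps, rng)
               for _ in range(runs)]
    return sum(results) / runs


low = mean_final_fitness(0.0002)   # assumed "too low": most steps change nothing
moderate = mean_final_fitness(0.01)  # about one bit flipped per step
high = mean_final_fitness(0.3)     # assumed "too high": children are near-random
print(f"mean final fitness out of 100: low={low}, moderate={moderate}, high={high}")
```

Under the same budget, the low rate is still climbing slowly when time runs out, the high rate stalls well short of the optimum, and the moderate rate gets close to it.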
- Phil Goetz
This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:52 MDT