From: Ben Goertzel
Date: Thu Dec 12 2002 - 06:04:11 MST

Bill Hibbard wrote:
> I am well aware of the relation between your approach based
> on planning behavior from goals, and my approach based on
> values for reinforcement learning.
> A robust implementation of reinforcement learning must solve
> the temporal credit assignment problem, which requires a
> simulation model of the world. This simulation model is the
> basis of reasoning based on goals. Planning and goal-based
> reasoning are emergent behaviors of a robust implementation
> of reinforcement learning.
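
Bill's claim is that temporal credit assignment leans on a simulation model of the world, and that planning falls out of using that model. A well-known, concrete instance of this idea is Sutton's Dyna-Q, swapped in here purely as an illustration. It is not Bill's system or Novamente's. The sketch below is a minimal version under assumed conditions: a hypothetical 6-state deterministic chain world, reward only at the goal, and illustrative hyperparameters. The agent learns Q-values from real steps, records each observed transition in a learned model, and then "plans" by replaying simulated transitions from that model. Those replays push credit for the delayed reward back to earlier states without any explicit goal representation.

```python
import random
from collections import defaultdict

# Hypothetical toy world (an assumption for illustration): a chain of states
# 0..5, where state 5 is the terminal goal and reward 1 is given only on reaching it.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # step left or step right

def env_step(state, action):
    """Deterministic chain dynamics; walls clamp the agent to [0, GOAL]."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return reward, nxt, nxt == GOAL

def dyna_q(episodes=30, planning_steps=20, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: direct RL plus planning from a learned model."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)] -> value estimate
    model = {}              # learned world model: (s, a) -> (reward, next_state, done)

    def greedy(s):
        # Break ties randomly so an untrained agent still wanders.
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup toward r + gamma * max_b Q(s2, b).
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)
            update(s, a, r, s2, done)          # learn from real experience
            model[(s, a)] = (r, s2, done)      # update the simulation model
            for _ in range(planning_steps):    # planning: simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q

q = dyna_q()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1, 1] (always step toward the goal)
```

Setting `planning_steps=0` reduces this to plain Q-learning. The model-based replay is what lets a single delayed reward propagate quickly to the start state, which is the "planning as emergent behavior" point in miniature.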

Bill, my only reservation about this statement is that I believe solving
the temporal credit assignment problem requires many structures and
dynamics that lie outside the scope of what is normally considered
"reinforcement learning."

For instance, I suspect that solving the temporal credit assignment problem
involves [at least] the combination of things related to what we now call
"probabilistic higher-order inference" and things related to what we now
call "evolutionary programming".

It may even require the synthesis (inside the mind in question) of objects
serving as goals and subgoals, and the spawning of inference or
inference-like problems relating to these goals/subgoals...

So there is not *necessarily* a contradiction between reinforcement learning
(in your general sense) and "planning behavior from goals".

But there is a distinction between an architecture in which goals are
explicitly given from the outset (which is my understanding of Eliezer's
approach), and an architecture in which goals are expected to implicitly
emerge as a consequence of RL (which is how I understand your approach, and
Peter Voss's A2I2 approach as well).

In Novamente, we have some simple goals given explicitly from the outset,
but they're lower-level goals than Friendliness; Friendliness to humans
is expected to emerge through social-interaction-directed RL. (At least that
is the current plan for Novamente; the design and implementation are
flexible, and there is the opportunity for substantial revision before we
get to the stage where Friendliness really matters.)

-- Ben G
