Ben's _Thoughts on AI Morality_

From: Eliezer S. Yudkowsky
Date: Mon May 06 2002 - 22:41:58 MDT

I'm not doing a complete commentary just yet, but one thing that jumped out
at me as new to our respective discussions was:

"One aspect of the dual network structure will be important for the
discussion of value systems to be presented here: generally, in a dual
network, the higher-up elements in the hierarchy are more abstract."

This theme, as I understand it, is developed into the idea that an AI's
supergoals ("basic values") will be very abstract, while its subgoals
("derived values") will be very concrete.

Having read through the Novamente manuscript with all the little arrows
showing the dominance of simpler patterns over complex patterns, I think I
understand where Ben gets this intuition from, but I disagree; I think this
conflates event structure and category structure. It's very traditional to
view category structure as a hierarchy with more abstract concepts at the
"top", although whether this convention reflects real properties of the
system is debatable. However, goals tend to center around neither very
abstract categories nor very concrete categories, but rather an intermediate
level called basic-level categories. Basic-level categories have a wide
range of interesting properties; they tend to be the most abstract
categories for which you can still call up a specific mental image, the most
abstract categories such that you interact with all instances of the
category using the same motor routines, and so on. Usually, the view is
that this convergence of easy visualization and sensorimotor support is what
makes "basic-level" categories useful enough to be ubiquitous subgoals. I
would argue that we should consider a dual perspective in which (a) we tend
to choose basic-level subgoals because basic-level subgoals are easy to
think about and (b) we have evolved as organisms so that the most adaptively
useful regularities in event structure are basic-level categories relative
to our cognitive systems.

So I would say that Friendliness content is (a) often quite easy to think
about as a basic-level category on our own terms, and that (b) even if
Friendliness content were not made up mostly of basic-level categories from
a human perspective, the AI might soon develop (both naturally, and with
programmer assistance) into a mind that saw the key concepts as basic-level
categories. Even if this is not the case, I see no great difficulty in
stabilizing a Friendliness system where the central content is made up
mostly of highly abstract categories, so long as these abstract categories
are still subject to experiential feedback (even if it's as simple as the
programmer hitting a yes/no button) and so long as the abstract categories
are anchored by known relations to basic-level and concrete categories with
strong sensorimotor support.

-- -- -- -- --
Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
