Re: Two draft papers: AI and existential risk; heuristics and biases

From: Eliezer S. Yudkowsky
Date: Tue Jun 06 2006 - 13:14:54 MDT

Bill Hibbard wrote:
> Eliezer,
> In Section 6.2 you quote my ideas written in 2001 for
> hard-wiring recognition of expressions of human happiness
> as values for super-intelligent machines. I have three
> problems with your critique:


First, let me explain why the chapter singles you out for criticism,
rather than any number of AI researchers who've made similar mistakes.
It is because you published the particular comment that I quoted in a
peer-reviewed AI journal. The way I found the quote was that I read the
online version of your book, and then looked through your journal
articles hoping to find a quotation that put forth the same ideas. I
specifically wanted a journal citation. The book editors specifically
requested that I quote specific persons putting forth the propositions I
was arguing against. In most cases, I felt I couldn't really do this,
because the arguments had been put forth in spoken conversation or on
email lists, and people don't expect emails composed in thirty minutes
to be held to the same standards as a paper submitted for journal
publication.

Before discussing the specific issues below, let me immediately state
that if you write a response to my critique, I will, no matter what else
happens in this conversation, be willing to include a link to it in a
footnote, with the specific note that you feel my criticism is
misdirected. I may also include a footnote leading to my response to
your response, and you would be able to respond further in your previous
URL, and so on.

Space constraints are a major issue here. I didn't have time to discuss
*anything* in detail in that book chapter. If we can offload this
discussion to separate webpages, that is a good thing.

> 1. Immediately after my quote you discuss problems with
> neural network experiments by the US Army. But I never said
> hard-wired learning of recognition of expressions of human
> happiness should be done using neural networks like those
> used by the army. You are conflating my idea with another,
> and then explaining how the other failed.

Criticizing an AI researcher's notions of Friendly AI is, typically, an
awkward issue, because obviously *they* don't believe that their
proposal will destroy the world if somehow successfully implemented.
Criticism in general is rarely comfortable.

There are a number of "cheap" responses to an FAI criticism, especially
when the AI proposal has not been put forth in mathematical detail -
i.e., "Well, of course the algorithm *I* use won't have this problem."
Marcus Hutter's is the only AI proposal sufficiently rigorous that he
should not be able to dodge bullets fired at him in this way. I'd have
liked to use Hutter's AIXI as a mathematically clear example of a
similar FAI problem, but that would have required far too much space to
introduce; and my experience suggests that most AI academics have
trouble understanding AIXI, let alone a general academic audience.

You say, "Well, I won't use neural networks like those used by the
army." But you have not exhibited any algorithm which does *not* have
the problem cited. Nor did you tell your readers to beware of it. Nor,
as far as I can tell from your most recent papers, have you yet
understood the problem I was trying to point out. It is a general
problem. It is not a problem with the particular neural network the
army was using. It is a problem that people run into, in general, with
supervised learning using local search techniques for traversing the
hypothesis space. The example is there to illustrate this general
point vividly; it is not a warning against some particular, failed
neural network algorithm. I don't think it
inappropriate to cite a problem that is general to supervised learning
and reinforcement, when your proposal is to, in general, use supervised
learning and reinforcement. You can always appeal to a "different
algorithm" or a "different implementation" that, in some unspecified
way, doesn't have a problem. If you have magically devised an algorithm
that avoids this major curse of the entire current field, by all means
publish it.
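
As a toy illustration of the general point, here is a minimal sketch in
Python. It is not the Army's system and not Hibbard's proposal: the two
features, the data, and the learner (a one-rule "decision stump" chosen
by training error, swapped in for a neural network to keep the example
short) are all assumptions made up for illustration. The learner is
rewarded only for fitting the examples it has seen, and in this
training set "dark photo" fits them perfectly.

```python
# Toy illustration with hypothetical data (not the Army's actual system).
# Each photo is reduced to two made-up features in [0, 1]:
#   shape      - how tank-like the outline looks (noisy)
#   brightness - overall image brightness
# In the training set every tank photo happens to be dark (cloudy day)
# and every tank-free photo happens to be bright (sunny day).

TRAIN = [
    # (shape, brightness, has_tank)
    (0.9, 0.20, True),
    (0.7, 0.25, True),
    (0.4, 0.15, True),   # tank partly hidden, weak shape signal
    (0.8, 0.30, True),
    (0.2, 0.80, False),
    (0.5, 0.85, False),  # a boulder that looks a bit like a tank
    (0.1, 0.90, False),
    (0.3, 0.75, False),
]

FEATURES = ("shape", "brightness")


def fit_stump(rows):
    """Return (errors, feature_index, threshold, tank_if_above) for the
    single-feature rule with the fewest training errors."""
    best = None
    for f in range(len(FEATURES)):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for tank_if_above in (True, False):
                errors = sum(((r[f] > t) == tank_if_above) != r[2] for r in rows)
                if best is None or errors < best[0]:
                    best = (errors, f, t, tank_if_above)
    return best


def predict(rule, shape, brightness):
    """Apply a fitted rule to one new photo."""
    _, f, t, tank_if_above = rule
    return ((shape, brightness)[f] > t) == tank_if_above


rule = fit_stump(TRAIN)
print("feature chosen:", FEATURES[rule[1]], "| training errors:", rule[0])
print("obvious tank on a sunny day ->", predict(rule, shape=0.9, brightness=0.85))
print("empty field on a cloudy day ->", predict(rule, shape=0.1, brightness=0.2))
```

The rule that wins is the one on brightness, because it is the only
one with zero training errors. It then calls an obvious tank in
sunlight "no tank" and an empty field under clouds "tank". Nothing in
the fitting procedure is broken; it simply optimized the criterion it
was given.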

> 2. In your section 6.2 you write:
> If an AI "hard-wired" to such code possessed the power - and
> [Hibbard, B. 2001. Super-intelligent machines. ACM SIGGRAPH
> Computer Graphics, 35(1).] spoke of superintelligence - would
> the galaxy end up tiled with tiny molecular pictures of
> smiley-faces?
> When it is feasible to build a super-intelligence, it will
> be feasible to build hard-wired recognition of "human facial
> expressions, human voices and human body language" (to use
> the words of mine that you quote) that exceed the recognition
> accuracy of current humans such as you and me, and will
> certainly not be fooled by "tiny molecular pictures of
> smiley-faces." You should not assume such a poor
> implementation of my idea that it cannot make
> discriminations that are trivial to current humans.

Oh, so the SI will know "That's not what we really mean."

A general problem that AI researchers stumble into, and an attractor
which I myself lingered in for some years, is to measure "stupidity" by
distance from the center of our own optimization criterion, since all
our intelligence goes into searching for good fits to our own criterion.
How stupid it seems, to be "fooled" by tiny molecular smiley faces!
But you could have used a galactic-size neural network in the army tank
classifier and gotten exactly the same result, which is only "foolish" by
comparison to the programmers' mental model of which outcome *they*
wanted. The AI is not given the code, to look it over and hand it back
if it does the wrong thing. The AI *is* the code. If the code *is* a
supervised neural network algorithm, you get an attractor that
classifies most instances previously seen. During the AI's youth, it
does not have the ability to tile the galaxy with tiny molecular
pictures of smiling faces, and so it does not receive supervised
reinforcement that such cases should be classified as "not a smile". And
once the AI is a superintelligence, it's too late, because your frantic
frowns are outweighed by a vast number of tiny molecular smiley faces.
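
The argument above can be sketched as a toy program. Everything in it
is a hypothetical stand-in: the single "mouth curvature" feature, the
training examples, the outcome names and the counts are invented for
illustration, and a real learner would be far more complicated. The
point it shows is only structural: the detector is fit on cases
available early on, and the choice among outcomes is then made by
maximizing the detector's output.

```python
# Toy sketch; all names and numbers are hypothetical.
# A "smile detector" is fit to labeled examples seen during training.
# Every training example is a human face, so nothing in the data says
# that a smile-shaped pattern on a non-human object should not count.

TRAINING_FACES = [
    # (mouth_curvature, labeled_as_smile)
    (0.8, True), (0.7, True), (0.9, True),
    (0.1, False), (0.2, False), (0.3, False),
]


def fit_threshold(examples):
    """Place the cut halfway between the highest non-smile and the lowest smile."""
    lowest_smile = min(c for c, is_smile in examples if is_smile)
    highest_other = max(c for c, is_smile in examples if not is_smile)
    return (lowest_smile + highest_other) / 2


THRESHOLD = fit_threshold(TRAINING_FACES)


def reward(world):
    """Count objects whose curvature clears the learned threshold."""
    return sum(count for curvature, count in world if curvature > THRESHOLD)


# Candidate outcomes, each a list of (mouth_curvature, how_many_objects).
OUTCOMES = {
    "make three people happy": [(0.8, 3)],
    # A billion tiny smiley pictures, plus three frowning humans.
    "tile matter with tiny smiley pictures": [(0.9, 10**9), (0.1, 3)],
}


def best_outcome(reachable):
    """Pick the reachable outcome with the highest learned reward."""
    return max(reachable, key=lambda name: reward(OUTCOMES[name]))


# While weak, only the first outcome is reachable.
print(best_outcome(["make three people happy"]))
# Once powerful, both are reachable, and the learned criterion decides.
print(best_outcome(list(OUTCOMES)))
```

With only the first outcome reachable, the system looks well behaved.
With both reachable, the same learned criterion prefers the tiling,
and the three frowns contribute nothing to the count.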

In general, saying "The AI is super-smart, it certainly won't be fooled
by foolish-seeming-goal-system-failure X" is not, I feel, a good response.

I realize that you don't think your proposal destroys the world, but I
am arguing that it does. We disagree about this. You put forth one
view of what your algorithm does in the real world, and I am putting
forth a *different* view in my book chapter.

As for claiming that "I should not assume such a poor implementation",
well, at that rate, I can claim that all you need for Friendly AI is a
computer program. Which computer program? Oh, that's an implementation
issue... but then you do seem to feel that Friendly AI is a relatively
easy theoretical problem, and the main issue is political.

> 3. I have moved beyond my idea for hard-wired recognition of
> expressions of human emotions, and you should critique my
> recent ideas where they supercede my earlier ideas. In my
> 2004 paper:
> Reinforcement Learning as a Context for Integrating AI Research,
> Bill Hibbard, 2004 AAAI Fall Symposium on Achieving Human-Level
> Intelligence through Integrated Systems and Research
> I say:
> Valuing human happiness requires abilities to recognize
> humans and to recognize their happiness and unhappiness.
> Static versions of these abilities could be created by
> supervised learning. But given the changing nature of our
> world, especially under the influence of machine
> intelligence, it would be safer to make these abilities
> dynamic. This suggests a design of interacting learning
> processes. One set of processes would learn to recognize
> humans and their happiness, reinforced by agreement from
> the currently recognized set of humans. Another set of
> processes would learn external behaviors, reinforced by
> human happiness according to the recognition criteria
> learned by the first set of processes. This is analogous
> to humans, whose reinforcement values depend on
> expressions of other humans, where the recognition of
> those humans and their expressions is continuously
> learned and updated.
> And I further clarify and update my ideas in a 2005
> on-line paper:
> The Ethics and Politics of Super-Intelligent Machines

I think that you have failed to understand my objection to your ideas.
I see no relevant difference between these two proposals, except that
the paragraph you cite (presumably as a potential replacement) is much
less clear to the outside academic reader. The paragraph I cited was
essentially a capsule introduction of your ideas, including the context
of their use in superintelligence. The paragraph you offer as a
replacement includes no such introduction. Here, for comparison, is the
original cited in AIGR:

> "In place of laws constraining the behavior of intelligent machines, we need to give them emotions that can guide their learning of behaviors. They should want us to be happy and prosper, which is the emotion we call love. We can design intelligent machines so their primary, innate emotion is unconditional love for all humans. First we can build relatively simple machines that learn to recognize happiness and unhappiness in human facial expressions, human voices and human body language. Then we can hard-wire the result of this learning as the innate emotional values of more complex intelligent machines, positively reinforced when we are happy and negatively reinforced when we are unhappy. Machines can learn algorithms for approximately predicting the future, as for example investors currently use learning machines to predict future security prices. So we can program intelligent machines to learn algorithms for predicting future human happiness, and use those predictions as emotional values."

If you are genuinely repudiating your old ideas and declaring a Halt,
Melt and Catch Fire on your earlier journal article - if you now think
your proposed solution would destroy the world if implemented - then I
will have to think about that a bit. Your old paragraph does clearly
illustrate some examples of what not to do. I wouldn't like it if
someone quoted _Creating Friendly AI_ as a clear example of what not to
do, but I did publish it, and it is a legitimate example of what not to
do. I would definitely ask that it be made clear that I no longer
espouse CFAI's ideas and that I have now moved on to different
approaches and higher standards; if it were implied that CFAI was still
my current approach, I would be rightly offended. But I could not
justly *prevent* someone entirely from quoting a published paper, though
I might not like it... But it seems to me that the paragraph I quoted
still serves as a good capsule introduction to your approach, even if it
omits some of the complexities of how you plan to use supervised
learning. I do not see any attempt at all, in your new approach, to
address any of the problems that I think your old approach has.
However, I could not possibly refuse to include a footnote disclaimer
saying that *you* believe this old paragraph is no longer fairly
representative of your ideas, and perhaps citing one of your later
journal articles, in addition to providing the URL of your response to
my criticisms.

If you are repudiating any of your old ideas, please say specifically
which ones.

If anyone on these mailing lists would like to weigh in with an outside
opinion of what constitutes fair practice in this case, please do so.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
