From: Ben Goertzel (firstname.lastname@example.org)
Date: Thu Jun 27 2002 - 23:04:17 MDT
First of all, as I stressed in my previous e-mail, that brief essay entitled
"Thoughts on AI Morality" is exactly that -- some thoughts on AI morality.
It is not a systematic manifesto outlining the Novamente project's complete
approach to AI morality. Maybe I will write such a thing -- *after* I'm
done with this damnable Novamente book project. In other words, my essay is
much shorter and much less ambitious than Eliezer's CFAI document.
Furthermore the thoughts in it are probably much more amenable to change
than Eli's ideas in CFAI, because I have definitely not spent as much time
on these issues as Eli has.
I'm sorry I didn't write the Novamente AI Morality manifesto yet... but,
y'know, perhaps if I were being personally funded by a philanthropist, who
wanted me to write about this stuff, then I would have ;) . As it is, I'm
focusing on actually creating the AGI, both
-- because I think that's the most important thing for me to focus on, and
-- because I'm counting on the biological-data-analysis applications of
Novamente to feed myself, my wife and my 3 kids over the next few years.
Writing a manifesto on AI Morality is not going to feed the family, AND it's
also not going to do the world any more good than writing it a year or two
from now, when Novababy is ready to be seriously taught....
I'm certainly not retracting the essay -- the thoughts in the "Thoughts on
AI Morality" essay are all things I still agree with perfectly well (though
I might place different emphases today), but they do not reflect my complete
opinions and thoughts on the topic. The discussions on this list have
brought out other opinions I have on the topic that I did not put in that
essay.
Now you cite some quotes from me:
> "It does not seem at all strange to me to partially rely on the advice of
> an appropriate group of others, when making an important decision. It seems
> unwise to me *not* to." - Ben
> "c) whomever creates an AGI, intrinsically has enough wisdom that they should
> be trusted to personally decide the future of the human race" - Ben claiming
> this is what Eliezer believes (and implicitly what Ben does not believe?)
> do not seem to match up with the sentiment here:
> "But intuitively, I feel that an AGI with these values is going to be a
> positive force in the universe – where by “positive” I mean “in
> with Ben Goertzel’s value system”." - Ben's idea of how an AI figures out
> what's "right"
I don't see the mismatch here, Brian.
1) I do feel that an AGI with the values I described in my paper would
probably be a positive force in the universe according to my own value
system.
2) And, I feel that the decision of whether to halt or proceed with an AI
project, once it gets to an advanced phase, is too big for just me to make;
I would certainly want to consult others whose opinion I respected at that
point.
What's the inconsistency? That I did not say 2) in the essay you're
referring to? It was just a quick essay, not a systematic manifesto. You
cannot rightly assume that I disagree with all things that are NOT said in
it.
Certainly, these discussions with Eli, you, James Higgins and others are
stimulating and productive, and will have an effect on what I finally say
when I do write that "Novamente Friendly AI Manifesto", when I feel the time
is right.
> Another interesting quote from Ben's AI Morality paper to give us all a
> warm fuzzy feeling:
> "What happens when the system revises itself over and over again,
> its intelligence until we can no longer control or understand it? Will it
> retain the values it has begun with? Realistically, this is anybody’s guess!
> My own guess is that the Easy values are more likely to be retained through
> successive drastic self-modifications – but the Hard values do have at least
> a prayer of survival, if they’re embedded in a robust value dual network with
> appropriate basic values at the top. Only time will tell."
And that, Brian, is the straight truth as I see it.
Realistically, we don't know how any Friendliness approach is going to work
out in any given AGI system, until we have
1) built and experimented with the AGI system, or
2) created a really solid mathematical model of the AGI system and its
behavior in real-world conditions
Now, modern math seems not to be up to 2), so barring an immense
breakthrough in that regard, that leaves us with 1).
About how Eli's Friendliness architecture will work out in his (yet to be
defined in detail) AGI system, I'd also say: Only time will tell. His
somewhat plausible logical arguments are very far from a mathematical proof
that his Friendliness architecture will work in his AGI system. In
particular, I am not the only one to ask him "Why do you think the Friendly
goal system will really be stable under self-modifications?" and not get an
answer that seems satisfactory.... If he had
1) built and experimented with his AGI system, or
2) created a solid math. theory of his AGI system and its Friendliness
then he could answer this question in a way nearly any reasonable person
would accept. Lacking either 1) or 2), it's all speculative conjecture....
Like it or lump it!
> From my quick read his FAI ideas boil down to hardcoding part of his
> morality into his AI, and training it to know about the rest, and then
> turning it loose hoping it somehow sticks to it.
My initial approach will indeed be to hardcode part of my morality into my
AI, and train it to know about the rest, and see what happens -- while the
AI is at a profoundly infrahuman, Novababy stage.... A lot of different
configurations will be experimented with, maybe even a CFAI-style
hierarchical goal system.... This will be a learning phase. During this
phase, some simple protections will be built in to protect against the
extremely unlikely event of an unexpected hard takeoff -- but I feel the
correct design of these protections will only be determinable via
experimentation during this same phase.
About my plan being "turning it loose hoping it somehow sticks to it",
that's not quite right. I think I've clarified that in this e-mail. Of
course, if the AGI seems to be moving in a bad direction, it should be
halted; and though I didn't say so in the essay, of course I intend to
involve others than myself in this decision.
You may feel it is irresponsible to proceed with AGI development without
having systematically written down my approach to AI morality. To restate
my view yet again, however, I feel that AI morality should be approached
largely as an empirical science, and that attempts to make grandiose and
definitive statements or theories about it prior to actually having an AGI
to play with, are going to be largely a waste of time.
I feel the same way about theories of computer consciousness, for example.
There are loads and loads of papers by academics on computer consciousness.
The *real* study of computer consciousness, however, is going to start when
we have AGIs that display consciousness-like behaviors to play with! At
that point, all the writings on computer consciousness from the previous
period, are going to seem kind of silly...
So far, based on these discussions, when I rewrite the "AI Morality" essay I
will add in the following points, with elaborated explanations:
-- AI morality and AI consciousness are only going to become scientific when
they are pursued as *experimental science*, and we're just not there yet...
until then it's just entertaining, thought-stimulating conjecture...
-- It's important to put in protections against unexpected hard takeoff, but
the effective design of these protections is hard, and the right way to do
it will only be determined thru experimentation with actual AGI systems
(again, experimental science)
-- Yes, it is a tough decision to decide when an AGI should be allowed to
increase its intelligence unprotectedly. A group of Singularity wizards
should be consulted, it shouldn't be left up to one guy.
MAYBE I will also replace the references to my own personal morality with
references to some kind of generic "transhumanist morality." However, that
would take a little research into what articulations of transhumanist
morality already exist. I know the Extropian stuff, but for my taste, that
generally emphasizes the virtue of compassion far too little....
What I will not do in any revision of the essay -- except one written after
significant experimentation with a Novababy has been done -- is introduce
any definitive statement that one or another particular approach to ensuring
Friendliness, or measuring intelligence increase, is likely to be effective.
I feel there is just too much uncertainty in these regards, at this stage.
Are there any other significant issues that you think should be addressed in
the revision, Brian? Knowing what you know now about my overall point of
view on these matters?
-- Ben G