Re: QUES: CFAI [#2]

From: Eliezer S. Yudkowsky
Date: Thu Sep 19 2002 - 13:58:03 MDT

Anand wrote:
> 01. What cognitive processes may allow for altruism?

Can I zoom in on this question a bit? Do you mean, why does altruism
evolve? Do you mean, are there specific known neuroanatomical regions
whose activation is associated with altruism? Do you mean, what kind of
cognitive processes may be involved with altruism aside from regular,
evolved emotional tones?

> 02. What does it mean for a cognitive process to allow for altruism?

You used the term "allow for altruism", right? Or did I use it somewhere
and forget it? Anyway, I'd say that it means that this cognitive process
plays a role in an altruistic mind or that this cognitive process does not
actually *rule out* altruism. But I really don't know what you mean by
this. Do you mean *supports* altruism or *plays a role in human altruism*?

> The following are prompted by recent difficulty in explaining aspects
> of Friendliness. (Apologies if some of these are wrong questions.)
> 03. "...for any sort of diverse audience, humans generally use the
> semantics of objectivity, by which I mean that a statement is argued to
> be 'true' or 'false' without reference to data that the
> audience/persuadee would cognitively process as 'individual.'"
> What are the implications of the semantics of objectivity?

Um... that, ceteris paribus and lacking complex philosophical reasons to
do otherwise, humans tend to argue with each other as if moral
propositions were facts? Because that's the first way evolution stumbled
on to represent moral propositions in declarative discourse, which
development would tend to become evolutionarily fixed by the way that it
makes moral propositions easy to communicate between humans?

> 04. "Thus, when humans talk about 'morality,' we generally refer to
> the body of cognitive material that uses the semantics of objectivity."
> What composes said body of cognitive material?

Example: "We hold these truths to be self-evident..."

> 05. What are examples of non-good/non-bad Gaussian abilities that
> ground in panhuman characteristics?

Uh... what? Can you rephrase that without using my own jargon? In
particular, I use the jargon "gaussian ability" to refer to abilities
which, among humans, obey some kind of bellish distribution, of which the
gaussian distribution is the most common kind. A panhuman characteristic
is one that is species-typical. I'm not sure why these two terms are
appearing in the same sentence above.

> 06. What are examples of good or bad Gaussian abilities that ground in
> panhuman characteristics and that the shaper network will recognize?

Same problem - why are those two terms appearing in the same sentence?

> 07. "The renormalizing shaper network should ultimately ground itself
> in the panhuman and gaussian layers, without use of material from the
> personality layer of the original programmer."
> What does "material" refer to? Please give examples of material in the
> panhuman and gaussian layers that the shaper network will use.

An example of panhuman material would be the complex functional adaptation
of "sympathy", constituting both the cognitive ability to put yourself in
someone else's shoes, and the emotional affects of doing so, particularly
with respect to judging fairness in both moral and metamoral arguments.

An example of a gaussian characteristic that would be modeled as a shaper,
and incrementally nudged toward an extreme, would be the internal
intensity of some of the emotional tones contributing to altruism. Note
that the tones themselves are panhuman and their intensity is gaussian.

An example of something on the personality layer which should never be
treated as grounding is Eliezer's fondness for the book Godel, Escher,
Bach. Cognitive modules that play a role in my liking GEB might be
transferred over. The interim fact that I like GEB might be used as
evidence to get at those cognitive modules. The fact that I actually like
GEB shouldn't ever play a role in bottom-level Friendliness.

> 08. "...requires all the structural Friendliness so far described, an
> explicit surface-level decision of the starting set to converge,
> prejudice against circular logic as a surface decision, protection
> against extraneous causes by causal validity semantics and surface
> decision,..."
> What does "surface decision" mean?

It means the final decision of the entire current Friendliness system with
respect to some particular choice point. In the above, it means that the
initial Friendliness system has to be such that, if the AI is presented
with a piece of extraneous circular logic, it is already capable of saying
"Well, that's circular logic, and right now, at least, I think that
circular logic is a bad thing, so I'm not going to accept it, at least at
the moment, although I might change my mind about circular logic later."

> 09. What is the best way to determine whether an action is Friendly,
> and why? What is the _last_ way, and why? (Prehuman AI, Infrahuman
> AI)

You have your current understanding of Friendliness - your current set of
definitions for figuring out how Friendly something is likely to be - and
you have your programmers, whom you can consult if you can cross the
communications gap. What else is there?

> 10. What are your present predictions on why Friendly AI will fail?


Or do you mean, what are the most likely reasons FAI might fail?

Those are:

1) Because seed AI is way, way easier than anyone expects. The hard
takeoff trajectory is such that it's possible to do a hard takeoff using
little more than Lenatian complexity, meaning that the AI must spend a
very extended period in controlled ascent and cooperative ascent without
having any substantial base of understanding in Friendliness. Then the
controlled ascent mechanism fails at sufficiently many points that a
stratospheric ascent begins and is not detected.

2) Because the FAI is just more alien than the programmers can figure out
how to deal with, and the programmers don't realize their own
incompetence. This interacts strongly with (1) above, since otherwise,
how the heck did you grow something that alien to the point where it could
undergo takeoff? Incidentally, there are really alien things you can do
to *support* Friendliness.

3) The AI builders screw up their basic understanding of Friendliness,
and this only becomes apparent after the AI is past the point of no return.

4) The first AI is built by a project that doesn't care enough about
Friendliness. Also interacts with (1).

> 11. What have you studied or what are you studying for Friendliness
> content?

Study? People don't *write* about this stuff. At least not that I know of.

> 12. What recent progress have you made and what progress do you need
> to make on Friendliness content?

Most recently: Figuring out microtasks that could be used to teach
an AI an incremental understanding of Friendliness, and more importantly,
meta-Friendliness. Probabilistic grounding semantics (where a system
tries to figure out what it's an imperfect approximation to). Using a
flawed but redundant definition of "correction" to correct a flawed but
redundant definition of "correction".

> 13. What are the next steps for Friendly AI theory?

A more detailed model of Friendliness content as well as Friendliness
structure. Adapting the theory to an early infrahuman mind. You might
say that CFAI argues that human-equivalent minds and superintelligences
*can* be Friendly. From there you have to go on to figure out how an
infrahuman mind actually does understand Friendliness.

Eliezer S. Yudkowsky
Research Fellow, Singularity Institute for Artificial Intelligence
