Re: My attempt at a general technical definition of 'Friendliness'

From: Harvey Newstrom (
Date: Fri Jan 21 2005 - 06:28:56 MST

Sorry to wade in here with such bluntness. But I actually work in an
engineering field that gives vague English words rigorous technical
definitions to describe emergent properties of systems not associated
with any individual component design. I really use these definitions
to develop and implement technical security architectures. At some
point, the words have to stop being vague English words and must be
given rigorous technical definitions. These definitions must be sound
if the concepts are to be used to actually engineer some design. I am
afraid that this "definition" totally fails to work as a technical
definition.

In security, the field is full of nebulous English words like
"friendliness". They describe attributes or aspects of a system that
are not isolated to a single function or object within the
architecture. They are emergent properties that we want the system to
exhibit. "Friendliness" is no more difficult or technical or nebulous
than what I work with every day. Real engineering methods and
techniques will work here as anywhere. I see no reason to resort to
imprecise definitions, claims of incomputability, or other excuses to
explain why no clear definition can be achieved.

Examples of similar types of hard-to-define terms in my field would
include:
                CONFIDENTIALITY = Access + Privacy, where:
                        Access = access controllability + identification +
                                authentication + authorization + custody chain
                        Privacy = consent + anonymity
                INTEGRITY = Accountability + Assurance + Capability + Safety, where:
                        Accountability = authenticity + traceability +
                                non-repudiation +
                        Assurance = legality + licensing + compliance + auditability
                        Capability = relevance + suitability + correctness +
                                accuracy + effectiveness + efficiency + cost
                        Safety = equipment safety + environmental safety +
                                interface safety
                AVAILABILITY = Usability + Reliability + Maintainability +
                        Compatibility, where:
                        Usability = interconnectivity + accessibility +
                                performance + productivity + consistency +
                                predictability + understandability +
                                learnability + operability + localization +
                                persistence
                        Reliability = continuity + maturity + fault tolerance +
                                conciseness + openness + survivability +
                                recoverability
                        Maintainability = scalability + changeability +
                                extensibility + analyzability + stability +
                                testability + modularity + configurability +
                                customizability
                        Compatibility = interoperability + integration +
                                portability + reusability + replaceability +
                                adaptability + installability +
As you can see, all of these are plain English words that describe
happy/fluffy ideas of "good things". But they have all been given
precise and rigorous definitions in the engineering fields. They can
be calculated with mathematical formulas and statistics. They can be
audited. They can be bought and sold as measurable commodities. My
career is based on meeting these terms in ways that can be
mathematically, legally, and realistically proven. "Friendliness"
must be defined to a similarly rigorous and precise level. Otherwise,
it can never be approached with real technology or engineering.
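To make this concrete, here is a toy sketch of how such a composite
property can be scored once its sub-metrics are measurable. The metric
names, weights, and values below are purely illustrative assumptions of
mine, not any real standard:

```python
# Hypothetical sketch: scoring an emergent property like AVAILABILITY
# as a weighted aggregate of measurable sub-metrics, in the spirit of
# the decompositions above. Names, weights, and values are illustrative
# assumptions, not a real standard.

def score(metrics: dict, weights: dict) -> float:
    """Weighted average of sub-metric scores, each assumed in [0.0, 1.0]."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * weights[name] for name in weights) / total_weight

# AVAILABILITY = Usability + Reliability + Maintainability + Compatibility
availability = score(
    metrics={"usability": 0.9, "reliability": 0.99,
             "maintainability": 0.7, "compatibility": 0.8},
    weights={"usability": 1.0, "reliability": 2.0,
             "maintainability": 1.0, "compatibility": 1.0},
)
print(round(availability, 3))  # prints 0.876
```

The point is not the particular formula; it is that once each
sub-metric is measurable, the aggregate property can be calculated,
audited, and contracted against.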

> * Proposition: A mind is a utility function. The
> universe itself could be interpreted as a kind of mind
> in the limit that it formed a super-intelligence at an
> Omega Point. Therefore any concept within reality
> could be interpreted as a 'utility function' within
> the universal mind.

These are very controversial and non-obvious propositions on which to
base your definitions. Is friendliness really dependent on the
universe being a "mind"? If I dispute that the universe is a "mind",
does that mean friendliness doesn't exist? Your evidence that all
concepts in reality can be coded as utility functions within an AI is
based on the claim that the universe is a mind which codes all reality
at the Omega Point? This is more of a religious faith-based assumption
than a basis for an engineering design of an AI.

> The 'Omega Point' condition requires that the rate of
> information processing (computation) within the
> universe approaches infinity as time approaches the
> end. The idea is that a concept is no different to
> the computational function existing in the mind of a
> super-intelligence which generates a list of all
> things possessing the attribute, in the limit that the
> Omega Point condition is approached.

Besides requiring the universe to be a mind, your definition seems to
require a Tipler-type Omega Point to occur. Since this is unknown and
unproven at this point, it sounds like your definition must be unknown
or unproven for now as well. Since the Omega Point won't occur until
the end of the universe, it is unclear that your explanation applies to
anything today. Can't you base your examples on physics existing now?

> Example: The concept 'Beauty' is defined as being
> equivalent to the mathematical function which
> generates a list of all beautiful things.

This is a circular definition. You defined beauty by using the word
"beautiful" in the definition.

> This is an
> uncomputable function, since beauty appears to a
> prospective attribute: the function to recognize or
> generate beautiful things cannot be finitely
> specified.

I don't like where this is going. We can't develop coherent plans for
achieving something we can't define. I doubt that "friendliness" is
such a function, and I hope it is not. Otherwise, it boils down to
"friendliness is in the eye of the beholder". You end up saying that
people will call the system friendly if they like it and unfriendly if
they don't. You can't engineer to such a spec, and it ends up being a
democracy with people voting on what they want for friendliness. If
you can't define it precisely, how can it be a requirement? How do you
know it even exists, if you don't know what it is? This isn't some
observation that we haven't pinned down an explanation for yet. This
is our instructions and requirements to people trying to build AI
systems. How can our request be vague and ill-defined, but we'll know
it when we see it?

> But if the Omega Point condition holds for our
> universe, then the function can be defined to be the
> one that a super-intelligence (Universal Mind) would
> hold, in the limit that the rate of information
> processing was approaching infinity (Omega Point). So
> all concepts can be thought of as 'utility functions'
> in the universal mind.

Again, circular logic. You define the universe as a mind. Everything
is in the universe (which equals mind, which equals universal mind).
Therefore everything is in this universal mind. Therefore all concepts
are held in this mind. Therefore a concept is what this universal mind
holds. There are no definitions here. You are stating circular
relationships and tautologies that do not distinguish between items
within the definition and items outside it.

At the point where the information processing approaches infinity, you
assume it holds everything. Thus, what you say for concepts would
apply to anything and everything else I might want to apply it to.
There is nothing that wouldn't fall under such a definition of a
concept.
(On an unrelated tangent, I reject the notion that infinity =
everything. A number-line of integers 1, 2, 3...infinity is infinite,
but it does not encode any information besides sequential numbers.
Even though it is an infinite string of digits, there are no binary
pictures, words, messages, concepts or information encoded in there.
This infinite sequence is relatively empty. Infinite processing may or
may not contain much of anything. I have a similar objection to Many
Worlds. Even if there are infinite dimensions, it does not mean that
all conceivable possibilities must exist.)

> * Propositions: All concepts in reality can be
> interpreted as utility functions. 'Friendliness' is a
> concept; therefore Friendliness is a utility function.

This is not a proposition. You are labeling something with a name you
want to call it. This is not the same thing as defining it or
explaining it. A label is not a testable theory. There is no validity
or truth test as to whether these things are what you say or not. You
merely coined a term. Besides using circular logic to reach this
point, you still haven't defined it. You merely labeled it.

> The class of friendly sentients appears to be
> potentially infinite, making 'Friendliness' a
> prospective attribute. Therefore the exact
> Friendliness utility function is uncomputable.

The first part of your sentence says "it appears...", then you jump to
a more assertive "making...." Vague appearances don't make anything
true. This argument is beyond weak. It doesn't actually explain
anything at all.

The last sentence seems to sum up most of your "definition". Instead
of giving a strong definition, you seem to be spending most of your
words giving excuses for the weakness of any definition.

> Therefore all finite approximations to 'Friendliness'
> must have the property that they are recursive and
> converge on the ideal utility function.

The only thing you have defined in the end is that friendliness is
recursive. This is not a definition either. It is an implementation
method for encoding the process toward friendliness. This is about as
useful as defining Bayes Theorem as being mathematical notation. It
tells us how it is implemented or expressed, but tells us nothing about
what you are implementing.

> Let Partial Friendly (PF) = finitely specified
> approximation to the Friendliness function.
> Omega Friendly (OF) = exact Friendliness
> function (uncomputable)
> PF must be a recursive function such that PF (PF)
> outputs PF’ which approaches OF as number of
> iterations approaches infinity.

Assigning variables (or abbreviations) to terms sounds like a lead into
a rigorous definition, like a mathematical formula or technical
specification. But you fizzle out and don't actually use these
abbreviations you define. They sound good, and look rigorous, but
aren't actually used anywhere. This is about as useful as padding a
glossary with technical words that aren't actually used. It adds
nothing.
> Definition of Friendliness
> A computable 'Friendly' function (PF) is a function
> which takes any finitely specified function Partial x
> as input and modifies it such that the outputted
> function Partial x' is a better approximation to Omega
> x. Successive output used as input for the next
> iteration has to cause Partial x' to converge on Omega
> x as the number of iterations approaches infinity.

This definition of "friendliness" merely says it will have a recursive
implementation. There is no definition here. You have described one
attribute (recursiveness) of another attribute (friendliness) without
defining that other attribute. There is also no measurement method of
friendliness here, to calculate how friendly we are getting. Nor is
there a test to define whether we are friendly or not.
Any recursive function, such as factorial, would seem to meet your
definition above. As such, it fails to define or distinguish between
friendly and non-friendly items.
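To make the point concrete, here is a sketch of my own (purely
illustrative, not from your post) of a function that satisfies the
quoted schema word for word: it takes a finite approximation as input,
outputs a better one, and converges in the limit of infinite
iterations. Yet it approximates the square root of 2, not
"friendliness":

```python
# Illustrative counterexample: a refinement function that meets the
# quoted schema (each output "Partial x'" is a better approximation,
# converging on "Omega x" as iterations approach infinity), yet it
# has nothing to do with friendliness -- it computes sqrt(2) via
# Newton's method.

def refine(x: float) -> float:
    """One Newton step toward sqrt(2); each output improves on the input."""
    return (x + 2.0 / x) / 2.0

approx = 1.0  # a crude initial "Partial x"
for _ in range(10):
    approx = refine(approx)  # successive output fed back as input

print(round(approx, 6))  # prints 1.414214
```

Since this trivial numerical routine meets the letter of the
definition, the definition cannot be doing the work of distinguishing
friendliness from anything else.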

> None the less, I think my understanding has improved
> to the point where I at least understand in general
> terms what the concept 'Friendliness' mean. So here
> I've had a go at attempting to give, in my own words,
> a technical definition of 'Friendliness'. Admittedly
> it's only a very general definition, but I don't think
> it's vacuous or gibberish. That is, I think I'm
> actually defining something meaningful here. If you
> disagree by all means please tell me.

No offense, but I think this is close to gibberish. It consists of
tautologies that each might make sense, but they go nowhere. You also
contradict yourself by claiming to give a "technical definition" but
only as a "very general definition". It can't be both a technical,
rigorous definition and a general one at the same time. You can't give
a "technical definition" in your own words. Not only have you failed
to give a definition (IMHO), but you are not even clear about the goals
or what you are claiming to present here.

I see few, if any, definitions here at all. I am not even sure if you and
I would agree on what a definition is or what it is supposed to do.
But by my understanding, you are not defining, and what you are doing
fails to achieve any goal of a definition. Maybe you are trying to
invent terminology instead? Maybe you are giving a background or
context in which friendliness must exist? Maybe you are describing
goals, principles, constraints, assumptions, methodology, or some other
attribute besides definition of this concept?

I would be curious to see Eliezer's take on this "definition" of
friendliness. If you gave it to him using a term other than
"friendliness", would he even recognize it as describing the same thing
that he frequently talks about?

Again, please excuse my bluntness. But if I were editing your work,
this would be my critique. If I were an engineer handed such a
definition or requirement, this would be my response. And please
realize that I am not just complaining about it not being in the right
form or terminology that I like to see. In my career, I work with
customers all the time translating their vague notions into rigorous
engineering specs. But I cannot extract any useful guidance out of
what you have written. I have a similar problem with much AI and
transhumanist writing, which tends to be more philosophy than anything
else. In many cases, it isn't even philosophy, it is just attitude.

Harvey Newstrom <>

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:50 MDT