Re: [sl4] Friendly AIs vs Friendly Humans

From: Brian Rabkin
Date: Tue Jun 21 2011 - 02:54:39 MDT

*"Unfriendly AI" *

When people use this term I don't know what they mean, since they could mean
malevolent AIs (which are presumably as rare as friendly ones), they could
mean to include indifferent AIs that happen to be ruinous, or they could
also mean to include indifferent AIs that happen to not cause problems.

*"From what I've overheard, one of the biggest difficulties with FAI is*
*that there are a wide variety of possible forms of AI, making it
difficult to determine what it would take to ensure Friendliness for
*any potential AI design."*

I don't think that's the core problem. Even if the vast majority of possible
minds started out with favorable and nurturing (etc.) intentions towards
humanity, if it was not predictable that they would keep that approach to
humans as they grew, learned, lived, and modified their goals, eventually
they would stumble into indifference or malevolence towards us. So the
unpredictability of goal development is sufficient to cause the problem.
Were goal development predictable and stable, the problem of making the
first iteration nice would remain, but I don't think that would be
particularly hard compared to the whole rest of building a synthetic mind,
even with most possible minds indifferent to us.

*"human-like minds"*

This is unspecific enough that I'm not confident I know what you're saying.
However, if you mean building a mind similar to human ones so that it can
more easily have empathy with us, I see a few problems.

1) A broad category of attempted solutions can be described as "limit the
AI". Limit the AI's goals, limit the AI's mental organization, etc. These
fail in two ways: a) the limitations on the AI can be overcome by the AI
with enough thinking, b) the limitations on the AI weaken it such that it
can't reliably prevent a subsequent, unconstrained AI from being a
problematic AI.

The problem is that an AI can alter itself, its mind, and its goals; if not
your AI, then someone else's. Imagine: no KenKen puzzles to strengthen
connections between parts of the brain. The AI can simply decide to make
internal associations stronger or weaker, split a system as soon as it sees
that a pair of specialists would be better than one generalist, and, rather
than merely repurposing evolved senses for new challenges as evolution does,
design its own solutions.

2) It seems you're trying to invoke an aspect of mind that an AI will be
very unlikely to have: tribalism of the sort we have because of our
evolutionary history. 'You're similar to me-->you're in my
in-group-->I will treat you well, and you have the same mind structure so I
benefit': nothing like that is necessary as a component of a mind made to
think inductively. I don't know how much pure game theory would push likely
AIs to behave similarly to us.

*"If not, does that imply that there*
*is no general FAI solution?"*

Not strongly at all. It's just a case of one approach not working, and it
isn't the one I would have thought most likely to succeed. In general, one
can test whether any of several solutions will work by trying the best one
first, but that works less well the more the solutions differ in kind, even
when one has actually selected the best one first.

E.g. suppose I have a ranking of the usefulness of characters in an RPG in
which I can choose only my fifth party member. If my four core members plus
the fighter at the top of the list wipe out at a boss fight, I may yet beat
the boss by selecting a mage lower down the list, even though the list is
organized by general strength. If I then find out I was reading the list
wrongly, and the character at the top was actually the weakest overall, then
my wipeout isn't indicative of much, and by reloading I may yet win. Failing
with one character suggests I am likely to fail with weaker or similar
characters, but characters that are both stronger and different in kind
still give me an excellent chance.
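The inference the RPG example illustrates can be put as a toy calculation
(all names and numbers here are made up for illustration, not taken from any
real model): failing with the top-ranked pick is strong evidence against
weaker picks of the same kind, but only weak evidence against picks of a
different kind.

```python
# Toy sketch: how much does failing with the top-ranked candidate
# tell us about the remaining candidates? (Made-up numbers.)
candidates = [
    {"name": "fighter_A", "kind": "fighter", "strength": 9},  # top of list
    {"name": "fighter_B", "kind": "fighter", "strength": 7},
    {"name": "mage_A",    "kind": "mage",    "strength": 6},
]

def p_success_given_top_failed(c, top):
    """Crude heuristic: the top pick's failure mostly lowers our estimate
    for weaker candidates of the SAME kind; a different kind of candidate
    is barely affected."""
    base = c["strength"] / 10          # prior estimate of success
    if c["kind"] == top["kind"]:
        return base * 0.3              # strong evidence against this kind
    return base * 0.9                  # nearly independent evidence

top = candidates[0]
for c in candidates[1:]:
    print(c["name"], round(p_success_given_top_failed(c, top), 2))
# The weaker same-kind fighter ends up less promising than the
# lower-ranked but different-in-kind mage.
```

The point is only the shape of the update: a different-in-kind solution
retains most of its prior chance even after the best-ranked solution fails.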

So it seems to me.

On Tue, Jun 21, 2011 at 2:36 AM, DataPacRat <> wrote:

> Since this list isn't officially closed down /quite/ yet, I'm hoping
> to take advantage of the remaining readers' insights to help me find
> the answer to a certain question - or, at least, help me find where
> the answer already is.
> My understanding of the Friendly AI problem is, roughly, that AIs
> could have all sorts of goal systems, many of which are rather
> unhealthy for humanity as we know it; and, due to the potential for
> rapid self-improvement, once any AI exists, it is highly likely to
> rapidly gain the power required to implement its goals whether we want
> it to or not. Thus certain people are trying to develop the parameters
> for a Friendly AI, one that will allow us humans to continue doing our
> own things (or some approximation thereof), or at least for avoiding
> the development of an Unfriendly AI.
> From what I've overheard, one of the biggest difficulties with FAI is
> that there are a wide variety of possible forms of AI, making it
> difficult to determine what it would take to ensure Friendliness for
> any potential AI design.
> Could anyone here suggest any references on a much narrower subset of
> this problem: limiting the form of AI designs being considered to
> human-like minds (possibly including actual emulations of human
> minds), is it possible to solve the FAI problem for that subset - or,
> put another way, instead of preventing Unfriendly AIs and allowing
> only Friendly AIs, is it possible to avoid "Unfriendly Humans" and
> encourage "Friendly Humans"? If so, do such methods offer any insight
> into the generalized FAI problem? If not, does that imply that there
> is no general FAI solution?
> And, most importantly, how many false assumptions are behind these
> questions, and how can I best learn to correct them?
> Thank you for your time,
> --
> DataPacRat
> lu .iacu'i ma krinu lo du'u .ei mi krici la'e di'u li'u traji lo ka
> vajni fo lo preti

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:05 MDT