Why friendly AI (FAI) won't work

From: Harry Chesley (chesley@acm.org)
Date: Wed Nov 28 2007 - 09:49:39 MST

While I'm still open to being convinced otherwise, my current belief is
that friendly AI is doomed to failure. Let me explain why and see if it
convinces anyone else, or if they can produce strong counter-arguments.

The whole situation reminds me of another movement in computer science
that was a hot topic when I started out in the '70s: proving programs
correct. Although there were many great ideas and many great thinkers
involved, that movement failed to have any substantial impact on the
real world for three reasons: it was hard to impossible to implement,
people didn't really need it, and a specification is not necessarily
less error prone than an implementation. All of these reasons apply to
FAI as well.

First, to be useful, FAI needs to be bullet-proof, with no way for the
AI to circumvent it. This equates to writing a bug-free program, which
we all know is next to impossible. In fact, to create FAI, you probably
need to prove the program correct. So it's unlikely to be possible to
implement FAI even if you figure out how to do it in theory.

Second, I believe there are other ways to achieve the same goal,
rendering FAI an unnecessary and onerous burden. These include
separating input from output, and separating intellect from motivation.
In the former, you just don't supply any output channels except ones
that can be monitored and edited. This slows things down tremendously,
but is much safer. In the latter, you just don't build in any motivations
that go outside the internal analysis mechanisms, including no means of
self-awareness. In essence, design it so it just wants to understand,
not to influence. This may be as prone to error as FAI, but is simpler
to implement and therefore more likely to be successful. (Indeed, any
solution can be argued to be impossible to implement due to the near
certainty of bugs, but in general the simpler they are, the more likely
they are to be workable.)

Third, defining FAI is as bug-prone as implementing it. One small
mistake in the specification, either due to lack of foresight or human
error (say, a typo), and it's all for nothing. And, in general, it's
hard to correctly specify a solution without having the same context as
that of the implementors of the solution, which in this case is
equivalent to saying that you have the same perspective as the AI, which
you don't.

To save everyone from having to read more postings, let me pre-supply
some of the replies I'm sure to get to this message:

* Read the literature!

* You don't understand the problem.

* You're an idiot.

I don't disagree with any of those, actually, but I'm only likely to be
convinced I'm wrong by arguments that address my points directly.
