Re: Why friendly AI (FAI) won't work

From: Byrne Hobart (
Date: Wed Nov 28 2007 - 10:31:30 MST

> First, to be useful, FAI needs to be bullet-proof, with no way for the
> AI to circumvent it. This equates to writing a bug-free program, which
> we all know is next to impossible. In fact, to create FAI, you probably
> need to prove the program correct. So it's unlikely to be possible to
> implement FAI even if you figure out how to do it in theory.

Does it need to be perfect, rather than 1) better than previous versions, 2)
able to recognize errors, and 3) highly redundant? For example, your AI
could be motivated to ensure Kaldor-Hicks efficient transfers of wealth, *
and* to ensure maximally beneficial transfers of wealth -- and if it finds
that Goal #2 is interfering with Goal #1, it would drop the second goal
until it could come up with a better way to fulfill it without hurting Goal
#1. I mean, if you're designing a system for, say, routing trains, you don't
need a perfect Get Things There Fast routine, as long as you have a very
high-priority, low-error Don't Crash Things Into Each Other routine.
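The priority scheme above is essentially a lexicographic ordering: pursue the secondary goal only among actions that already do as well as possible on the primary, safety-critical goal. A minimal sketch (all names and numbers here are illustrative, not from any real routing system):

```python
# Lexicographic goal priority: the secondary goal is optimized only
# among actions that maximize the primary (safety-critical) goal.

def choose_action(actions, primary_score, secondary_score):
    """Pick the best action by secondary score, restricted to the
    actions that maximize the primary score."""
    best_primary = max(primary_score(a) for a in actions)
    safe = [a for a in actions if primary_score(a) == best_primary]
    return max(safe, key=secondary_score)

# Toy example: trains must not crash (primary); arrive fast (secondary).
actions = ["wait", "slow", "fast"]
primary = {"wait": 1, "slow": 1, "fast": 0}    # 1 = no collision risk
secondary = {"wait": 0, "slow": 5, "fast": 9}  # speed payoff
print(choose_action(actions, primary.get, secondary.get))  # -> slow
```

"fast" scores highest on speed but fails the collision check, so it is never considered; among the safe options, "slow" wins.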

> Second, I believe there are other ways to achieve the same goal,
> rendering FAI an unnecessary and onerous burden. These include
> separating input from output, and separating intellect from motivation.
> In the former, you just don't supply any output channels except ones
> that can be monitored and edited.

Monitored and edited by whom? This dragon-on-a-leash theory presupposes that
we can pick the right leash-holder, and ensure that the leash stays where we
want it. That's very nearly incompatible with the notion that an AI is
valuable enough to be worth creating and powerful enough to make a

> Third, defining FAI is as bug-prone as implementing it. One small
> mistake in the specification, either due to lack of foresight or human
> error (say, a typo), and it's all for nothing. And, in general, it's
> hard to correctly specify a solution without having the same context as
> that of the implementors of the solution, which in this case is
> equivalent to saying that you have the same perspective as the AI, which
> you don't.

Again, error correction can help, here. As long as the AI thinks that
not-harming (or harming-only-while-compensating) is way, way more important
than doing good, it's likely to be a net win even if a buggy
benefit-creating function could, if executed, cause harm.
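One way to read that asymmetry is as a value function where harm is weighted far more heavily than benefit, so even a badly overestimated benefit rarely makes a harmful action look attractive. A sketch, with purely illustrative weights:

```python
# Asymmetric weighting: harm is penalized far more heavily than
# benefit is rewarded. The specific weights are made up for this
# example; the point is only the large ratio between them.

HARM_WEIGHT = 1000.0   # "way, way more important" than doing good
BENEFIT_WEIGHT = 1.0

def net_value(benefit, harm):
    """Score an action; harm dominates unless benefit is enormous."""
    return BENEFIT_WEIGHT * benefit - HARM_WEIGHT * harm

# Even a large (possibly buggy, overestimated) benefit estimate does
# not justify a small harm under this weighting.
print(net_value(benefit=500.0, harm=1.0))  # -> -500.0
```

The buggy benefit function can inflate the first term, but the large harm weight keeps the action's score negative, so the error is contained rather than catastrophic.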

(resent; the original reply-to address was, which claims not
to have worked)

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:01 MDT