Double illusion of transparency (was Re: Building a friendly AI from a "just do what I tell you" AI)

From: Tim Freeman
Date: Tue Dec 04 2007 - 22:13:55 MST

> 2007/11/21, Tim Freeman:
> ... Unfortunately the paper
> needs revision and hasn't yet made sense to someone who I didn't
> explain it to personally...

Joshua Fox wrote:
> The paper is, in fact, very clearly written...

From: "Eliezer S. Yudkowsky"

The Life of Brian clip cited in Eliezer's essay there is wonderful and
deserves its own link:


Eliezer's point is that it's entirely possible for Joshua to think he
understands me, and for me to listen to Joshua's restatements of what I
said, with each of us misunderstanding the other while the
misunderstandings lead us to think the communication is perfect.

That's exactly how it happened when I had to explain it personally.
Based on reading the paper, my reviewer thought I implicitly had a
training phase followed by a performance phase, whereas in fact I
intended the whole thing to be retrained on the entire memorized past
at every timestep, since I'm hypothesizing unlimited computing power.
There were a few other misunderstandings like that too. They left him
with a model of what I was trying to say that was quite broken and was
not much helped by fixing any one misunderstanding. We stumbled around
for a while before clearing it up.
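The distinction the reviewer and I tripped over can be made concrete in a
toy sketch. All names below are hypothetical illustrations (not from the
paper), with a trivial "model" standing in for whatever is actually
learned; the point is only the difference between freezing a model after a
training prefix and refitting on the whole memorized past before every
action:

```python
def train(history):
    """Stand-in 'training': fit a trivial model, the mean of observations.
    A real agent would fit something far richer; this is just a placeholder."""
    return sum(history) / len(history) if history else 0.0

def train_then_perform(observations, split):
    """The reviewer's reading: train once on a prefix, then act forever
    with that frozen model."""
    model = train(observations[:split])
    return [model for _ in observations[split:]]

def retrain_every_timestep(observations):
    """The intended scheme: given unlimited computing power, refit on the
    entire memorized past before acting at each timestep."""
    actions = []
    history = []
    for obs in observations:
        history.append(obs)      # memorize everything seen so far
        model = train(history)   # retrain from scratch on the full past
        actions.append(model)    # act using the freshly refit model
    return actions
```

Under the second scheme the model never stops updating, so there is no
point at which a "performance phase" begins; that was the piece of the
picture the paper failed to get across on its own.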

Tim Freeman      

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:01 MDT