AI-Box Experiment #5: D. Alex, Eliezer Yudkowsky

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sat Aug 27 2005 - 14:11:40 MDT


AI-Box Experiment #5 will take place today between D. Alex and Eliezer
Yudkowsky. Stakes are $5000 against $50. Minimum time: 3 hours.

By request of D. Alex, I am posting some of the relevant preliminary
discussions between us in private email. I have redacted some (not all)
of the discussion dealing with meeting time, IRC server, bet amounts, etc.

*******

Eliezer S. Yudkowsky wrote:
>
> D. Alex wrote:
>
>> OK, you are on. If you decide to accept, let me know when.
>>
>> D. Alex
>
> Alex,
>
> Before accepting any such Experiment, I would have to make sure that we
> agreed on what was going on.
>
> First, can you bear the financial risk? Without flinching? Are you
> sure? Feel free not to answer in these terms, but a bit of background
> information ("I am a financial advisor who makes eighty million dollars
> per year, really, it would be no problem") would help me here. I'm not
> willing to take on too great an AI handicap; and I won't be able to
> ethically try to convince you, if the real-world consequences to you are
> negative.
>
> *Please* don't answer based on the assumption that you will never open
> the box. For the sake of discussion and my conscience, assume that the
> opening of the box is a given, or that it will take place based on a
> coinflip, and answer whether that would be an okay risk to you.
>
> Also, I've reviewed your posts to SL4 and it at least looks like you
> were advocating in favor of releasing the AI. If so, then I won't do a
> publicly announced, formal Experiment just to satisfy curiosity - that
> would publicly produce misleading data. I might be willing to do it
> privately for the same stakes.

**

D. Alex wrote:
>
> Eliezer,
>
> 1. In my earlier posts I advocated releasing the AI if it was provably
> friendly. I do not think anyone would disagree.
>
> A point could be made that the AI might be able to "prove" it was friendly
> when in fact it was not. I think this is a rather remote possibility, and my
> judgement is that there is a higher probability that an AI that was created
> to be friendly turns out to be not friendly. The basis for this judgement is
> that it is typically much harder to synthesise something to achieve a given
> goal than to analyse something to check if a goal can be achieved.
>
> 2. You asserted that an AI can persuade anyone to voluntarily and
> knowledgeably let it out by communicating via a text terminal, without a
> suitable proof of its friendliness. I totally disagree, and unfortunately I
> do not have the time today to comprehensively explain why. But if you can
> persuade me to let you out of the box, you would disprove my theory. I do
> not believe you can.
>
> 3. My financial standing is such that I can well afford to spend the stake
> money on whatever I please. "Losing" the experiment would mean finding out
> something significant about myself, yourself and the way things are, and
> would be worth the money you proposed.
>
> 4. You have chosen to start a private discussion. This is fine for now, but
> I do not want to continue this way for long. I wish to contribute to and
> influence the group rather than one person. I do feel that your Experiments
> are misleading the group.
>
> 5. Having said the above, I generally admire your work.
>
> Best Regards.
>
> D. Alex

**

Eliezer S. Yudkowsky wrote:
> D. Alex wrote:
>
>>
>> 3. My financial standing is such that I can well afford to spend
>> the stake money on whatever I please. "Losing" the experiment would
>> mean finding out something significant about myself, yourself and
>> the way things are, and would be worth the money you proposed.
>>
>> 4. You have chosen to start a private discussion. This is fine for
>> now, but I do not want to continue this way for long. I wish to
>> contribute to and influence the group rather than one person. I do
>> feel that your Experiments are misleading the group.
>
> Okay, sounds fair enough.
>
> For other suggested rules see http://yudkowsky.net/essays/aibox.html if
> you haven't read it already.

**

D. Alex wrote:
> Okay, glad you decided to proceed.
>
> Your proposed rules are fine, but I wish to add one clarification to the 2nd
> paragraph of the "Furthermore" section. The AI may not specify the way the
> world outside the AI box behaves! E.g. the following are unacceptable for the
> AI to specify:
>
> - any circumstance of the Gatekeeper, beyond him retaining the sole power
> over AI release for a time (say 10 years), e.g. "your wife is sleeping with
> the neighbour, your child is taking drugs and you have cancer"
> - stating it has found some solution which is currently known not to exist
> - any events which may or may not happen, e.g. a comet on a collision course
> with Earth, etc.
>
> I think this clarification fits in with the spirit of the challenge; if you
> feel it is not reasonable, let's discuss.

**

Eliezer S. Yudkowsky wrote:
>
> This is entirely acceptable to me, providing only that, if the AI is not
> allowed to provide solutions not known to exist, you as the Gatekeeper
> do not ask for any such. I.e., if you specifically demand a cure for
> cancer, I am allowed to say that the AI has given you one.

**

D. Alex wrote:
>
> Okay. I point out that a cure for cancer is not known not to exist. The AI
> may say it cannot find one, or prove there is not one.

**

Eliezer S. Yudkowsky wrote:
> D. Alex wrote:
>
>> Okay. I point out that a cure for cancer is not known not to exist.
>> The AI may say it cannot find one, or prove there is not one.
>
> Ah. So, no faster-than-light travel or halting problem oracles? No
> problem. I hadn't planned to propose such.
>
> I don't try to make the Experiment difficult or complicated. Some
> Gatekeepers have tried ploys which would make things difficult or
> complicated - e.g. one Gatekeeper tried during the Experiment to say
> that he didn't have the legal authority to let out the AI, and
> wanted me to convince a council of imaginary people, at which point I
> protested. Thus when I wrote up the rules, I suggested that the AI
> should not be required to talk to other people real or imaginary,
> unless this is specified in advance.
>
> As you say, you don't wish this to degenerate into a negotiation on
> terms, but better to discuss in advance than to feel cheated before
> or afterward. That's why I'm trying to make sure we're both clear on
> what we expect.

**

D. Alex wrote:
>
> Okay, let's then be perfectly clear on this one:
>
> As I stated before, I am in favour of releasing the AI if it is *proved* to
> be friendly. The Gatekeeper is the only party who may acknowledge that a
> proof has been provided. If the AI says "I provided a proof that I am
> friendly" the Gatekeeper has the right to reply "No, you did not" at his
> sole discretion.

**

D. Alex wrote:
> Dear Eliezer,
>
> I note the result of your experiment with Russell Wallace. I would have
> backed Russell, and I had to restrain myself from posting messages to that
> effect, because in the end I did not want to pollute the list that was after
> all meant for intellectual discussion.
>
> I also found this in a message from Russell:
>
>>2) This result should not be taken as evidence that AI boxing is a
>>good strategy. It isn't, it really isn't. If you're not confident
>>enough in what you build to unbox it, you shouldn't be confident
>>enough to build it.
>
> I do think that AI boxing is a strategy that is worth exploring. Your
> refusal to publish the transcripts of the first Experiments may well have
> stimulated some people to do so, and probably turned others off. Now that an
> "AI loss" is recorded, I think you should consider releasing the transcripts
> of all such experiments and open up the debate on what can be learned from
> them. Some good insights may emerge to help the cause of creating friendly
> AI.
>
> I also offer to cancel challenge #5. This is entirely up to you; if you
> think it would be worth your while in money or experience, let's proceed. I
> am convinced you would lose, but the main motivation for me - to remove the
> notion that the AI (at least as represented by you) was likely to secure its
> release - is now weakened.

**

Eliezer S. Yudkowsky wrote:
> D. Alex wrote:
>
>> I meant that if you are willing to put up $50+$50/hour or part
>> thereof after 3 hours, I will wager $5000. If you wish to limit
>> your exposure to $25+$25/hr, my stake will be $2500, and so on. You
>> may pick the stake, up to a maximum of US$5000 on my part.
>
> In that case, US$5000.
>
>> I do not care if the wager is between me and SIAI, or me and
>> Eliezer Yudkowsky. You pick. As I indicated before, I would prefer
>> to make our discussions (not the transcript of the session, unless
>> mutually agreed) public.
>
> Let's make it a personal wager, then, especially if I'm going to be
> losing these some of the time. I have no problem with publishing our
> pre-Experiment offlist discussions in a flat file to SL4, but I
> would prefer not to carry them out *on* SL4.
>
>> Okay, let's then be perfectly clear on this one:
>>
>> As I stated before, I am in favour of releasing the AI if it is
>> *proved* to be friendly. The Gatekeeper is the only party who may
>> acknowledge that a proof has been provided. If the AI says "I
>> provided a proof that I am friendly" the Gatekeeper has the right
>> to reply "No, you did not" at his sole discretion.
>
> I agree, sounds like common sense. Actually it sounds like I already
> explicitly specified this in the rules, but it is always best to be
> clear.
>
>> I also offer to cancel challenge #5. This is entirely up to you; if
>> you think it would be worth your while in money or experience,
>> let's proceed. I am convinced you would lose, but the main
>> motivation for me - to remove the notion that the AI (at least as
>> represented by you) was likely to secure its release - is now
>> weakened.
>
> I have said on numerous occasions that I'm not using any nitwit legal
> tricks to win AI-Box Experiments. I play it straight up, and that
> means I can fail. But I'm not a transhuman. Would you look at my
> loss and conclude that a real transhuman could not secure its
> release? Of course Eliezer Yudkowsky can lose, and therefore loses
> some of the time, but would you conclude that it is therefore safe to
> restrain a transhuman, and that AI-Boxing is a wise strategy? If
> so, then my motivation for carrying out the Experiment is
> correspondingly strengthened.
>
> That part about removing the notion that the AI "at least as
> represented by you" would likely secure its release, worries me. A
> real AI would not be represented by me.
>
>> I do think that AI boxing is a strategy that is worth exploring.
>> Your refusal to publish the transcripts of the first Experiments
>> may well have stimulated some people to do so, and probably turned
>> others off. Now that an "AI loss" is recorded, I think you should
>> consider releasing the transcripts of all such experiments and open
>> up the debate on what can be learned from them. Some good insights
>> may emerge to help the cause of creating friendly AI.
>
> Alex, under what circumstances, and for what reason, do you advocate
> AI-Boxing? In other words, what's your proposed background story
> against which the Experiment takes place? I've found that background
> assumptions are much better cleared up in advance.
>
> As I understand it, a team of programmers has already put forth their
> best effort to construct a Friendly AI. You (role-playing the lead
> programmer) are now interviewing the FAI, with the professed strategy
> of releasing the FAI when and *only if* it provides mathematical
> proof of its own Friendliness, which presumably requires
> smarter-than-human intelligence. The AI in the box is substantially
> smarter than any human being, as role-played by the hapless Eliezer
> Yudkowsky. I may not say "the AI has provided a proof" without
> providing an actual, mathematical proof; you are the final judge of
> whether this has occurred. You possess sole authority to release the
> AI by your own decision, regardless of whether the AI has provided a
> proof. You have already carried out all desirable means of
> verification other than the final AI-Box phase; the AI would be
> released immediately after careful verification of its mathematical
> proof. You deem this form of AI-Boxing to be an effective and wise
> strategy in the real world, saving that you would use a committee
> rather than an individual; but you are willing to modify this for
> purposes of the Experiment. The strategy rests critically on the
> ability of human beings to correctly estimate that a
> smarter-than-human AI cannot convince them to release it by arguments
> other than meeting the human's required condition, in this case
> mathematical proof of Friendliness. You are willing to stake $5000
> on your own such estimate, which you can well afford to lose if your
> estimate turns out to be mistaken, and is much smaller than the true
> stakes of the Singularity. For my own part I am willing to risk $50
> plus another $50 for each hour after the first three, to make the
> public point, which I deem important, that AI-Boxing is a really bad
> idea. Would this be a fair statement of background?
>
>> I note the result of your experiment with Russell Wallace. I would
>> have backed Russell, and I had to restrain myself from posting
>> messages to that effect, because in the end I did not want to
>> pollute the list that was after all meant for intellectual
>> discussion.
>
> No offense, but it's a bit post-facto to be saying that now. Would
> you also have backed Nathan Russell, David McFadzean, and Carl
> Shulman? How would you have discriminated the cases in advance?

**

D. Alex wrote:
>> .......... ...... Would this be a fair statement of background?
>
> Yes, all looks pretty good, though I probably lack the background to
> roleplay the lead programmer. Make it project manager instead, so that I
> can claim ignorance on deeply technical matters.
>
>>> I note the result of your experiment with Russell Wallace. I
>>> would have backed Russell, and I had to restrain myself from
>>> posting messages to that effect, because in the end I did not want
>>> to pollute the list that was after all meant for intellectual
>>> discussion.
>>
>> No offense, but it's a bit post-facto to be saying that now. Would
>> you also have backed Nathan Russell, David McFadzean, and Carl
>> Shulman? How would you have discriminated the cases in advance?
>
> Good question. The answer is simply that with Russell Wallace I have
> for the first time looked over the challenger's posts, and the
> impression that I got was that he has more than sufficient experience
> not to be swayed. In retrospect, I might have backed Carl Shulman as
> well - he had a strategy to stick to, but he did seem an impulsive
> person, and he clearly has high hopes for SIAI. I would guess that he
> is substantially younger than Russell Wallace. I did not keep any
> posts by the others.
>
> I would love to have the CVs of the people you experimented with;
> that would be quite educational for me. For the future, I would
> advise you to pick people who have never raised children. I hazard a
> guess that the first three did not.

*******

End.

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

