Re: Friendliness SOLVED!

From: Mark Waser (mwaser@cox.net)
Date: Wed Mar 12 2008 - 18:02:09 MDT


>> Lay off the drugs.

No drugs involved. Just a *very* complicated problem with a surprisingly simple solution that takes some time and effort to convey.

The way to clearly disprove my theory (and the ability to be disproven is the key to any good theory) is to do one of the two following things:
  1. Show me how I (or an AGI) can stay true to the declaration and still perform a horrible *and* unethical act, OR
  2. Show me a set of circumstances where my Friendliness declaration prevents me (or an AGI) from protecting myself.
>> You are just talking about rational choice theory, which neither says much of anything about human action which is frequently irrational, nor does it ensure that an AGI would choose to be friendly.

I am not *just* talking about rational choice theory. I am proposing a formulation that I argue:
  a. will prevent any entity that implements it from performing any horrible and unethical act, AND
  b. once an entity understands this formulation, that entity will see via rational choice theory that it is in its own self-interest to implement it (a toy sketch of this claim follows).
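
To make claim (b) concrete, here is a minimal sketch in Python of the kind of expected-utility comparison I have in mind. The payoffs, resistance costs, and number of rounds are made-up assumptions for illustration only; they are not part of the declaration itself.

# Hypothetical illustration of claim (b): a toy expected-utility comparison,
# not the actual formulation. All numbers are assumptions.

def expected_utility(per_round_gain, resistance_cost, rounds):
    """Total utility of a strategy over repeated interactions."""
    return rounds * (per_round_gain - resistance_cost)

# Strategy 1: honor the declaration -- modest gains, no one resists you.
friendly = expected_utility(per_round_gain=1.0, resistance_cost=0.0, rounds=100)

# Strategy 2: act contrary to others' goals -- larger one-off gains, but every
# other agent now resists, imposing an ongoing cost.
unfriendly = expected_utility(per_round_gain=3.0, resistance_cost=2.5, rounds=100)

print(f"friendly:   {friendly:.1f}")    # 100.0
print(f"unfriendly: {unfriendly:.1f}")  # 50.0

Under these (invented) numbers, defection loses over any long horizon; that is the rational-choice pull toward Friendliness.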
>> At the very least you could have tried to prove this with symbolic logic or something. If you had tried, I'm sure you would find the formulas don't add up.

I disagree. Prove me wrong by doing one of the two things above. That should be easy if my theory is as laughably wrong as you believe.

>> Take one of Eliezer's examples of an AGI that loves smiley faces

Trust me. I started with that example. The AGI starts with my Friendly supergoal of "Do not act contrary to someone's/anyone's goals unless absolutely necessary for the fulfillment of a reasonable/rational personal goal (explicitly not including generic sub-goals like money, power, pleasure, religion, etc.)." It then recognizes that filling the universe with smiley faces, as awesome as that is, is also going to be to the detriment of all of its other goals, since *every* other entity in the universe is going to resist rather than assist it. For a powerful enough single-goal entity that is sure it *can* overcome all other entities, this is not going to stop it -- but that is a fantasy edge-case that we should be able to easily avoid. *Any* sufficiently intelligent multi-goal entity (or any entity that realizes it is not powerful enough to take on *the entire universe*) is going to recognize that this path is probably seriously sub-optimal for fulfilling the vast majority of its goals.

>> Acid test over, you lose.

I don't concur. Please try again. The easiest path is to do one of the two things listed above.

And thank you for spending the time to answer.


