From: Dale Johnstone (DaleJohnstone@email.com)
Date: Thu Jan 25 2001 - 10:19:15 MST
<snip simple Programmer - AI dialog>
Eliezer wrote:
> Well, Programmer#2 is obviously in the right, here - the above looks like
> a really basic mistake, not the sort of thing that can be (reliably) fixed
> by pouring in some more computational power.
Yeah, of course, just a little light entertainment to illustrate the basic
case.
> I won't tell you "not to worry", but I will venture a guess that,
> pragmatically speaking, the above scenario will never show up in real
> life. Wireheading errors appear so easy for unintelligent
> self-modification (*cough* Eurisko) that any AI that's gotten to this
> stage almost certainly has some model of verself in which wireheading is
> "bad"...
Yes I'd agree in general, but I don't see any reason not to keep looking for
problems. I can't chase you further on this point because it depends on your
implementation specifics.
Human drug addicts are more intelligent than Eurisko, and they know it's
bad, but that doesn't stop them. Intelligence and forewarning are no
guarantee. I will admit, though, that human design is probably largely to
blame.
> of course, the question is whether it's "bad" for deep reasons or
> whether it's just a generalized description that some programmer slapped
> an "undesirable" label on.
Well, what's bad to just a collection of quarks? There's the unresolved
issue of, dare I say it, qualia <shudder>. I hope it resolves itself as a
non-issue just as 'life-force' did in biology, but you have to admit it's a
tough one.
It could be that qualia appears & does something unpredicted to the goal
system. As we don't fully understand how qualia manifests itself, it'd be
wise to keep checking. I suspect a major redesign would be needed when/if
qualia comes into play.
Since qualia tend to appear only in sufficiently complex, self-referential
systems, there may be higher qualia-class 'things' that only appear at still
higher levels of complexity. It's probably not worth me telling you to
look out for those, but I'll say it anyway.
> Under the _Friendly AI_ semantics, the goal system's description is itself
> a design goal. A redesign under which the goal system "always returns
> true" may match the "speed" subgoal, but not the "accuracy" subgoal, or
> the "Friendly decisions" parent goal. The decision to redesign or
> not-redesign would have to be made by the current system.
The 'always returns true' case was just a joke; it wouldn't fool anyone on
this list for a second. Still, it's useful as a common point of reference to
ground the conversation.
How do you measure the 'accuracy' of the goal system's description?
How is the description expressed?
What's to stop goal drift over successive redesign iterations?
What if you get to a situation where an improvement in design is only
possible at the cost of a slight reduction in accuracy compliance? Have you
allowed for stopping to be a valid solution?
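To make that last worry concrete, here's a toy back-of-the-envelope sketch
(in Python, purely my own illustration; the 0.1% per-redesign loss is a
made-up number, not anything from your design):

# If each redesign is allowed to trade away a tiny slice of accuracy
# compliance for some other gain, the losses compound across iterations,
# even though each individual step looks harmless.
def compliance_after(iterations, loss_per_redesign=0.001):
    # fraction of accuracy compliance remaining, assuming each redesign
    # keeps (1 - loss_per_redesign) of whatever was there before
    return (1.0 - loss_per_redesign) ** iterations

for n in (10, 100, 1000, 10000):
    print(n, "redesigns ->", round(compliance_after(n), 4))

# roughly: 10 -> 0.99, 100 -> 0.90, 1000 -> 0.37, 10000 -> ~0

Unless something bounds the *cumulative* drift (or stopping counts as a
valid answer), 'slight' per-step reductions aren't slight in the limit.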
Cheers,
Dale.