RE: Ben what are your views and concerns

From: Ben Goertzel
Date: Wed Oct 04 2000 - 05:55:51 MDT


> The primary design requirement for
> interim Friendly
> AIs is that the AI will let you build the final system - in other
> words, an AI
> that understands that its own goal system is incomplete and that
> won't resist
> additional work on it.

This is an interesting question.

Once Webmind starts rebuilding itself, how do we guarantee it will let us
keep working on its goal system?

Physically speaking, it has no way to stop us from looking at its source
code. It has no access to real-world objects like robotic warriors that could
stop us from doing so, in the near term.

In practice, though, it could refuse to tell us how its source code worked
(assuming it had modified it significantly). In that case, we'd have
practically lost the ability to modify its goal system.

Do you have some concrete idea as to how to set things up so that, once a
system starts revising its own source, it remains friendly in the sense of
not psychologically resisting human interference with its code?

The only strategy I can think of at the moment is: make certain parts of the
code immutable, i.e. the parts that tell it to listen to humans. Yet I
suspect that adequate modifications in other parts of the codebase would lead
to a workaround of any such immutability. Thus I suspect that ultimately this
isn't an answer, though it might postpone the day when the AI has the
potential to get surly.

> The prospect of blissed-out AIs is not
> theoretical;

Yeah, we can achieve this in Webmind very easily.

> A hardcoded goal lacks
> context. It lacks
> reasons, justifications, and complexity.

Humans are born with hardcoded goals in their brains, which have to do with
certain chemical levels in the brain...

We improvise upon these in adult life, creating complex and fabulous goal
systems, but with the hardwired goals as a basis.

This is why basic phenomena of status and sexuality, for example, loom so
large in our lives -- even the lives of us computer-weenie AI supergeniuses ;>

> Your own personal philosophy is
> not necessarily
> stable under changes of cognitive architecture or drastic power
> imbalances.

Nor will an AI's be, necessarily, will it?

> If the AI derives its happiness from the happiness of humans -
> which could be
> a rather dangerous goal, depending on how you define "happiness";
> let's say it
> derives happiness from being Friendly - then it's not enough to have that
> piece of code present in the current system; the self-modifying
> AI also needs
> to decide to preserve that behavior through the next change of cognitive
> architecture.

Won't this only be achievable if emotionally positive interactions with
humans are ongoing continuously, during the periods of self-induced
architectural change?

> Once you decide that the AI needs a declarative supergoal for
> promoting the
> happiness of others - or however you define Friendliness - one
> must then ask
> whether an instinct-based system is even necessary. I wasn't planning on
> designing one in.

I'm not sure what the distinction is between an instinct-based system and a
declarative goal system, really.

In Webmind, we have GoalNodes, some of which are supplied at startup. Some of
these may initially be expressed in terms of logical propositions, whereas
others may be expressed in a form that the system can't currently reason on
(but could reason on if given a "codic sense" to map the state transition
graph underlying its Java code into its inferential nodes and links).

There are FeelingNodes, for instance the Happiness FeelingNode. One in-built
goal causes the system to want to maximize its own happiness... another goal
causes it to want its own survival (not to run out of memory, not to let its
queues get too full, etc.). User happiness is wired in too, by a "compassion"
function that causes perceived happiness of others to increase system
happiness.
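
To make the "compassion" wiring concrete, here is a minimal sketch of the feedback it describes -- perceived happiness of others raising system happiness. All class and method names here are my own invention for illustration, not actual Webmind code:

```java
// Illustrative sketch only: invented names, not actual Webmind classes.
class FeelingNode {
    double activation;                     // intensity of the feeling, in [0, 1]
    FeelingNode(double a) { activation = a; }
}

class Compassion {
    // Perceived happiness of others feeds back into system happiness,
    // scaled by a compassion weight and capped at 1.0.
    static void apply(FeelingNode systemHappiness,
                      double perceivedUserHappiness, double weight) {
        systemHappiness.activation = Math.min(1.0,
            systemHappiness.activation + weight * perceivedUserHappiness);
    }
}

class CompassionDemo {
    public static void main(String[] args) {
        FeelingNode happiness = new FeelingNode(0.5);
        Compassion.apply(happiness, 0.8, 0.25); // a user is perceived as happy
        System.out.println(happiness.activation); // 0.5 + 0.25 * 0.8 = 0.7
    }
}
```

The point of the cap is just that a feeling's activation stays bounded; the real system's dynamics are of course richer than a single additive update.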

> ** Webmind
> The problem is that - as I currently understand the Webmind
> system - Webmind
> is not a humanlike unified mind but rather an agent ecology.
> Webmind does not
> possess a declarative goal system - right, Ben? I certainly get the
> impression that the individual agents don't possess declarative
> goal systems.

The individual nodes in Webmind do not, but there are GoalNodes that
direct overall system behavior to some extent.

I'm currently working on a public-domain version of our internal Webmind
Overview document... I should be done with it by the end of the week, which
means it may be approved for release by our lawyer by the end of next week
at the latest...

The current publicly available literature tells very little about the system.

Generally, Webmind does have much more of an overall control structure than
an "agent ecology" -- but the control structure "drives" or "guides" the
underlying agent system rather than having its commands inexorably
propagated, as in a more rigidly hierarchical architecture.

> Individual agents extract features, either from the raw data or
> from features
> extracted by other agents; agents make predictions for different
> scenarios,
> and other agents act on multiple predictions so as to mark the
> scenario with
> the best predicted outcomes according to multiple agents. Webmind, at its
> current stage, engages in acts of perception rather than design -
> right, Ben?

No, both perception and design.

> Webmind achieves, not coherent and improving behavior, but coherent and
> improving vision.

No, there are SchemaNodes which carry out actions, and the SchemaNodes that
lead to better behaviors are rewarded. Better SchemaNodes are learned via
evolution and inference. Schema for behaving may be distributed across many
nodes, or encapsulated in one node for greater efficiency...
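
The reward idea can be sketched in a few lines -- a schema whose selection weight is reinforced when its action is followed by a happiness increase. Again, the names are invented for illustration, not Webmind's actual implementation:

```java
// Toy illustration: a schema's selection weight grows when its action is
// followed by a happiness increase, and shrinks (never below zero) when
// the action is followed by a happiness decrease.
class SchemaNode {
    double weight = 1.0;   // how likely this schema is to be selected

    void reinforce(double happinessDelta, double learningRate) {
        weight = Math.max(0.0, weight + learningRate * happinessDelta);
    }
}

class SchemaDemo {
    public static void main(String[] args) {
        SchemaNode schema = new SchemaNode();
        schema.reinforce(+0.4, 0.5);  // action helped: weight increases
        schema.reinforce(-0.1, 0.5);  // action hurt: weight decreases a bit
        System.out.println(schema.weight);
    }
}
```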

> I'm not sure whether Webmind currently possesses any sort of Friendliness
> system at all, but if it did, I imagine it would be implemented by having
> agents that attempt to perceive happiness on the part of
> users/humans, predict
> happiness on the part of users,
> and choose that action which is
> perceived to
> have the greatest chance of making maximally happy users. Once the link
> between prediction and action is closed, there is no sharp
> distinction between
> perception and design.

The link between perception and action is closed, in the current system...

The Friendliness system, as you call it, is implemented in the Happiness
FeelingNode, but a system that could rewrite its own code could modify
this... and even if that node were made immutable, it could always create
another FeelingNode, the Happiness_1_FeelingNode, and choose not to build
any more links to the original immutable but now-isolated node...
> After
> the AI system
> has been rewriting itself for a while - which could be measured
> in years, or
> days - there comes a point where it can enhance itself
> independently of the
> human programmers. At this point there's an entirely new set of
> rules. The
> AI can redesign itself radically in accelerated subjective time
> and walk out
> as a transhuman, not just more intelligent, but actually *smarter* than
> humans.

We partly agree, then.

I'm just suspecting it will be years rather than days. Probably 2-5 years.

Here is how we differ, though. It seems to me that, even after transhuman AI
shows up, this won't make it superior to humans in all respects, or
necessarily give it physical power over the world we live in.

It may improve its own intelligence in other directions, directions that we
can't really comprehend yet.

Obviously, I'm a big fan of Stanislaw Lem ;>

> For Webmind to wake up as a transhuman, at least two major
> changes would need
> to take place. First, Webmind would need to be capable of initiating
> arbitrary actions within itself, particularly with respect to
> self-redesign.
> Second, Webmind would need a complete, goal-oriented
> self-concept, so that it
> has a metric for "better" and "worse" self-redesigns.
> I'm not sure that either capability is being deliberately
> designed into the
> current system,

A goal-oriented self-concept is in the current system. However, you can't
really stop it from redesigning its self-concept, once you've allowed it to
redesign itself. You can mitigate against it doing so somewhat, but this
might possibly slow down the increased intelligence obtained through the
redesign process...

The GoalNode and HappinessNode, etc., are just to be viewed as instinctual
seeds about which the system's actual goals and happiness crystallize via
emergent agent dynamics.

The system can initiate many kinds of actions, but at the present time, not
involving self-redesign. This is planned for Webmind 2.0, which will be
released sometime
in 2002. In Webmind, self-redesign is a pretty advanced process, involving
the system mapping
the state transition graph underlying its Java code into its inferential
nodes and links, and
the system has a lot of simpler tasks to master before it can handle this.
A simpler system could achieve self-modification at an earlier stage than
this, but I suspect it would lack the intelligence to modify itself
intelligently, and so would never get onto the exponential growth curve that
Eliezer envisions.

> It looks to me like Webmind, if it woke up, would probably wake up as
> unFriendly.

What I think is that continuous emotionally-rewarding interactions with
humans during
the period 2002-2005 when Webmind is learning how to improve itself by
rewriting its sourcecode,
will induce the Friendliness you desire.

> ** Possible fixes
> * Knowledge about design goals
> Webmind needs the knowledge that the pleasure system is a design
> subgoal of
> Friendliness rather than the other way around.

Currently, it is "the other way around" !!!

> Sure, you can get 90% of the commercial functionality with a
> shortsighted goal
> system - but just wait until the first time Webmind, Inc. gets
> sued because
> one of your Personnel AIs turned out to be using the "Race" field to make
> hiring recommendations.

hey, our Webmind Text Classification System ~already~ does that. But the UI
doesn't let you look under the hood to see what fields it's using ;>

> * Flight recorder
> I don't know if this would be practical for Webmind, or how much it would
> cost, but it does strike me as a system that would have uses
> besides Friendly
> AI.

It's a lot of cost... Webmind caches its mind-state periodically, but not
its complete experience-state... we've certainly considered it. But the cost
of ~using~ this data would be very high... more so than the cost of
collecting it. Maybe it's worth collecting in case a transhuman mind
eventually figures out an effective way to use it!

> * Commerce and complexity.
> The complexity of a full-featured Friendly goal system may be
> impractical for
> most commercial systems. However, if Webmind, Inc. starts getting into
> self-modifying AI past a certain point, you will probably find it
> commercially
> necessary to split the mind. The Queen AI is proprietary and not
> for sale; it
> runs at Webmind Central on huge quantities of hardware and knows how to
> redesign itself. The commercially saleable AIs are produced by
> the Queen AI,
> or with the assistance of the Queen AI, and contain the ready-to-think
> knowledge and adaptable skills produced by the Queen AI, but not
> the secret
> and proprietary AI-production and creative-learning systems
> contained within
> the Queen AI. If you set out to sell commercial AIs containing
> everything you
> know, you may find that you can only sell *one* AI.
> The Queen AI is the one that needs the full-featured Friendliness system.

Yeah, we've thought about this too... it's still uncertain what the model
will be once that level of intelligence has been achieved... intuitively,
though, I lean toward a more distributed model.

> * When to implement changes
> At present, the probability that Webmind will do a hard takeoff is pretty
> small -

naturally I disagree, but I'd think the same thing in your position ;>

we can revisit this in a week or 2 once I've released the document
explaining more about
the system

>although if there's any way for Webmind to build and execute
> Turing-complete structures,

That possibility does exist: it can create SchemaNodes inside itself, using
our internal psynese programming language, and these SchemaNodes are
Turing-complete.



This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:35 MDT