ESSAY: Goal Preservation

From: Olie Lamb
Date: Sun Jul 23 2006 - 01:11:38 MDT

Summary of 1600 words: "we" rarely deliberately set goals as goals.
When we seem to, we seem to suck at it, which leads to doubts about
FAI safety. Explicitly showing the difference between this aspect of
human minds and potential RPOP design may improve "public perception"
of Seed-AIs.

== Intro ==

It's almost an aphorism that humanpeople by default expect all
intelligent beings to behave roughly as they do.

So, humanpeople will typically judge
Really-Powerful-Optimisation-Processes' capacities to preserve goals
across changes in intelligence on their experience with humans.

In my experience, humanpeople are conspicuously bad at preserving
goals. I would guess that most folks would also think that
humanpeople don't have a good track record with goal preservation.

Lots of people expect that they will feel a certain way for their
entire lives, and then change. Although some wants and desires (eg:
food, shelter, pleasure…) generally remain unchanged throughout their
lives – the conspicuousness of the exceptions makes their rarity
apparent – the goals that are human universals tend to be those that
are more obviously preserved.

== Deliberate goal preservation ==

What examples are there of deliberate goal preservation? "Preserving
goals" is not something that people tend to go about in a deliberate
and conscious manner.

Generally, people will want to pursue goals either for their own sake,
or they expect to pursue them for reasons that they will endorse in
the future.

It sounds odd to say "I want to want chocolate". If you don't
actually like chocolate, why would you have a desire to want
chocolate? If you like chocolate today, why would you have any desire
to want chocolate tomorrow for reasons that you only held today?

Hell, why would you expect to dislike chocolate tomorrow if you like it today?

It's just not the way that we normally go about setting goals.

== The example of Lurrve ==

One notable area where people do make a conscious effort to preserve
goals is romantic relationships – love and marriage.

We have collectively recognised that some preferences that humans have
tend to change with time, and that there might be reasons for
wanting not to change those preferences.

Put simply: it's (thought to be) good for the kiddies if mummy and
daddy preserve their preference for each other.

Consequently, we have an example where people say that they preserve
their goal of liking each other. Let me paraphrase a wedding vow:

"I like you now, so I currently have the goals of living with you and
making you happy. I _promise_ to like you in the future, and I
promise that in future I will have the goals of living with you and
making you happy. I am implying that my current goals include having
the future goals of liking you, living with you, and making you
happy."

If we accept that people, when getting married, really do have the
goal of goal preservation – that is, of having and maintaining the
future goals of liking, living together, and making each other happy,
then we have a pretty clear example of failure of goal preservation
being fairly widespread.

How widespread? Well, "western" marriages break up willy-nilly, and
although many other countries don't have the divorce rate of the west,
a reasonable proportion of marriages only keep the promise of living
together. Their success at preserving any sort of "romantic"
connexion is considerably lower.

However, I'm not so sure that when people form romantic relationships,
or even enter into marriage, they are making any deliberate
effort to preserve goals. Although marriage vows typically involve
various promises to do stuff, I suspect many people think more along
the lines of "My goal is to do stuff in the future", rather than "my
goal is to have the future goals of doing stuff".

Of course, if those people don't do anything to ensure that their
future selves have the goal of doing X, they may not have the
motivation to actually do X when the time comes – that is, they've
lost the motivation to do what they had previously wanted to do. Of
course, they might still have the goal "fulfil promises", but that's
an entirely different reason for remaining in a marriage-like
relationship.

== Other promises ==

As I've said, it's pretty unusual to have a goal-goal, or to promise
to have a set of goals. Far more common is the goal of pursuing an
activity, promising to engage in an activity, or promising to deliver
an outcome.

I can think of only one example from my past: When I was fourteen, I
made a mental note to myself that my fourteen-year-old self would
despise my older self if I:

1. Pursued an existence without an impact (that is, lived just coz I
   was living)
2. Cut my hair off without good reason (business conformity not being
   one, given that my whole reason for wanting long hair was )
3. Liked the French for being French (culture envy is often stupid,
   and Francophilia is one of the more prevalent forms of culture
   envy).

Now, the factoid that's most obvious from these is that I was a
disturbed little boy, but what I read into this is that I recognised
that my goals and preferences would likely change as I got older, and
that there was little I could do except say: "These are my values, and
if you don't share them, I disrespect you." However, I also recognised
that if my future self didn't share the same goals, I'd probably
care more about my current goals than about the fact that some
brat disrespected me.

Regardless, I didn't see the mental note as a mechanism for
ensuring that my future self actually had those goals. I only made
this mental note because I expected that my goals would change.

Another area where people sometimes stipulate what goals a
future-entity "should" have is in government. For instance, in saying
that future government entities should pursue the goals of, say,
increasing public transport patronage to 50% of trips. Statements like
that are seemingly made because the current governmental body has no
hope of meeting the objective during their current tenure, so they
pass the buck. However, they recognise that the future body might
inherently have different goals, so they put together a report, saying
why the future body should have those particular goals.

Uh… the point is… um… anyway,

== Our ability to create future mind states varies. ==

If I say "I want my future mind state to have the ability to juggle", I
can go about learning juggling techniques, leading to having that mind
state. If my goal is to have the mind-state of being well-rested
tomorrow, I can try to get to bed early tonight. If my goal is to be
less irritable this afternoon than yesterday afternoon, I can make a
decent meal to keep my blood sugar levels steady, and avoid stressful
activities. If my goal is to remember a phone number, I can recite
that number to the tune of a catchy melody as I move my arms in the
relative directions of the numerical keypad. If my goal is to be
happy 30 seconds from now, I can tense the muscles in my zygomatic
arches and consider fond memories :)

Each of these actions is susceptible to failure. I can try to learn
to juggle, but I might be too klutzy. I can go to bed early, but my
neighbours might try out their fancy new jackhammer. I can avoid
stressful planning activities, but I can't prevent receiving a summons
about "those charges". I can use spatial-relation memory techniques,
but can't preclude concussion. I can smile and reminisce fondly, but
not if someone drops a piano on my foot.

So, yeah, action towards the future is never infallible (duh), so it
would be impossible to "prove" that an action would result in any
particular future state. All one can do is model and predict.

Nonetheless, we have a reasonable ability to ensure certain future
mind states. Some are much harder than others, and physiological
factors can be damn important. Nonetheless, humanpeople's ability to
ensure mind-states like, for instance, knowing a phone number, can
have a pretty high rate of success.

== "Proving" the ability of an RPOP to preserve goals ==

Because predicting the future is never "likely" to be infallible, it
would be silly to try and _prove_ that an AI of any variety could ever
preserve its goals. However, just as it is possible to "show" or
"model" that a system has a very high success rate of remembering a
number string when you implement lots of redundancies, I should think
the same would be possible for a system's ability to preserve its
goals.

I don't know if anyone has defined what a goal /is/ well enough for
someone to model goal preservation in this way.
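
For the number-string case, at least, the redundancy point can be put
in rough numbers. What follows is only a toy sketch under assumptions
that are mine, not part of the essay: the string is stored in several
copies, each copy is lost independently with the same probability, and
the string counts as remembered if a strict majority of copies survive.
The function name majority_survival and the 10% loss rate are
hypothetical, chosen purely for illustration.

```python
from math import comb


def majority_survival(n_copies: int, p_loss: float) -> float:
    """Chance that a strict majority of n_copies independent copies survive.

    Assumption (illustrative only): each copy is lost independently
    with probability p_loss.
    """
    p_keep = 1.0 - p_loss
    needed = n_copies // 2 + 1  # smallest strict majority
    # Binomial sum over every outcome with at least `needed` survivors.
    return sum(
        comb(n_copies, k) * p_keep**k * p_loss ** (n_copies - k)
        for k in range(needed, n_copies + 1)
    )


# Hypothetical 10% per-copy loss rate; watch the odds improve with copies.
for n in (1, 3, 5, 9):
    print(n, round(majority_survival(n, 0.1), 5))
```

Under these assumptions the success rate climbs towards 1 as copies
are added, but never reaches it, which is the "show or model, not
prove" distinction. Real failures are rarely independent (the piano
lands on all the copies at once), so the numbers flatter the system.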

Would anyone have an intellectual issue with trusting a highly
optimised system to remember a number string? I doubt it, although
emotional issues might arise.

Would anyone have an issue with setting an optimisation process to
remember a number string? Well, I'd personally object to being
computroniumised to ensure that a computer never forgets a phone
number, but I would expect that setting an RPOP to remember a phone
number would be unlikely to result in the RPOP changing its goals so
that it didn't have to remember that phone number.

If an Optimisation Process with the goal of remembering a
number-string can alter its code, would one ever expect it to change
its code in a way that decreased its ability to remember that string?

So why would we expect an Optimisation Process to change its code in a
way that decreased its ability to pursue its goals?

I can't model this, either way. I can only intuit.

-- Olie
