Re: [sl4] AI's behaving badly

From: Tim Freeman (
Date: Tue Dec 02 2008 - 19:31:24 MST

From: "Stuart Armstrong" <>
>Two situations: the first one is where the AI is excessively short
>term. Then the disaster is rather clear, as the AI runs through all
>available resources (and neglects trying to accumulate extra
>resources) trying to please humans, leading to an
>environmental/social/economic collapse (if the AI is short term, then
>it MUST privilege short term goals over any other considerations;
>using up irreplaceable resources immediately, in a way that will fill
>the atmosphere with poisonous gas, is something the AI is compelled to
>do, if it results in a better experience for people today).

It's trying to give the humans what they want. If the humans are
long-term, then making a short term decision to give them what they
want still has good long term consequences because in the short term
the humans want desirable long term consequences. If a short-term AI
is catering to short-term humans, then I agree we might find ourselves
in a short-term universe, which would be an undesirable outcome for
the long-term minority.

If humans in aggregate are short-term and you don't want the AI to be
short-term, then you want to implement an AI that does more of what
you want than what the short-term majority wants. In this case there
will be conflict and it would only be luck that leaves reasonable
people in control of the thing. This problem seems inherent in the
situation.

>The second situation is where the AI can think longer term.

Ya, sorry to mislead you there. The paper is stale. Right now I
think the AI should only think about giving humans what they want in
the short term, and leave it to the humans to include long term
concerns in their wants or not. If the AI does long term planning
then there are strange artifacts that arise when the end of the AI's
planning horizon approaches, and other strange things happen if we
continuously push the AI's planning horizon into the future as time
passes.
I hope that people naturally become more concerned about the long term
as their short term desires are satisfied.
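To make the end-of-horizon artifact concrete, here is a toy illustration of my own (the names and numbers are invented, not from the paper): an AI that plans only up to a fixed final timestep T assigns zero value to any payoff landing after T, so an investment that pays off after a delay looks worthless once too few steps remain.

```python
# Toy fixed-horizon planner. T, delay, and payoff are illustrative.
T = 10  # the AI's fixed planning horizon (final timestep it considers)

def value_of_investment(t, delay=3, payoff=100):
    # The payoff arrives at time t + delay; the planner counts it
    # only if it lands within the planning horizon.
    return payoff if t + delay <= T else 0

# Early on the investment looks attractive; as the end of the horizon
# approaches, its apparent value abruptly drops to zero, so the AI's
# behavior near the horizon becomes strangely myopic.
```

Continuously pushing T into the future as time passes removes this cliff, but then the AI is forever planning toward an endpoint it never reaches, which is the other family of strange behavior mentioned above.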

>This is much more dangerous. What is the ideal world for an AI? A
>world where it can maximise humans' utilities, without having to
>reduce anyone's. As before, brains-in-an-armoured-jar, drug-regressed
>to six months, boredom removed and with repeated simple stimuli and a
>cocaine high, is the ideal world for this.

This isn't specific to a long-term AI. The AI could do this in the
short term.

This is the deception scenario, described at, or maybe it's
the aggressive neurosurgery test case immediately after that; they're
essentially the same. The fix proposed there is to ensure that it
makes sense to apply the humans' utility function to the AI's
world-model, and to maximize the humans' utility applied to the AI's
world model. This way, the AI perceives no utility gain from
deceiving the humans, since such deception only changes the humans'
world model.
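A toy illustration of that fix (my own sketch; the action names and state
representation are invented): score candidate actions by the humans' utility
function applied to the AI's world-model rather than to the humans' beliefs.
A "deceive" action that only improves the humans' beliefs then scores no
better than doing nothing.

```python
def utility(world):
    # The humans' utility function, applied to a world-state.
    return world["wellbeing"]

def apply_action(world, beliefs, action):
    world, beliefs = dict(world), dict(beliefs)
    if action == "help":
        world["wellbeing"] += 1      # actually improves the world
        beliefs["wellbeing"] += 1
    elif action == "deceive":
        beliefs["wellbeing"] += 5    # only the humans' *beliefs* improve
    return world, beliefs

def best_action(world, beliefs):
    # Evaluate utility on the AI's world-model (index 0), not on the
    # humans' beliefs: deception yields no perceived utility gain.
    return max(["noop", "help", "deceive"],
               key=lambda a: utility(apply_action(world, beliefs, a)[0]))
```

If the utility were instead applied to `beliefs`, the same `max` would pick
"deceive", which is exactly the failure mode being avoided.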

>PS: fixing the AI to obeying people's current utilities won't help
>much - it will result in the AI giving us gifts we don't want any
>more, bound only by respect considerations we no longer have. And the
>next generation will be moulded by the AI into the situation above.

The plan proposed in the paper is to reevaluate people's utilities
continuously. Assuming time is measured in the AI's timesteps, the
plan is to take action at time X so that what people get at time X+1
is what they wanted at time X, as much as possible, and then increment
X and continue. Everyone seems to misunderstand what I wrote there,
so it must be poorly explained.
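Perhaps a sketch helps. Here is a toy version of the loop as I understand it
(the world model, action set, and WANTS sequence are all invented for
illustration): at each timestep x the AI re-reads what people want now, picks
the action whose predicted time-x+1 outcome those time-x wants rank highest,
advances one step, and repeats.

```python
WANTS = [2, 0, 1, 1]  # toy stand-in for "what people want" at each timestep

def predicted_outcome(state, action):
    return state + action  # trivial world model: state is a single number

def choose_action(state, want):
    # Pick the action whose predicted next-step outcome best matches
    # what people want *right now* (time-x wants, time-x+1 outcome).
    return min([-1, 0, 1],
               key=lambda a: abs(predicted_outcome(state, a) - want))

def run(state=0):
    history = [state]
    for x in range(len(WANTS)):
        a = choose_action(state, WANTS[x])   # wants reevaluated every step
        state = predicted_outcome(state, a)  # advance to time x+1
        history.append(state)
    return history
```

The point is that nothing is frozen at time zero: the wants consulted at
step x are the current ones, so the AI tracks people's utilities as they
change rather than serving stale ones.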

I don't know how to cope with the next generation. The AI should
probably not have compassion for young children directly; instead it
should have compassion for their parents and promote the wellbeing of
the children because the parents want it to. I don't see a way to
deal with a child becoming an adult that does not feel like an
arbitrary decision.

Tim Freeman      

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:01:03 MDT