Maybe a robust quality function can only be kept consistent through rampant
self-improvement by means of quantum gravitational computation; in other
words, perhaps the logic/math associated with an ideal quality function
depends on non-Turing-computable numbers. If this were true it would suggest that not
only would the utility function help effect a universe whose value is
corroborated by the quality function, but the utility function may actually
serve to instantiate a robust quality function. For example, suppose the
utility function figured out how to increase computation resources from the
original substrate to a level where the Penrose Hypothesis could be reasonably
tested, and a quantum gravity computation substrate could be instantiated to
process the necessary qualia a quality function would require. The causal
validity semantics would keep this transition from killing us in the process.
I imagine, in my ignorance, that a mind using less than a square meter of
computronium should be able to figure out the required basis for computing a
quality function, particularly the degree to which the Penrose Hypothesis
is true. It's quite reasonable to allow for the possibility that Mr. Penrose
was just being anthropomorphic and his hypothesis is totally false, in which
case the FAI should default to inductively creating a quality function based
on Friendliness content reasoned over with good ol' fashioned
Turing-compatible processes. Any utility function aiming for this Holy Grail
should have a shadow utility function which has proven consistency of
Friendliness provided the Holy Grail doesn't exist.
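
To make the fallback concrete, here is a toy sketch in Python. Everything in
it is my own illustrative assumption: the names grail_uf, shadow_uf and
select_utility are hypothetical, and the "outcomes" are just strings. The only
point is that the Grail-seeking function is used solely when the hypothesis
has been positively confirmed; an untested hypothesis and a refuted one both
fall back to the shadow.

```python
from typing import Callable, Optional

# A utility function here is just a map from an outcome label to a score.
Utility = Callable[[str], float]


def grail_uf(outcome: str) -> float:
    # Hypothetical: rewards outcomes that rely on the new substrate.
    return 1.0 if outcome == "qualia-computed" else 0.0


def shadow_uf(outcome: str) -> float:
    # Hypothetical: rewards outcomes reachable with Turing-computable reasoning.
    return 1.0 if outcome == "friendly-by-induction" else 0.0


def select_utility(penrose_confirmed: Optional[bool],
                   grail: Utility = grail_uf,
                   shadow: Utility = shadow_uf) -> Utility:
    # None means "not yet tested". Only a positive confirmation (True)
    # switches away from the shadow utility function.
    return grail if penrose_confirmed is True else shadow
```

So select_utility(None) and select_utility(False) both hand back the shadow
function, and only select_utility(True) hands back the Grail-seeking one.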

>> Should the AI be a good predictor, it will systematically steer reality
>> into regions which its utility function assigns high utilities. Thus,
>> the term "supergoal" that I used in CFAI means simply "utility
>> function". And if the AI is a good predictor, its utility function also
>> serves as a good description of the target of the optimization process,
>> the regions of reality into which the AI does in fact steer the future.
>I think this is an appropriate time to apply my
>previous post about the NLP method of identifying
>'values' in the sense of 'what does this utility
>function achieve?' Perhaps for sl4 purposes a new
>word is needed, since mathematicians already mean
>something else by the word 'values', and it's
>apparently too late to decide that 'supergoal'
>can fill this function.
>So let me propose "quality function" as the new
>term; it implies 'qualia' since (frothy, dodgy
>word though it is) we can agree that pleasant
>qualia are what we humans are hoping to get from
>FAI; it also implies 'quality' in the sense of
>'quality of life' which, again, we are hoping to
>get from FAI.
>I propose this since you (Eliezer, and perhaps
>others) seem to have punted the question of "What
>is the optimum utility function *in humans* that
>the FAI utility function is supposed to achieve?"
>It seems to me that you have found flaws with all
>pre-existing ideas about what this ultimate human
>goal or set of goals could be, which presents a
>seemingly intractable Alphonse-and-Gaston dilemma
>(You first! No, you first! No, I insist, sir! And
>so on.) The seed AI programmer dare not make any
>assumptions beforehand about CV without first
>letting the superior mind of the AI judge the
>likelihood of success, but if the AI gets ahead
>of the human programmer, ve may not be Friendly
>or fully informed about the needs of humans, and
>offer wrong answers, and around and around we go.
>So, like it or not, we need more confidence that
>we understand the quality function, from which we
>derive the AI's utility function.
>I'm thinking of Monty Python and the Holy Grail.
>King Arthur and his knights, at the end, are convinced that the
>French knights in the castle have the Grail, and
>one of their failed gambits is the building of a
>wooden rabbit, which is catapulted back and
>crushes one or two Englishmen. A nested view of
>their utility functions (UF), with each step
>intended to achieve the next, higher value, might
>run as follows:
>UF Build wooden rabbit => UF infiltrate castle =>
>UF overpower French knights => UF search castle
>=> UF find Grail => QF serve Greater Glory of God
>Fans of the film will know that the plan broke
>down at the second step, due to the failure to
>put anybody in the wooden rabbit (obviously, any
>utility function might depend on more than one
>lower function, so the actual nested view could
>be a tree converging on a single trunk function;
>when you are down to one trunk or a set of equal
>and indispensable trunks, you've found your QF).
>But there were many assumptions underlying the
>English plan; once inside, they might have lost
>the fight; the French might not really have had
>the Grail; Jehovah's original geas to go find the
>Grail might have been a mass hallucination!
>Perhaps, in order to serve the highest quality
>function (Greater Glory of God), Arthur and his
>knights should have just gone home and helped
>feed the poor.
>This is a point where I think I misunderstood
>Eliezer all along; when he wrote 'supergoal (UF)'
>I thought he was talking about something a lot
>closer to 'supergoal (QF)'.
>Now I expect Eliezer to fill my backside with
>buckshot for proposing a quality function that
>floats atop the utility function with no
>mathematical means of support, but it can't be
>helped (yet?). The quality function, in the end,
>would be what all discussion of CV, domain
>protection, singularity fun theory, and so on,
>is intended to produce: a
>mathematically/logically sturdy formulation of
>what it is that the utility function is meant *by
>us* to achieve. A paperclip maximizer, in this
>light, is an AI that performs the utility
>function flawlessly but makes a total wreck of
>the quality function (or punishes us for not
>having nailed down the quality function in time
>for the AI to make use of it).
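
The nested view in the quoted message can be sketched as a small tree in
Python. This is only an illustration under my own assumptions: the Goal class
and the helper names are hypothetical, and the extra "put knights inside
rabbit" leaf is added just to show two lower functions feeding one higher
one. Each goal points at the goal it is meant to serve, and the goal that
serves nothing higher is the trunk, i.e. the QF.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    name: str
    serves: Optional["Goal"] = None  # next higher goal; None marks the trunk


def chain_to_trunk(goal: Goal) -> List[str]:
    # Follow the "serves" links upward and collect the names along the way.
    path = [goal.name]
    while goal.serves is not None:
        goal = goal.serves
        path.append(goal.name)
    return path


def trunk(goal: Goal) -> str:
    # The last element of the chain is the function nothing else sits above.
    return chain_to_trunk(goal)[-1]


qf = Goal("serve Greater Glory of God")
find_grail = Goal("find Grail", qf)
search = Goal("search castle", find_grail)
overpower = Goal("overpower French knights", search)
infiltrate = Goal("infiltrate castle", overpower)
build_rabbit = Goal("build wooden rabbit", infiltrate)
# Hypothetical second branch: the step the plan actually skipped.
man_rabbit = Goal("put knights inside rabbit", infiltrate)
```

Both leaves converge on the same trunk: trunk(build_rabbit) and
trunk(man_rabbit) name the same QF, even though the plan fails as soon as one
branch below "infiltrate castle" is missing.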
