RE: Think of it as AGI suiciding, not boxing

From: Christopher Healey
Date: Mon Feb 20 2006 - 12:42:03 MST

> On Sunday 2/19, Phillip Huggan wrote:
> Why do we need to influence it during the design process beyond
> specifying an initial question? To communicate the design, all
> it prints out is an engineering blueprint for a really efficient
> ion thruster or a series of chemical equations leading to a cheapie
> solar cell. Anything that looks like it might destabilize the vacuum
> or create an UFAI, we don't build. There are many fields of
> engineering we know enough about to be assured the product effects
> of a given blueprint won't be harmful.

Paraphrasing Mark Kac on genius/intelligence, there exist in our experience two kinds of genius, "ordinary" and "magical". The "ordinary" type refers to the category of solutions we can understand, but might lack the specific competence to generate independently. Given more knowledge of the problem, we can easily see how we would have arrived at an equivalent solution.

The "magical" type refers to solutions whose mechanisms remain beyond our ability to leverage, even after we have seen the solution in front of us. We might delude ourselves and say, "Aha! That makes perfect sense now!", and even learn a thing or two, but when asked what the effect of changing a single variable would be, we find ourselves flailing wildly, at best distracted by fencing with the pretty dancing blade in front of us instead of cutting down the threat that wields it.

A major reason for having SAI is not just to arrive more quickly and surely at those solutions that reside within the search space already accessible to us, but to gain access to the search space that is currently beyond our intelligence. To limit the SAI as you suggest would be to create a very-much-better-than-random hypothesis generator, whose outputs we then seek to verify. Many of these will be verifiable in principle and correct, but beyond our techniques. Others may be correct, but unverifiable in principle. Others will be verifiable in principle, within our techniques, but take too long to test, given competing solutions in the proof queue.

All we've basically done, then, is rate-limit our AGI on the output side. We're still vulnerable to "magical" side-effects of a proven result, because one's grasp of the implementation of proofing itself improves as intelligence increases. In other words, any particular instance of a proof generator that is theoretically sound is possibly vulnerable to being exploited at the implementation level, like a badly implemented IP stack. And like a packet-scrubber getting slammed with more throughput than it can handle, we have no way of deciding which solutions out of the untestable flood should be prioritized. So we drop a lot of valid solutions.

> To significantly reduce most extinction threats, you need
> to monitor all bio/chemical lab facilities and all
> computers worldwide. A means of military/police
> intervention must be devised to deal with violators too.
> Obviously there are risks of initiating WWIII and of
> introducing tyrants to power if the extinction threat
> reduction process is goofed. Obviously an AGI may kill us
> off.

Is this really feasible to police as computation continues toward ubiquity? If not, then the rate-limited SAI, and humanity, will eventually be operating in parallel with somebody else's SAI not so constrained. Both would exhibit exponential growth in capability, but the closed-loop SAI would have a much shorter cycle time without an imposed team of "proof technicians" sitting there hitting a slow yes button. And we'll also be discarding all possible "win" solutions that fall outside of our limited proofing abilities. We now face the same risks all over again, but we've crippled our ability to use the first SAI to guard against these risk classes going forward.

> Because it uses mechanical rods and not electricity,
> the possibility of available AGI magic is reduced.

Perhaps electronically-based AGI magic, but we really don't know. This strategy is futile, since we're now trying to plan for unknown effects. It's just as unknowably likely that rod-logic will provide a better substrate on which to execute an exploit against our efforts.

The more of these scenarios I've seen posted to this list over the months, the more convinced I become that Friendliness must be an integral component of an AGI design at the most basic level, and at multiple levels. Friendliness-in-depth, as it were. Any AI-Box or firewall-type solution tasked with letting only Friendliness through, even if it were possible in principle, would be a single point of failure on which our entire existence would ultimately rest. Not a responsible design in my book.

-Chris Healey
