RE: IQ testing for half-baked AI's

From: Ben Goertzel (ben@webmind.com)
Date: Sat Jul 28 2001 - 15:32:45 MDT


> ..."Tower of Hanoi problem", etc.
>
> Actually, my thoughts have more focused on milestones than on benchmarks.
> I have a pretty strong feeling that there will be no good way to compare
> benchmarks.
>
> Here are some of the milestones I've been thinking about for the GISAI
> architecture, in no particular order (some of them are early, some of them
> are really advanced).
>
> Milestone: Teaching an AI to play tic-tac-toe using a purely verbal,
> communicational description and no programming whatsoever.

Do you really mean "purely verbal"? I hope you mean "verbal and visual",
as tic-tac-toe is a game very much tied to perception of a 2D physical
world. Explaining and playing purely verbally with no diagrams is possible
but not very natural.

With this emendation, I like this one very much.

We have a similar goal for Webmind: to teach it to play Twenty Questions
purely based on verbal feedback. (Here the "purely verbal" restriction
makes more sense.)

> Milestone: Getting an AI to play humanstyle chess - chess without
> internally representing a search tree larger than humans use - at credible
> amateur levels.

Well... I don't like this one.

I think one should stay away from milestones that place restrictions on the
internal functioning of the system.

I much prefer milestones that refer only to the system's behavior rather
than to how the system does what it does...

> Milestone: An AI being able to successfully determine when two pieces of
> code do "the same thing" even when they have different graphs. The
> submilestones here are the various levels of "the same thing" - i.e.,
> recursive vs. iterative implementations of the Fibonacci sequence; the
> different ways of computing Pascal's Triangle; the same piece of code
> written in Java and in Perl; an array container class written in Java and
> a linked-list container class written in C++; assembly language and C++
> source; a high-level verbal description of an algorithm... and so on.

This is an interesting one, and it could be broken down into many sub-goals,
some of which are suitable for a baby AI and some of which are very, very
hard.
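To make the easiest sub-goal concrete, here is a minimal sketch (in Python, purely for illustration; the function names are mine, not from any proposed test suite) of the kind of pair an AI would be asked to judge equivalent:

```python
# Two implementations an AI would need to recognize as computing
# "the same thing": the Fibonacci sequence.

def fib_recursive(n):
    # Direct translation of the mathematical recurrence.
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # Bottom-up loop: different control structure, same function.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Behavioral equivalence can be checked empirically on sample inputs,
# though proving it in general requires reasoning about the recurrence.
assert all(fib_recursive(n) == fib_iterative(n) for n in range(15))
```

Note that the two programs have quite different call graphs, so recognizing the equivalence already requires more than structural matching.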

> Milestone: As above, but being able to translate on demand between
> substrates. (Quite a different problem - not necessarily a more advanced
> or less advanced version of the above.)
>
> Milestone: Inventing a complex tool, through abstract reasoning rather
> than blind search or even heuristic search, within a complex toy world or
> the world of code. Note that by "abstract reasoning" I mean something
> quite different than the use of blind symbols in classical AI!

I don't mind tool-building as a task, but again, I don't like the
restriction on the system's internal functioning.

I mean, who decides whether a symbol inside some AI system is "too blind"
to be part of a valid train of "abstract reasoning"? Eli?

> Milestone: Inventing a complex plan to solve a complex problem, again
> through the use of abstract reasoning rather than blind search.
>
> Milestone: Successfully linking events observed in a billiard-ball
> modality image to higher-level symbols, such that a generic event (nine
> balls in three groups of three) can be translated into a symbolic
> structure and reconstructed in rough detail using only that structure.
>
> Milestone: As above, but communicated to a human in English, and then
> reconstructed from the human's description in English. Both this and the
> above also come in graduated versions depending on the complexity of the
> image.
>
> Milestone: Storing a billiard-ball image in memory and retrieving it from
> memory.

I don't think we're that far off from each other on this topic...

If we

1) accept ONLY those of your milestones that don't refer to internal system
functioning, but only refer to external system behavior

2) quantify performance on each of the milestones (tic-tac-toe is close to
all-or-nothing: did it learn the rules or not? But determining functional
code equivalence, for example, is something where one could make many
different problems of different levels of difficulty and assess an objective
"performance grade" for each of a set of AI systems)

then your milestones are the same as my benchmarks.

I do feel pretty strongly that it's better to have
milestones/benchmarks/whatever that refer only to external behavior. Within
this restriction, one could put Peter's & my tests and your tests together
and we'd have a start toward the kind of system-independent "IQ testing
framework" I was talking about.

ben



This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:37 MDT