Re: Human-level CPU power crossover date

From: James Rogers (jamesr@best.com)
Date: Thu Apr 12 2001 - 14:26:13 MDT


At 01:40 PM 4/12/2001 -0500, Jimmy Wales wrote:

>Do you think it is possible to give a similar analysis of memory
>bandwidth? In other words, you speak about memory bandwidth in silicon
>growing exponentially but more slowly than speed. And you speak about
>silicon memory bandwidth still being dramatically less than brain
>memory bandwidth. Can we quantify this in some way? At least, roughly?
>
>How should we measure memory bandwidth? How much bandwidth does the
>brain have? How much bandwidth do current supercomputers have? How
>fast is that growing?

The rough figure is that memory bandwidth grows at about 10% of the rate
of processor speed, and it has followed that growth rate since at least
the late '80s. Memory latency has actually gotten much *worse*.
Increasingly complex caching schemes have mitigated the effect of slow
memory to a certain extent, but the benefits of this are diminishing as
well. In low-level hardware circles, the point at which a processor is
completely bound by the memory architecture is referred to as the "memory
wall". Chip makers have been dodging this bullet for some time with
clever cache optimizations that work for *most* applications; even though
processors are severely memory bound, increases in core clock speed still
produce (ever-shrinking) gains in total computing performance. However,
for applications with large code and/or large data sets, you can easily
hit the memory wall because the caching becomes ineffective.
Interestingly, this is one of the reasons the old 200MHz Pentium Pros
still look very good for high-end computing: the well-matched (and
expensive!) memory system on those chips gives remarkably good real-world
performance for classes of problems that are memory-bandwidth bound.
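
You can watch yourself hit the wall with a few lines of C. The sketch
below (mine, purely illustrative; absolute numbers will vary wildly from
machine to machine) sums the same array twice, once sequentially and once
with a large stride. The arithmetic is identical, but the strided walk
defeats the cache and typically runs several times slower:

    /* Illustrative sketch: same arithmetic, different access patterns.
       The strided walk misses the cache on nearly every access, so it
       usually runs several times slower than the sequential sweep. */
    #include <stdio.h>
    #include <time.h>

    #define N      (4 * 1024 * 1024)  /* 16 MB of ints: bigger than cache */
    #define STRIDE 1024               /* jump 4 KB between accesses */
    #define REPS   10                 /* repeat so clock() can see it */

    static int data[N];

    int main(void)
    {
        double s1 = 0, s2 = 0;
        long i, j, k;
        clock_t t0, t1, t2;

        for (i = 0; i < N; i++) data[i] = 1;

        t0 = clock();
        for (k = 0; k < REPS; k++)
            for (i = 0; i < N; i++)        /* cache-friendly */
                s1 += data[i];
        t1 = clock();
        for (k = 0; k < REPS; k++)
            for (j = 0; j < STRIDE; j++)   /* cache-hostile */
                for (i = j; i < N; i += STRIDE)
                    s2 += data[i];
        t2 = clock();

        printf("sequential: %ld ticks   strided: %ld ticks   (%g %g)\n",
               (long)(t1 - t0), (long)(t2 - t1), s1, s2);
        return 0;
    }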

There are standard benchmarks for measuring memory bandwidth of systems
(mem + proc + cache + I/O), the most commonly used one being STREAM.
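
For reference, the core of STREAM is nothing exotic. A simplified sketch
of its "triad" kernel follows (the real benchmark adds careful timing,
result validation, and the copy/scale/add kernels):

    /* Simplified sketch of the STREAM "triad" kernel; not the actual
       benchmark source.  Reported bandwidth = bytes moved / seconds. */
    #include <stdio.h>
    #include <time.h>

    #define N    (2 * 1024 * 1024)  /* arrays big enough to defeat caches */
    #define REPS 10                 /* repeat so clock() can resolve it */

    static double a[N], b[N], c[N];

    int main(void)
    {
        double scalar = 3.0, secs;
        clock_t start, stop;
        long j, k;

        for (j = 0; j < N; j++) { b[j] = 1.0; c[j] = 2.0; }

        start = clock();
        for (k = 0; k < REPS; k++)
            for (j = 0; j < N; j++)
                a[j] = b[j] + scalar * c[j];   /* the triad */
        stop = clock();

        secs = (double)(stop - start) / CLOCKS_PER_SEC;
        /* three arrays touched per element: two reads plus one write */
        printf("triad: ~%.1f MB/s\n",
               (double)REPS * 3.0 * N * sizeof(double) / secs / 1e6);
        return 0;
    }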

A bunch of citations and references on the subject (from Computer
Architecture News) can be found at:

http://citeseer.nj.nec.com/context/100863/372270

The human brain isn't really comparable to silicon in terms of memory
bandwidth. The human brain has billions of slow, low-latency data
channels; silicon has a small number of fast, high-latency data channels.
The real killer in all this is the latency. I think one could easily
claim that a small number of fast channels is equivalent to a large
number of slow channels as long as the aggregate data rates are the same.
Latency, on the other hand, can cause dramatic drops in real-world memory
performance, especially for computation that causes frequent cache
misses. Note that latency is also why massively parallel systems don't
work for problems that require access to large quantities of shared data:
you lose as much real computation time to latency as you gain by adding
processors (in certain cases your application may actually run *slower*
on a parallel system). As a general rule, shared-memory applications that
show poor cache efficiency on a single processor will scale poorly on
massively parallel systems.
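
To make the latency point concrete, here is a quick sketch (mine, not
from any benchmark suite) that chases a chain of dependent pointers
through a large array. Each load needs the result of the previous one,
so the CPU eats a full memory round-trip per step, and all the bandwidth
in the world won't help:

    /* Latency demo: chase a chain of dependent loads.  Each load
       needs the result of the previous one, so the CPU stalls for a
       full memory round-trip per step; bandwidth is irrelevant. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 22)   /* 4M entries: well past the caches */

    static long next[N];

    int main(void)
    {
        long i, j, tmp, p;
        clock_t t0, t1;

        /* Sattolo's shuffle: one random cycle through the array, so
           the walk below visits every entry in cache-hostile order */
        for (i = 0; i < N; i++) next[i] = i;
        for (i = N - 1; i > 0; i--) {
            j = rand() % i;
            tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        t0 = clock();
        for (p = 0, i = 0; i < N; i++)
            p = next[p];                  /* serial dependency chain */
        t1 = clock();

        printf("%d hops in %ld ticks (p=%ld)\n", N, (long)(t1 - t0), p);
        return 0;
    }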

>Questions like: how much of the brain is actually used
>for intelligence, as opposed to being used to run our automatic biological
>functions?

The line is fuzzy here (and a little outside my core competence), but I
would guess that only a small fraction of the brain's hardware is
actually required for raw intelligence. There have been many people who
function quite well with large sections of their brains missing.

>To what extent can the speed advantage of silicon _make up for_ a lack
>of memory bandwidth?

Speed and memory bandwidth are apples and oranges; you cannot substitute
one for the other, and both are rate limiters for the entire system (CPUs
don't exist in a vacuum). Right now, processor architectures are severely
rate limited by bandwidth.

Think of it this way: if you have a simple processor that can execute 10
operations per second but is fed those operations at a rate of 3
operations per second, what is the effective execution speed of the
system? The answer is obviously 3 operations per second. Now suppose the
processor is scaled up to 30 operations/s while the feed rate rises only
to 4 operations/s (roughly the situation we have today); the effective
speed goes from 3 to just 4 operations per second, a 33% gain from a 3x
faster processor. It doesn't really matter how much processor you have if
you can't keep the pipeline full. To make matters worse, for some classes
of problems it is not possible to hack the design (e.g. caching schemes)
to give faster effective feed rates. Therefore, the speed advantage of
silicon is actually irrelevant in the absence of sufficient memory
bandwidth, since bandwidth is a rate limiter on total system performance.
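
As a toy model of the above (same made-up numbers), effective throughput
is just the minimum of the two rates:

    /* Toy model of the paragraph above: effective throughput is
       capped by whichever is slower, execution rate or feed rate. */
    #include <stdio.h>

    static double effective(double exec_rate, double feed_rate)
    {
        return exec_rate < feed_rate ? exec_rate : feed_rate;
    }

    int main(void)
    {
        printf("then: %.0f ops/s\n", effective(10.0, 3.0));  /* 3 */
        printf("now:  %.0f ops/s\n", effective(30.0, 4.0));  /* 4 */
        /* 3x the processor bought only 1.33x the system */
        return 0;
    }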

-James Rogers
  jamesr@best.com


