Re: Hardware Progress: $165/Gflop

From: J. Andrew Rogers
Date: Sat Jan 03 2004 - 10:55:06 MST

On 1/2/04 8:30 PM, "Marcos Guillen" wrote:
> - Memory throughput is vital, you'll do much better with less memory per
> node and a faster processor with a faster bus, preferably capable of 400
> MHz DDR RAM.

While I share a strong preference for the fastest memory possible, having
smaller quantities of faster memory is an untenable trade-off for many
applications. I would actually trade CPU performance for more and faster
memory.

> - A Gigabit Ethernet card is not that expensive, and is very handy while
> working with fast pathways for visual and auditory cortex related nodes.

The latency is still problematic though. Few applications that find FastE's
bandwidth inadequate do not also find Ethernet's latency (GigE or otherwise)
inadequate.
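
The latency point can be illustrated with a simple transfer-time model (fixed
per-message latency plus serialization time). This is only a rough sketch: the
100 and 1000 Mbit/s figures are the nominal link rates, while the 100
microsecond latency is an assumed placeholder, not a measurement, and the
function names are made up for illustration.

```python
# Rough model: time = per-message latency + bits / bandwidth.
FAST_E_BPS = 100e6          # nominal Fast Ethernet link rate, bits/s
GIG_E_BPS = 1000e6          # nominal Gigabit Ethernet link rate, bits/s
ASSUMED_LATENCY_S = 100e-6  # ASSUMPTION: illustrative value, same for both links


def transfer_time(message_bytes, latency_s, bandwidth_bps):
    """Seconds to deliver one message under the simple model."""
    return latency_s + (message_bytes * 8) / bandwidth_bps


def gige_speedup(message_bytes, latency_s=ASSUMED_LATENCY_S):
    """How many times faster GigE is than FastE for one message."""
    fast_e = transfer_time(message_bytes, latency_s, FAST_E_BPS)
    gig_e = transfer_time(message_bytes, latency_s, GIG_E_BPS)
    return fast_e / gig_e


if __name__ == "__main__":
    for size in (64, 1500, 1_000_000):
        print(f"{size:>9} bytes: GigE speedup {gige_speedup(size):.2f}x")
```

Under these assumptions small messages gain almost nothing from the tenfold
bandwidth increase, because the fixed latency dominates; only large transfers
approach the full 10x.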

> Finally, the "Gflops" figure is tricky. You are not going to be dealing with
> 64bit operations, but mostly with 8bit operations, so good old
> Linpack test becomes almost meaningless.

Worth repeating. Linpack is an extremely narrow benchmark that basically
measures a CPU's ability to execute DSP-like operations. If your application
is not bound by DSP-like instruction throughput, Linpack comparisons mean
almost nothing.

A stark example of this is the recent Virginia Tech PPC970 cluster being
ranked as #3 on the new Top500 list. For most applications, the AMD64
Opteron systems soundly outperform the G5/PPC970 clock for clock, yet on the
Top500 Linpack benchmark it looks like the PPC970 soundly thrashes the
Opteron. Why? Because the PPC970 has native DSP-like instructions in its
ISA, and if you benchmark its performance for DSP-like code it will use this
to great advantage. For almost anything that does not look like DSP code
(DSP code being, e.g., lots of multiply-add matrix operations), the PPC970 is
a fair-to-middling performer clock for clock.

In short, the best benchmark is your own code. Different architectures have
different strengths, and depending on what you are doing this can make a big
difference.
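
As a minimal sketch of benchmarking your own code rather than relying on
Linpack, something like the following times a workload several times and keeps
the best run. The names best_of and my_workload are hypothetical, and the loop
body is only a stand-in for a representative slice of the real application.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def my_workload():
    # Hypothetical stand-in: small-integer work rather than 64-bit floating point.
    total = 0
    for i in range(100_000):
        total += (i * 3) & 0xFF
    return total


if __name__ == "__main__":
    print(f"best of 5 runs: {best_of(my_workload):.6f} s")
```

Running the same script on each candidate machine gives a comparison that
reflects the actual instruction mix, which a Gflops figure does not.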
To sum up mass market CPUs:

Intel: Very strong integer performance
AMD: Best memory and GP floating point
PPC: Best DSP performance

J. Andrew Rogers
