Re: ARTICLE: Memory bandwidth

From: James Higgins (
Date: Sun Apr 15 2001 - 11:39:38 MDT

At 09:06 PM 4/13/2001 -0700, James Rogers wrote:
>On 4/13/01 6:15 PM, "Brian Atkins" <> wrote:
>You are somewhat missing the point. The comparisons are in fact typically
>between an 800-MHz P3 and a 1.5-GHz P4, but the difference in clock speed is
>immaterial if you actually look at the benchmarks (mostly because the
>benchmarks show *why* clock speed is immaterial). In fact, the article
>squares perfectly with what I've been hearing on the hardware lists where
>people have been running their own benchmarks.
>In short:
>For fp and vector codes *where the data fits in the cache*, you generally
>get much better performance with the P4 than the P3, i.e., better than the
>clock speed difference alone would suggest. Since the data fits in the cache,
>memory bandwidth and latency are mostly irrelevant.
>For fp and vector codes that have data sets substantially larger than the
>cache, the speed of the processor is irrelevant: a P3 will be memory limited
>and the P4 will be memory limited. The P4 has a poor memory architecture
>and benchmarks as a dog compared to similarly clocked Athlon chips, due
>solely to differences in memory architecture; the Athlon has more bandwidth,
>so for memory limited problems it is faster, even at slower clock speeds.
>Note that the P4 has 15-20% *worse* memory latency than the P3, so cache
>efficiency is even more important on the P4 than on the P3. What the P4
>represents is one of the first times where the inter-generational
>differences in processor performance are based almost solely on the memory
>bus chipsets, not on the processor clock.
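
The cache/no-cache split described above can be sketched with a simple two-term model: a loop runs no faster than the slower of its arithmetic and its DRAM traffic. This is a minimal sketch, assuming made-up values for clock, sustained FP throughput, and DRAM bandwidth (CLOCK_HZ, FLOPS_PER_CYCLE, DRAM_BW are hypothetical, not measured P3/P4 figures) and a simple y = a*x + y update; `limiting_factor` is an illustrative helper, not anything from the article.

```python
# Hypothetical model: a loop finishes no faster than its slower half,
# the arithmetic on the core or the traffic to/from DRAM.
# Every number here is an illustrative assumption, not a measured P3/P4 value.

def limiting_factor(flops, dram_bytes, clock_hz, flops_per_cycle, dram_bw):
    """Return ("compute" | "memory", estimated seconds)."""
    compute_s = flops / (clock_hz * flops_per_cycle)
    memory_s = dram_bytes / dram_bw
    if compute_s >= memory_s:
        return "compute", compute_s
    return "memory", memory_s

CLOCK_HZ = 1.5e9        # assumed core clock
FLOPS_PER_CYCLE = 1     # assumed sustained FP throughput
DRAM_BW = 1.0e9         # assumed sustained DRAM bandwidth, bytes/s
PASSES = 1000           # repeated sweeps over the same arrays
FLOPS_PER_ELEM = 2      # y[i] = a*x[i] + y[i]: one multiply, one add

# Small arrays (2 x 8,000 doubles = 128 KB): assumed to stay in cache,
# so DRAM is touched only on the first pass.
small_n = 8_000
small = limiting_factor(FLOPS_PER_ELEM * small_n * PASSES,
                        16 * small_n,                 # x and y loaded once
                        CLOCK_HZ, FLOPS_PER_CYCLE, DRAM_BW)

# Large arrays (2 x 10M doubles = 160 MB): every pass streams from DRAM.
large_n = 10_000_000
large = limiting_factor(FLOPS_PER_ELEM * large_n * PASSES,
                        24 * large_n * PASSES,        # read x, read y, write y
                        CLOCK_HZ, FLOPS_PER_CYCLE, DRAM_BW)

print(small[0], large[0])   # prints "compute memory"
```

With these assumed numbers the cache-resident loop is limited by the core, so a faster or wider FP unit helps it, while the large loop spends far longer waiting on DRAM than computing.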

I drew the exact opposite conclusion from the article. In the SWIM
benchmark discussion, the P4 outperformed the P3 by 2.5x, and the article
states this is clearly due to the P4's 2.5x higher memory bandwidth. Thus
the P4 does much better on tasks that are constrained by memory bandwidth.
The article goes on to say that such applications are uncommon (so this
result should not be weighted as heavily). However, AI software would most
likely fall into this category, since it requires a huge amount of data.
Thus the P4 *should* perform much better than the P3 for AI software.

Not to say that the P4 is a great chip design. Reading further, it appears
that if the AI software were really performing well, the damn P4 would cut
back to 750MHz to reduce heat buildup!

>The point to all this being that if you are running an AI, which presumably
>will be churning on vast quantities of data, the clock speed doesn't matter.
>What the benchmarks between the P3 and P4 show is that memory bandwidth is
>already a serious crisis for data-hungry apps. The situation on the P4 is
>so bad that they expect faster versions of the P4 to show no performance
>improvement for cache killers, whereas it used to be that you got *some*
>minor improvement with increased clock speeds even for these apps.
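
The "no improvement for cache killers" point can be illustrated by sweeping the clock in the same kind of max(compute, memory) model. This is a minimal sketch; the workload size, the 1 flop/cycle throughput, and the 1 GB/s DRAM bandwidth are hypothetical values chosen only to make the loop memory-bound, and `runtime_s` is an illustrative helper.

```python
# Hypothetical "cache killer": each element is streamed from DRAM once per
# pass, so memory time does not shrink as the clock rises.
# Workload size, FP throughput, and bandwidth are illustrative assumptions.

def runtime_s(clock_hz, flops, dram_bytes, flops_per_cycle=1, dram_bw=1.0e9):
    compute_s = flops / (clock_hz * flops_per_cycle)
    memory_s = dram_bytes / dram_bw
    return max(compute_s, memory_s)   # the slower side sets the pace

N = 10_000_000                 # elements per pass (far larger than cache)
FLOPS = 2 * N                  # one multiply and one add per element
DRAM_BYTES = 24 * N            # read x, read y, write y (8-byte doubles)

base = runtime_s(1.5e9, FLOPS, DRAM_BYTES)
for ghz in (1.5, 2.0, 3.0):
    t = runtime_s(ghz * 1e9, FLOPS, DRAM_BYTES)
    print(f"{ghz:.1f} GHz: speedup {base / t:.2f}x")   # 1.00x at every clock
```

Because the memory term never depends on the clock, doubling the clock in this model leaves the runtime unchanged; only raising `dram_bw` would help.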

Increasing the clock speed of the P4 would have little to no effect on
"cache killer" apps because they are bandwidth limited. But this also
indicates that the P4, with its higher memory bandwidth (significantly
higher than the Athlon's), would perform the best. Plus, depending on which
motherboard the tests used, the P4 can use dual RDRAM buses for double the
memory bandwidth. I wonder which one the benchmark used.
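
Why a second RDRAM channel doubles the peak figure: peak bandwidth is channels x bytes per transfer x transfers per second. This sketch assumes PC800 RDRAM (a 16-bit, i.e. 2-byte, channel at 800 million transfers/s); `peak_bandwidth_bytes_s` is an illustrative helper, and sustained bandwidth in real benchmarks will be lower than these theoretical peaks.

```python
# Peak (theoretical) bandwidth = channels x bytes per transfer x transfers/s.
# Assumes PC800 RDRAM: a 16-bit (2-byte) channel at 800 million transfers/s.
# Sustained bandwidth measured by real benchmarks is lower than this peak.

def peak_bandwidth_bytes_s(channels, bytes_per_transfer, transfers_per_s):
    return channels * bytes_per_transfer * transfers_per_s

single = peak_bandwidth_bytes_s(1, 2, 800e6)
dual = peak_bandwidth_bytes_s(2, 2, 800e6)

print(f"one channel:  {single / 1e9:.1f} GB/s")   # 1.6 GB/s
print(f"two channels: {dual / 1e9:.1f} GB/s")     # 3.2 GB/s
```

Whether a given benchmark saw the single- or dual-channel figure depends on the board it ran on, which the article would need to state.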

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:36 MDT