FYI:G4's for scientific computing (fwd)

From: Eugen Leitl
Date: Mon Apr 15 2002 - 00:24:32 MDT

Most users were reporting very poor performance, with a few notable
exceptions. SIMD on 16 x 8-bit integers sounds extremely useful, at
least to me.

---------- Forwarded message ----------
Date: Sun, 14 Apr 2002 21:57:56 -0400 (EDT)
From: William R. Pearson
Subject: G4's for scientific computing

One of the advantages of the MacOSX gcc compiler is that inline
Altivec instructions are available at a high level. One can
define vector arrays and do vector operations from C code, e.g.

        while(vec_any_gt(T2, NAUGHT)) {
          T2 = vec_sub(LSHIFT(T2), RR);
          FF = vec_max(FF, T2);
        }

We are testing an Altivec FASTA version; an Altivec BLAST was announced
several months ago. We like Altivec because we can manipulate 8
16-bit integers or 16 8-bit integers at once - biological sequence
comparison code is essentially all integer. We see 6-fold speedups
when things are done 8-fold parallel. On our codes a dual 533 MHz G4
with Altivec code is 6X faster than a dual 1 GHz PIII (we don't have a
GHz G4 yet). Because of the high-level Altivec primitives in the
Apple gcc compiler, vectorizing was very, very easy; we would have to
be much more sophisticated to do the same thing on the PIII (and the
potential speed-up would be 1/2 as large, since the vector is 64, not
128 bits).
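
For readers without a G4 at hand, the loop fragment quoted above can be
followed with a portable scalar sketch. This is not Altivec code and not
the FASTA source: it only emulates, lane by lane, what vec_any_gt,
vec_sub and vec_max do on 8 signed 16-bit elements. The names vec8s,
any_gt, sub_lanes, max_lanes, lshift and propagate are hypothetical.
LSHIFT and NAUGHT are not defined in the message; the sketch assumes
LSHIFT moves every element up one lane and fills lane 0 with zero, and
that NAUGHT is the all-zero vector.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8 /* 128-bit vector / 16-bit elements = 8 lanes */

/* Hypothetical stand-in for an 8 x 16-bit vector register. */
typedef struct { int16_t lane[LANES]; } vec8s;

/* True if any lane of a exceeds the matching lane of b (cf. vec_any_gt). */
static bool any_gt(vec8s a, vec8s b) {
    for (int i = 0; i < LANES; i++)
        if (a.lane[i] > b.lane[i]) return true;
    return false;
}

/* Lane-wise a - b (cf. vec_sub); the sketch assumes no overflow. */
static vec8s sub_lanes(vec8s a, vec8s b) {
    vec8s r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = (int16_t)(a.lane[i] - b.lane[i]);
    return r;
}

/* Lane-wise maximum (cf. vec_max). */
static vec8s max_lanes(vec8s a, vec8s b) {
    vec8s r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* ASSUMPTION: LSHIFT moves each element up one lane, zero into lane 0. */
static vec8s lshift(vec8s a) {
    vec8s r;
    r.lane[0] = 0;
    for (int i = 1; i < LANES; i++)
        r.lane[i] = a.lane[i - 1];
    return r;
}

/* The quoted loop, with NAUGHT assumed to be the all-zero vector.
   If every lane of rr is positive, the loop stops after at most
   LANES iterations, because only negative values remain in t2. */
static vec8s propagate(vec8s t2, vec8s rr, vec8s ff) {
    const vec8s naught = {{0, 0, 0, 0, 0, 0, 0, 0}};
    while (any_gt(t2, naught)) {
        t2 = sub_lanes(lshift(t2), rr);
        ff = max_lanes(ff, t2);
    }
    return ff;
}
```

Under these assumptions, calling propagate with T2 = {10,0,0,0,0,0,0,0},
RR = 3 in every lane and FF all zero returns FF = {0,7,4,1,0,0,0,0}.
Each of the helper loops above is a single instruction on the vector
unit, which is where the 8-fold parallelism comes from; with 8-bit
elements LANES would be 16, and a 64-bit vector would hold only 4
16-bit elements.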

Four months ago I might have agreed with the statement that one must
have hand-tuned Altivec code, which pretty much excludes general-purpose
scientific computing, but our experience has been very positive -
our programs are not specialized signal processing programs, yet, in
retrospect, it was easy to get very dramatic speed-ups.

Bill Pearson
Beowulf mailing list
