features of an AI code

From: Eugen Leitl (eugen@leitl.org)
Date: Thu Jul 08 2004 - 04:55:47 MDT

These are stream-of-consciousness comments on good features of AI codes.
There are reasons for them, but they're mostly omitted here. It's a curious
mix of current good practice and future practice (lunatic fringe today, but
it will make sense then, trust me).

* code for today's/next year's commodity iron, but think about
  portability to massively parallel
  async molecular-electronics systems with mole amounts of switches.
  They're not as far away as you think. You might actually need them.

* your code should scale O(1) per node whether for 10^0 or 10^9 nodes (yes,
  it can be done with the right topology and communication pattern) -- if it
  doesn't, you're doing something really wrong. Screw Amdahl, no sequential
  sections

* at very large (WAN and Internet2) scale, use GRID

* use clusters, do not assume global connectivity (cubic primitive lattice is
  a safe assumption to make, 3d is safe, higher dimensionality is iffy)

* long-range is an iteration of short-range. This isn't inefficient if your
  time of flight is ~relativistic, and there are a couple gate delays in
  between -- this is for the future.

* if you signal, try streaming jumbo packets, and do crunch while you stream
  (if you can't, at least try to load all interfaces at the same time, then
  crunch a block, stream, crunch, etc.)

* this is especially true if your interconnect mesh isn't unobtainium
  (InfiniBand & Co)

* use multiple NICs if you've got them, look at http://aggregate.org/ for
  inspiration (try for 6 local crossbar links per node, though)

* use a binary protocol, specifically a minimal subset of MPI (do NOT use XML
  over the wire in local clusters; global networks are different)

* avoid writing to disk frequently, try to do without swap. There's a special
  case if you have a large library of patterns, and can stream sequentially,
  or tolerate 10 ms fetches sometimes. There won't be any disk in the future.

* try to go without local disk altogether but for checkpoints (not strictly
  necessary for future nonvolatile core)

* use a spatial system composition, communicate interface state to adjacent
  nodes (see cubic primitive lattice, usually 3d) -- see the array item below

* use arrays of small data types, avoid absolute pointers (use relative
  addressing as offsets to current position)

* only access memory in stream mode, try to limit unpredictable accesses to
  1st/2nd level cache (here it actually pays to use assembly for prefetch;
  otherwise don't bother with assembly, except maybe for the inner loop once
  you're done). This will cut your hardware budget by a factor of about 3,
  ditto operating costs. A single guy for the hardware is enough, unless
  yours is a two-digit megabuck budget for iron alone.

* align objects to long and cache line boundaries

* make sure your objects can signal locally and asynchronously (no global
  clock, or a clock with jitter) -- this is different from streaming over an
  array with a hotspot loop. This is the future, not today. Make sure your
  code will survive the future.

* don't use floats, stick to integers

* if you use integers, try using short integers (4-8 bit is great)

* stick to integers and codes you can crunch in parallel with MMX/SSE*-type
  SIMD instructions

* get ready for very wide words, low latency, small cores and FPGA (think
  inner loop in FPGA)

* make sure your inner loop translates into logic as simple as some 10^3
  gates for each individual small integer

* put the complexity into data pattern, not code pattern. Code is not that
  important, state is.

* if you think you can't do it in C, or don't need to, you're doing it wrong

* if you can't describe the algorithm and the data flow on a large piece of
  paper, you're doing it wrong

* in the long run, get ready for crystalline hardware (moles of switches,
  local connectivity, nonvolatile state, relativistically constrained signalling)
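
To make the lattice and array items above a bit more concrete, here is a
minimal sketch in C, not anybody's actual AI code. It assumes a tiny 4x4x4
cubic primitive lattice of 8-bit cells stored in one flat array, with the six
face neighbours reached through relative offsets from the current position
instead of absolute pointers. All names (cell_index, neighbour_offset,
lattice_step), the lattice size and the averaging rule are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lattice size; any small cubic lattice would do. */
enum { NX = 4, NY = 4, NZ = 4, NCELLS = NX * NY * NZ };

/* Flat array index of cell (x, y, z); x varies fastest. */
size_t cell_index(int x, int y, int z)
{
    return ((size_t)z * NY + (size_t)y) * NX + (size_t)x;
}

/* Relative offsets from a cell to its six face neighbours (-x, +x, -y, +y,
 * -z, +z). They are only valid for interior cells. */
const ptrdiff_t neighbour_offset[6] = {
    -1, +1, -NX, +NX, -(NX * NY), +(NX * NY)
};

/* One local update step: every interior cell becomes the integer average of
 * its six face neighbours; boundary cells are copied unchanged. The arrays
 * are walked in memory order, so access is sequential (stream-like). */
void lattice_step(const int8_t *cur, int8_t *next)
{
    assert(cur != NULL && next != NULL && cur != next);
    for (int z = 0; z < NZ; z++) {
        for (int y = 0; y < NY; y++) {
            for (int x = 0; x < NX; x++) {
                size_t i = cell_index(x, y, z);
                int interior = x > 0 && x < NX - 1 &&
                               y > 0 && y < NY - 1 &&
                               z > 0 && z < NZ - 1;
                if (!interior) {
                    next[i] = cur[i];
                    continue;
                }
                int sum = 0; /* six int8_t values always fit in an int */
                for (int k = 0; k < 6; k++)
                    sum += cur[(ptrdiff_t)i + neighbour_offset[k]];
                next[i] = (int8_t)(sum / 6);
            }
        }
    }
}
```

A caller would keep two buffers of NCELLS int8_t each and swap them after
every call to lattice_step. On a real cluster the boundary cells would be
the interface state exchanged with adjacent nodes; that part is left out of
this sketch.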

If you can't state your problem in the above framework, you're doing it the
hard way. If none of it appears relevant, and you in fact think I'm nuts,
then you won't succeed. Probably.
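
As a hedged illustration of the short-integer items in the list above, the
sketch below adds eight unsigned 8-bit values at once inside an ordinary
64-bit word. This is a plain-C "SIMD within a register" trick swapped in for
real MMX/SSE instructions so that it stays portable; the function names
swar_add_u8x8 and swar_add_array are hypothetical. Each lane wraps modulo
256, and no carry leaks into the neighbouring lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add eight unsigned 8-bit lanes packed into one 64-bit word.
 * Each lane is added modulo 256, independently of the others. */
uint64_t swar_add_u8x8(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8080808080808080ULL; /* top bit of every lane */
    /* Add the low 7 bits of each lane; a carry can reach bit 7 of its own
     * lane but never crosses into the next lane. */
    uint64_t low_sum = (a & ~high) + (b & ~high);
    /* Fold in the top bits with XOR, which discards the carry-out. */
    return low_sum ^ ((a ^ b) & high);
}

/* Add two arrays of packed words in a single sequential pass. */
void swar_add_array(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                    size_t nwords)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = swar_add_u8x8(a[i], b[i]);
}
```

The inner operation is three ANDs, two XORs and one add, which is the kind of
logic that should also map onto a small amount of gates per lane.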

Eugen* Leitl
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net
