Re: De-Anthropomorphizing SL3 to SL4.

From: Martin Striz
Date: Wed Mar 24 2004 - 11:06:26 MST

>From: Dani Eder <>

>The human genome is 3.36 billion base pairs, or
>840 MBytes raw. Using an efficient compression
>algorithm it can be compressed to 21% of its raw
>size, hence there is 180 MBytes of unique data in
>the genome.

Unfortunately the case isn't that simple. Not all DNA is
information-coding; in fact, most of it isn't. Early estimates pinned the
amount of protein-coding DNA at about 10%. We now know that it's more
like 2%. But which part of the genome do you count as "information"?
Just the exons that make it to mature peptides and RNA? How about the
introns (with important intron/exon boundary and branchpoint sequences that
inform the splicing apparatus) and 5' and 3' untranslated regions? How
about promoter and enhancer sequences? What about sequences that don't
directly affect gene expression, but that get methylated or acetylated and
can thus affect chromatin structure, and indirectly expression of nearby
genes? How about AT/GC ratios? Do those count as information? What about
the selfish DNA sequences (transposons, etc.) that make copies of themselves
and act like genomic parasites? By inserting themselves into or near genes,
they can disrupt their sequence or expression.

It's all rather complex, but by the most liberal estimate perhaps 20% of the
genomic sequence carries information that is pertinent to the development and
function of H. sapiens. That's maybe 600 or 700 million base pairs (roughly
160 MB at 2 bits per base). I draw the distinction because you could have,
say, a document that is 840 kb, but what if 80% of it is random, wordless
strings of letters? You wouldn't call that information.

Anyway, that's a more accurate number to use for the information content of
the human genome.
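
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. It assumes the 2-bits-per-base encoding implied by the quoted
"840 MBytes raw" figure and takes 1 MB as 10**6 bytes; the function and
constant names are made up for illustration, and the 20% and 2% fractions
are just the rough estimates mentioned above, not measured values.

```python
# Back-of-the-envelope check of the figures in this thread.
# Assumption: 4 possible bases (A, C, G, T), so 2 bits per base pair.
BITS_PER_BASE = 2


def raw_megabytes(base_pairs: float) -> float:
    """Size in MB (10**6 bytes) of a sequence stored at 2 bits per base."""
    return base_pairs * BITS_PER_BASE / 8 / 1e6


GENOME_BP = 3.36e9           # genome size from the quoted message
LIBERAL_FRACTION = 0.20      # "most liberal estimate" of informative DNA
CODING_FRACTION = 0.02       # rough protein-coding share

liberal_bp = GENOME_BP * LIBERAL_FRACTION
coding_bp = GENOME_BP * CODING_FRACTION

print(f"whole genome:   {raw_megabytes(GENOME_BP):.0f} MB")
print(f"20% of genome:  {liberal_bp / 1e6:.0f} million bp, "
      f"{raw_megabytes(liberal_bp):.0f} MB")
print(f"2% of genome:   {coding_bp / 1e6:.1f} million bp, "
      f"{raw_megabytes(coding_bp):.1f} MB")
```

At exactly 20% this gives about 672 million base pairs, or about 168 MB,
which is in line with the rounded figures above. Note that this is raw
size before any compression, so it is not directly comparable to the
compressed estimate in the quoted message.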

Martin Striz


This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:46 MDT