Prime number module

Lulu of the Lotus-Eaters mertz at gnosis.cx
Wed Oct 1 02:35:31 EDT 2003


|> I was disappointed that gzip only reduces the size by 41%.  I would have
|> guessed better, given the predominance of 0 bits.

Juha Autero <Juha.Autero at iki.fi> wrote previously:
|patterns and it's hardly a suprise that bitmap of prime numbers
|doesn't have lots of repeating bit patterns.

Sure it does.  The bit pattern "0 0 0 0 0 0 0" occurs a WHOLE LOT of
times :-).  Certainly much more than any other pattern.  After all, my
10^8/2 bits only contain 5,761,453 ones among them[*]--there's got to be
a lot of runs of zeros in there :-).

The problem, I think, is that these bit patterns don't fall on byte
borders as neatly as I would like.  I am pretty sure that I could write
a custom LZ-style compressor that did better by being bit-oriented
rather than byte-oriented.  But I'm not going to do that work, 3.8 megs
is small enough for me.

FWIW, bzip2 doesn't do much better either.  I'd be kinda surprised at
this point if another representation of the primes under 10^8 actually
did better[**].  I proposed listing the gaps as a strategy; but given
that there are 5.7 million gaps to list, it's hard to beat 3.8 megs.
Maybe encoding gaps in nibbles-with-escaping would work.  But now I'm
fond of the speed of the bit array.

Yours, Lulu...

[*] I'm missing two primes versus some other claims.  My odd numbers
don't include 2; and I called 1 a non-prime.  I don't feel strongly
about the second matter though.

[**] Well, a MUCH shorter representation is a Python-coded algorithm for
generating the primes in the first place.  But somehow that seems like
cheating :-).

--
---[ to our friends at TLAs (spread the word) ]--------------------------
Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad
---[ Postmodern Enterprises <mertz at gnosis.cx> ]--------------------------






More information about the Python-list mailing list