Prime number module

Andrew Dalke adalke at mindspring.com
Wed Oct 1 06:07:41 EDT 2003


Lulu of the Lotus-Eaters:
> FWIW, bzip2 doesn't do much better either.  I'd be kinda surprised at
> this point if another representation of the primes under 10^8 actually
> did better[**].

As I recall, the Burrows-Wheeler transform used in bzip2 uses
a clever way to rearrange the data in a block into a form which is
more compressible, yes?

Well, what about rearranging the bits so that every multiple of
3 is before every non-multiple of 3, then every multiple of 5, etc.
There's a simple algorithm to unravel the ordering, and you'll
end up with lot of 0s at the start of the block.  It's very much like
the ordering+compression you did when you moved all the terms
divisible by 2 to the front of the group then realized they could
all be represented as part of the decompression algorithm.

Extend my idea far enough and you'll have a prime sieve.

> [**] Well, a MUCH shorter representation is a Python-coded algorithm for
> generating the primes in the first place.  But somehow that seems like
> cheating :-).

Why doesn't bzip2 feel like cheating?  It's good at compressing
text because it's designed for the sort of patterns found in text.
Why doesn't mp3 feel like cheating?  It's designed to handle the
sounds we normally hear, as compared to the output of a
random noise generator.

In essense, if you know the full corpus beforehand, it's really
easy to compress - just store the identifier for each text.
There's only one text here so the result is trivial compressed
to nothing.

The problem is, your algorithm tries to compress an ordered
list of odd numbers, and so fails to compress as well as a
compression algorithm tuned for just compressing the first
N primes.

> ---[ to our friends at TLAs (spread the word) ]--------------------------
> Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
> White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad

Hmm... Seems dated.  Terms like whitewater and Clinton
(Bill, not Hillary) don't hit many hot buttons these days, the
number of people in various milita has gone way down [*]
and Libya is working towards renormalization.

What about these words?

Iran nuclear neocon expose POTUS patriot Pakistan armed weaponized
enriched uranium UN smallpox Gitmo invasion Castro Tikrit revolution sarin

                    Andrew
                    dalke at dalkescientific.com

[*]  Actually, US law states
http://www4.law.cornell.edu/uscode/10/311.html

(a) The militia of the United States consists of all able-bodied males at
least 17
years of age and, except as provided in section 313 of title 32, under 45
years of age who are, or who have made a declaration of intention to
become, citizens of the United States and of female citizens of the United
States who are members of the National Guard.

(b) The classes of the militia are -
  (1) the organized militia, which consists of the National Guard and
       the Naval Militia; and

  (2) the unorganized militia, which consists of the members of the
militia who are not members of the National Guard or the Naval Militia

Therefore, the numbers of people in the militia hasn't changed that
drastically -- and I'm a member of the unorganized militia of the US.
Maybe I should get a patch saying that, and wear it next time
I visit Europe?  :)






More information about the Python-list mailing list