compressing short strings?

Thomas Troeger thomas.troeger.ext at siemens.com
Tue May 20 05:50:22 EDT 2008


Paul Rubin wrote:
> I have a lot of short English strings I'd like to compress in order to
> reduce the size of a database.  That is, I'd like a compression
> function that takes a string like (for example) "George Washington"

[...]

> 
> Thanks.

I think your idea is good. You could build an LZ78 encoder in 
Python (LZ78 is pretty easy), feed it a long English text, and then 
pickle the resulting object. You could then unpickle it at program start 
and encode your short strings with it. I bet there's already a working 
implementation around that does this, but if you can't find one, LZ78 
can be implemented in an hour or two. There was a rather good 
explanation of the algorithm in German; unfortunately it has vanished 
from the net recently (I have a backup if you're interested).
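
A minimal sketch of that approach, under stated assumptions: it is a frozen-dictionary variant of LZ78, not the textbook streaming encoder. The phrase list is built LZ78-style from a training text, seeded with all 256 single bytes (as LZW does) so that any input can be encoded, and then left unchanged while short strings are encoded as lists of phrase indices. The names train, encode, decode and max_phrases are hypothetical, and the one-line corpus is only a stand-in for a long English text.

```python
import pickle


def train(corpus: bytes, max_phrases: int = 65536) -> list:
    """Build an LZ78-style phrase list from a training text."""
    # Assumption: seed with every single byte so encoding never fails.
    phrases = [bytes([b]) for b in range(256)]
    known = set(phrases)
    current = b""
    for b in corpus:
        candidate = current + bytes([b])
        if candidate in known:
            current = candidate  # keep extending the current match
        else:
            if len(phrases) < max_phrases:
                phrases.append(candidate)  # known phrase + one new byte
                known.add(candidate)
            current = b""  # LZ78 starts a fresh phrase after each new entry
    return phrases


def encode(text: str, index: dict) -> list:
    """Greedy longest-match encoding against the frozen dictionary."""
    data = text.encode("utf-8")
    codes = []
    i = 0
    while i < len(data):
        j = i + 1  # a single byte is always in the dictionary
        # Every prefix of a phrase is itself a phrase, so extending one
        # byte at a time finds the longest match.
        while j < len(data) and data[i:j + 1] in index:
            j += 1
        codes.append(index[data[i:j]])
        i = j
    return codes


def decode(codes: list, phrases: list) -> str:
    """Concatenate the phrases named by the codes."""
    return b"".join(phrases[c] for c in codes).decode("utf-8")


# Train once, pickle the result, and unpickle it at program start.
corpus = b"George Washington was the first president. " * 3  # stand-in text
blob = pickle.dumps(train(corpus))

phrases = pickle.loads(blob)
index = {p: i for i, p in enumerate(phrases)}

codes = encode("George Washington", index)
assert decode(codes, phrases) == "George Washington"
```

With max_phrases at 65536 or below, every code fits in two bytes. How to pack the codes into the database is left open here, and the actual savings depend on how closely the training text resembles the strings being stored.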

Cheers,
Thomas.


