compressing short strings?
Helmut Jarausch
jarausch at igpm.rwth-aachen.de
Tue May 20 06:25:36 EDT 2008
Paul Rubin wrote:
> I have a lot of short English strings I'd like to compress in order to
> reduce the size of a database. That is, I'd like a compression
> function that takes a string like (for example) "George Washington"
> and returns a shorter string, with luck maybe 6 bytes or so. One
> obvious idea is to take the gzip function, compress some large text
> corpus with it in streaming mode and throw away the output (but
> setting up the internal state to model the statistics of English
> text), then put in "George Washington" and treat the additional output
> as the compressed string. Obviously to get reasonable speed there
> would have to be a way to save the internal state after initializing
> from the corpus.
>
> Anyone know if this has been done and if there's code around for it?
> Maybe I'm better off with freezing a dynamic Markov model? I think
> there's DMM code around but am not sure where to look.
>
I'd ask in comp.compression, where the specialists are listening and are
very helpful.
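
For what it's worth, something close to the "prime gzip with a corpus" idea
can be tried with the zlib module's preset dictionary (the zdict argument,
Python 3.3 and later; raw streams with a dictionary need a reasonably recent
Python 3). This is only a rough sketch and not the saved-state scheme
described above: CORPUS is a tiny hypothetical stand-in for a real English
corpus, and compress_short / decompress_short are names made up for
illustration. Raw deflate (wbits=-15) is used so that no header or checksum
is added to each string.

```python
import zlib

# Hypothetical stand-in for a real English corpus (an assumption for this
# sketch). Deflate's window is 32 KB, so only about that much of a preset
# dictionary can actually be used.
CORPUS = (
    b"George Washington was the first President of the United States. "
    b"Thomas Jefferson wrote the Declaration of Independence. "
    b"Abraham Lincoln delivered the Gettysburg Address."
)


def compress_short(text, zdict=CORPUS):
    """Compress one short string against the preset dictionary."""
    # wbits=-15 selects a raw deflate stream: no zlib header, no checksum.
    comp = zlib.compressobj(9, zlib.DEFLATED, -15, zdict=zdict)
    return comp.compress(text.encode("utf-8")) + comp.flush()


def decompress_short(blob, zdict=CORPUS):
    """Invert compress_short; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(-15, zdict=zdict)
    return (decomp.decompress(blob) + decomp.flush()).decode("utf-8")


if __name__ == "__main__":
    name = "George Washington"
    blob = compress_short(name)
    print(len(name), "->", len(blob), "bytes")
    print(decompress_short(blob))
```

The dictionary has to be stored once and kept identical on the compressing
and decompressing side; strings that share little with it will not shrink
much, so it is worth measuring on real data.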
--
Helmut Jarausch
Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany