[Python-Dev] Re: Alternative Implementation for PEP 292: Simple String Substitutions

M.-A. Lemburg mal at egenix.com
Mon Sep 13 22:15:01 CEST 2004


Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
> 
> 
>>You mean: a compressed shift table for Unicode patterns ?
>>I'll have a look.
> 
> 
> It's a lossy compression: the entire delta1 table is represented as
> two 32-bit values, independent of the size of the source alphabet.
> Works amazingly well, at least when combined with the BM-variant
> it was designed for...
> 
> (I suppose it's too late for 2.4, but it would probably be a good
> idea to switch to this algorithm in 2.5)
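
A minimal sketch of what such a compressed shift table could look like,
for readers who want something concrete. It assumes the two values are
a 32-bit membership mask of the pattern's characters (a tiny Bloom
filter) and a single skip distance for the last pattern character; the
quoted description does not spell this out, and the name bloom_find is
made up for illustration. The search loop is a simplified
Horspool/Sunday-style scan, not necessarily the exact BM variant
Fredrik has in mind.

```python
def bloom_find(s, p):
    """Return the lowest index of p in s, or -1 (hypothetical helper)."""
    n, m = len(s), len(p)
    w = n - m                      # last possible start position
    if w < 0:
        return -1
    if m == 0:
        return 0

    mlast = m - 1
    # Value 1 (assumed): 32-bit mask, one bit per (code point mod 32).
    # Value 2 (assumed): shift to use when the last character matched
    # but the rest of the window did not.
    mask = 0
    skip = mlast - 1
    for i in range(mlast):
        mask |= 1 << (ord(p[i]) & 31)
        if p[i] == p[mlast]:
            skip = mlast - i - 1
    mask |= 1 << (ord(p[mlast]) & 31)

    def maybe_in_pattern(ch):
        # Lossy test: may say "yes" for a character not in the pattern,
        # but never says "no" for one that is.
        return (mask >> (ord(ch) & 31)) & 1

    i = 0
    while i <= w:
        if s[i + mlast] == p[mlast]:
            if s[i:i + mlast] == p[:mlast]:
                return i
            # Mismatch: look at the character just past the window.
            if i + m < n and not maybe_in_pattern(s[i + m]):
                i += m             # no window covering it can match
            else:
                i += skip
        elif i + m < n and not maybe_in_pattern(s[i + m]):
            i += m
        i += 1
    return -1
```

Because the mask is lossy, a character can be reported as present when
it is not. That only costs a shorter shift, never a missed match, and
the table size stays the same no matter how large the alphabet is.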

Here's a reference that might be interesting for you:

http://citeseer.ist.psu.edu/boldi02compact.html

They use statistical approaches to deal with the problem of
large alphabets. Their motivation is making Java's Unicode string
implementation faster... sounds familiar, eh :-)

That motivation, in turn, came from work done for the "Managing
Gigabytes" project:

http://www.cs.mu.oz.au/mg/

and

http://www.mds.rmit.edu.au/mg/

Too bad their code is GPLed, but I suppose getting some ideas
is OK ;-)

-- 
Marc-Andre Lemburg
eGenix.com
