[Python-Dev] New Py_UNICODE doc

Wed May 11 10:25:46 CEST 2005

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>If all you're interested in is the lexical class of the code points
>>in a string, you could use such a codec to map each code point
>>to a code point representing the lexical class.
> 
> 
> How can I efficiently implement such a codec? The whole point is doing
> that in pure Python (because if I had to write an extension module,
> I could just as well do the entire lexical analysis in C, without
> any regular expressions).

You can write such a codec in Python, but a C implementation
will of course be more efficient. The point is that for things
you are likely to use a lot in your application, it is better
to have one efficient implementation than dozens of duplicate
character sets embedded in compiled regular expressions.
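
To illustrate the idea, here is a minimal pure-Python sketch of
mapping each code point to a code point that stands for its
lexical class, and then matching on the class string instead of
the original text. The class letters (L, D, S, O) and the
function names are assumptions made up for this example, not an
existing codec, and the sketch uses the stdlib unicodedata
module rather than the codec machinery.

```python
import re
import unicodedata


def lexical_class(ch):
    """Return a one-character class code for a single code point.

    The class letters are hypothetical: L = letter, D = decimal digit,
    S = whitespace, O = anything else.
    """
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "L"
    if category == "Nd":
        return "D"
    if ch.isspace():
        return "S"
    return "O"


def classify(text):
    """Map every code point in text to its class code (same length)."""
    return "".join(lexical_class(ch) for ch in text)


# An identifier is a letter followed by letters or digits. The pattern
# runs over the class string, so it needs no large character sets.
IDENTIFIER = re.compile(r"L[LD]*")


def identifiers(text):
    """Return the identifier-like substrings of text."""
    classes = classify(text)
    return [text[m.start():m.end()] for m in IDENTIFIER.finditer(classes)]
```

Because the class string has the same length as the input, match
offsets found in it can be used directly to slice the original
text.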

> Any kind of associative/indexed table for this task consumes a lot
> of memory, and takes quite some time to initialize.

Right - which is why an algorithmic approach will always
be more efficient (in terms of speed/memory tradeoff),
and such an approach *can* support surrogates.
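
As a rough sketch of what "algorithmic" could mean here: the
following combines UTF-16 surrogate pairs arithmetically and
classifies the resulting code points with a few range checks
instead of a per-code-point table. The function names, the
class letters and the handful of ranges are assumptions chosen
for illustration; a real classifier would need far more
complete rules.

```python
def iter_code_points(units):
    """Yield code points from a sequence of UTF-16 code units (ints).

    A high surrogate followed by a low surrogate is combined into one
    supplementary code point; unpaired surrogates are passed through.
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


def class_of(code_point):
    """Classify a code point by range checks (illustrative ranges only)."""
    if 0x30 <= code_point <= 0x39:
        return "D"  # ASCII digits
    if 0x41 <= code_point <= 0x5A or 0x61 <= code_point <= 0x7A:
        return "L"  # ASCII letters
    if 0x4E00 <= code_point <= 0x9FFF:
        return "L"  # CJK Unified Ideographs (BMP)
    if 0x20000 <= code_point <= 0x2A6DF:
        return "L"  # CJK Extension B, reachable only via surrogate pairs
    return "O"


def classify_units(units):
    """Return one class letter per code point, not per code unit."""
    return "".join(class_of(cp) for cp in iter_code_points(units))
```

For example, the code units 0x61, 0xD840, 0xDC00, 0x31 encode
"a", U+20000 and "1", so classify_units() returns three class
letters for four code units.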

-- 
Marc-Andre Lemburg
eGenix.com
