[Python-Dev] New Py_UNICODE doc
M.-A. Lemburg
mal at egenix.com
Wed May 11 10:25:46 CEST 2005
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>>If all you're interested in is the lexical class of the code points
>>in a string, you could use such a codec to map each code point
>>to a code point representing the lexical class.
>
>
> How can I efficiently implement such a codec? The whole point is doing
> that in pure Python (because if I had to write an extension module,
> I could just as well do the entire lexical analysis in C, without
> any regular expressions).
You can write such a codec in Python, but C will of course
be more efficient. The whole point is that for things that
you will likely use a lot in your application, it is better
to have one efficient implementation than dozens of
duplicate re character sets embedded in compiled re-expressions.
> Any kind of associative/indexed table for this task consumes a lot
> of memory, and takes quite some time to initialize.
Right - which is why an algorithmic approach will always
be more efficient (in terms of speed/memory tradeoff)
and these *can* support surrogates.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, May 11 2005)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
More information about the Python-Dev
mailing list