[Python-Dev] New Py_UNICODE doc

M.-A. Lemburg mal at egenix.com
Tue May 10 11:19:20 CEST 2005


Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>On sre character classes: I don't think that these provide
>>a good approach to XML lexical classes - custom functions
>>or methods or maybe even a codec mapping the characters
>>to their XML lexical class are much more efficient in
>>practice.
> 
> 
> That isn't my experience: functions that scan XML strings
> are much slower than regular expressions.  I can't imagine
> how a custom codec could work, so I cannot comment on that.

If all you're interested in is the lexical class of the code points
in a string, you could use such a codec to map each code point
to a code point representing the lexical class. Then run re
as usual on the mapped Unicode string. Since the indices of
the matches found in the resulting string will be the same as
in the original string, it's easy to extract the corresponding
data from the original string.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 10 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::


More information about the Python-Dev mailing list