[Python-bugs-list] [ python-Bugs-599377 ] re searches don't work with 4-byte unicode

SourceForge.net noreply@sourceforge.net
Sat, 14 Jun 2003 08:10:10 -0700


Bugs item #599377, was opened at 2002-08-23 21:16
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=599377&group_id=5470

Category: Python Library
Group: Python 2.2.1
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Jim Fulton (dcjim)
>Assigned to: Martin v. Löwis (loewis)
Summary: re searches don't work with 4-byte unicode

Initial Comment:
For Python 2.2.1 and the CVS head (as of this posting),
with Python configured for 4-byte Unicode
(--enable-unicode=ucs4), searches using Unicode regular
expressions that contain characters above \xff don't
seem to work.
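Whether an interpreter is a 4-byte (wide, UCS-4) build can be checked from `sys.maxunicode`: on Python 2, narrow (UCS-2) builds report 0xFFFF and `--enable-unicode=ucs4` builds report 0x10FFFF. A minimal sketch; the helper name is hypothetical. Note that since Python 3.3 (PEP 393) every build reports 0x10FFFF, so the check only distinguishes builds on older interpreters.

```python
import sys

def is_wide_unicode_build():
    # Hypothetical helper: True when code points above U+FFFF fit in a
    # single character (a UCS-4 build on Python 2; always true on
    # Python 3.3 and later).
    return sys.maxunicode > 0xFFFF

print(is_wide_unicode_build())
```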

Here's an example:

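  import re
  # Character class covering the surrogate range U+D800..U+DFFF
  invalid_xml_char = re.compile(u'[\ud800-\udfff]')
  invalid_xml_char.search(u'\ud800')

On a UCS-4 build this returns None rather than a match object.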
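On current Python 3, where `str` is always Unicode and every build covers the full code point range, the report's example can be kept as a small regression check. This is a sketch under those assumptions: the function names are hypothetical, and the second check (a class above U+FFFF) goes beyond what the original report tested.

```python
import re

def surrogate_class_matches():
    # The report's pattern: a class spanning the surrogate range U+D800..U+DFFF.
    invalid_xml_char = re.compile('[\ud800-\udfff]')
    return invalid_xml_char.search('\ud800') is not None

def astral_class_matches():
    # Beyond the original report: a class above U+FFFF, which on Python 2
    # could only hold single characters on a UCS-4 build.
    astral = re.compile('[\U00010000-\U0010ffff]')
    return astral.search('a\U0001f600b') is not None

print(surrogate_class_matches(), astral_class_matches())  # True True
```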


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-06-14 17:10

Message:
Logged In: YES 
user_id=21627

This is now fixed for Python 2.3, in _sre.c revision 2.89.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-09-26 18:53

Message:
Logged In: YES 
user_id=21627

Added a workaround in sre_compile.py revisions 1.44 and
1.41.14.2: it disables big charsets for UCS-4 builds.
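For readers unfamiliar with the term, a "big charset" is a compressed representation of a large character class: a membership bitmap over the 65536 BMP code points is cut into 256-character chunks, identical chunks are stored once, and a 256-entry index maps each chunk position to its stored chunk. The sketch below is a simplified Python model of that idea, assuming the chunked-bitmap layout described in comments in later versions of sre_compile.py; it is not the actual `_sre` data layout, the names are hypothetical, and it does not reproduce the bug itself.

```python
CHUNK = 256  # characters per chunk in this model

def build_big_charset(ranges):
    """Hypothetical builder. ranges: list of inclusive (lo, hi) code
    point pairs, all below 0x10000 (BMP only in this model)."""
    bitmap = bytearray(0x10000)
    for lo, hi in ranges:
        for cp in range(lo, hi + 1):
            bitmap[cp] = 1  # raises IndexError for cp > 0xFFFF
    chunks = []  # distinct 256-entry chunks
    seen = {}    # chunk bytes -> chunk number
    index = []   # chunk number for each of the 256 chunk positions
    for start in range(0, 0x10000, CHUNK):
        chunk = bytes(bitmap[start:start + CHUNK])
        if chunk not in seen:
            seen[chunk] = len(chunks)
            chunks.append(chunk)
        index.append(seen[chunk])
    return index, chunks

def in_big_charset(index, chunks, cp):
    # Code points outside the BMP cannot be represented in this model.
    if cp >= 0x10000:
        return False
    # High byte selects the chunk, low byte selects the bit within it.
    return chunks[index[cp >> 8]][cp & 0xFF] == 1

index, chunks = build_big_charset([(0xD800, 0xDFFF)])
print(len(chunks), in_big_charset(index, chunks, 0xD800))  # 2 True
```

The surrogate range spans whole chunks, so the entire class compresses to just two stored chunks (all zeros and all ones) plus the index.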

I am leaving this report open so that a proper fix can be designed.


----------------------------------------------------------------------

Comment By: Peter Schneider-Kamp (nowonder)
Date: 2002-08-27 18:49

Message:
Logged In: YES 
user_id=14463

I was able to reproduce this behaviour exactly. I have no idea
what is causing it, though.

----------------------------------------------------------------------
