[Python-Dev] re with Unicode broken?

Fredrik Lundh fredrik@pythonware.com
Fri, 13 Jul 2001 16:44:22 +0200


sjoerd wrote:

> This is not for the faint of heart.
>
> My validating XML parser doesn't work anymore, even though I didn't
> change a thing (except update Python from CVS).

when did you last update without problems?

the likely cause for this is MvL's "big char set" patch, which
I checked in on July 6.

here's a workaround: tweak sre_compile.py so it doesn't generate
BIGCHARSET opcodes. in _optimize_charset, change this:

    except IndexError:
        # character set contains unicode characters
        return _optimize_unicode(charset, fixup)
    # compress character map

to

    except IndexError:
        # character set contains unicode characters
        return charset # WORKAROUND: no compression
    # compress character map
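
to check whether the workaround helps, something like this might do.
it's only a sketch: the pattern, the sample strings and the check()
helper are made up for illustration (they're not from sjoerd's parser),
and it assumes the problem shows up for character classes that contain
code points above 255, which is the case that takes the IndexError
branch above.

```python
import re

# hypothetical pattern: a character class mixing an ASCII range with
# ranges above 255, so the 256-entry character map can't hold it
pattern = re.compile(u"[a-z\u0370-\u03ff\u4e00-\u9fff]+")

def check(text):
    # true only if the whole string is matched by the character class
    m = pattern.match(text)
    return m is not None and m.group(0) == text

# made-up samples: ASCII only, greek only, CJK only, and a mix
samples = [u"abc", u"\u03b1\u03b2\u03b3", u"\u4e2d\u6587", u"a\u03b1\u4e2d"]
results = [check(s) for s in samples]

for s, ok in zip(samples, results):
    print("%r: %s" % (s, "ok" if ok else "FAILED"))
```

if any of the samples fail with the unpatched sre_compile.py and pass
with the workaround in place, the charset compression is probably the
culprit.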

I'll look into this over the weekend.

Cheers /F