problems with regex in Japanese?
Just van Rossum
just at letterror.com
Sat Aug 11 18:56:45 CEST 2001
Joe Strout wrote:
> In article <kusnezb0f0.fsf at lasipalatsi.fi>, Erno Kuusela
> <erno-news at erno.iki.fi> wrote:
> > || python no longer uses pcre, the pcre based regexp module
> > || was replaced by a new unicode-aware implementation called sre (written
> > || by Fredrik Lundh). sre is much faster too...
> > | Wow, I didn't know that. Where can I find out more about sre?
> > afraid i don't know of any docs on the internals. i think
> > the regex compiler is written in python, so you may need
> > to embed python if you plan to use it in another software package.
> OK, thanks again. We can't do that in our case, so I guess we'll just
> fix PCRE -- it seems to be 90% there anyway.
How would PCRE ever be able to match groups of characters above code
point 127, which are represented as more than one byte in UTF-8? Or is
that a limitation you decided to live with?
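
To illustrate the concern, here is a minimal sketch. It does not use PCRE
itself; it assumes that a byte-oriented engine behaves like Python's re
module when given bytes patterns, and the sample characters are chosen
only for illustration. A character class built from UTF-8 bytes matches
single bytes, not whole characters, so it can also hit characters that
were never in the class.

```python
import re

# Sample text: U+65E5, U+672C, U+8A9E (the word "nihongo", three kanji).
text = "\u65e5\u672c\u8a9e"
encoded = text.encode("utf-8")

# Each of these characters takes three bytes in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in text]

# Unicode-aware matching: the class holds two characters and
# matches whole characters.
unicode_class = "[\u65e5\u672c]"
unicode_matches = re.findall(unicode_class, text)

# Byte-oriented matching (stand-in for an engine with no UTF-8 support):
# the class ends up holding the six individual bytes of the two
# characters, so every match is a single byte.
byte_class = b"[" + "\u65e5\u672c".encode("utf-8") + b"]"
byte_matches = re.findall(byte_class, encoded)

# False positive: U+6620 is not in the class, but its UTF-8 encoding
# starts with the same lead byte (0xE6) as the two class members.
other = "\u6620"
unicode_false_hit = re.search(unicode_class, other)
byte_false_hit = re.search(byte_class, other.encode("utf-8"))

print(byte_lengths)
print(unicode_matches)
print(byte_matches)
print(unicode_false_hit, byte_false_hit)
```

The Unicode pattern finds two characters and rejects the unrelated one,
while the bytes pattern reports six one-byte matches and a spurious hit
on the lead byte of a character outside the class.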