problems with regex in Japanese?

Just van Rossum just at letterror.com
Sat Aug 11 18:56:45 CEST 2001


Joe Strout wrote:

> In article <kusnezb0f0.fsf at lasipalatsi.fi>, Erno Kuusela
> <erno-news at erno.iki.fi> wrote:
> 
> > || python no longer uses pcre, the pcre based regexp module
> > || was replaced by a new unicode-aware implementation called sre (written
> > || by Fredrik Lundh). sre is much faster too...
> >
> > | Wow, I didn't know that.  Where can I find out more about sre?
> >
> > afraid i don't know of any docs on the internals. i think
> > the regex compiler is written in python, so you may need
> > to embed python if you plan to use it in another software package.
> 
> OK, thanks again.  We can't do that in our case, so I guess we'll just
> fix PCRE -- it seems to be 90% there anyway.

How would PCRE ever be able to match groups of characters above code
point 127, which are represented as more than one byte in UTF-8? Or is
that a limitation you decided to live with?
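
To make the concern concrete, here is a minimal sketch. It uses Python's
own re module on bytes as a stand-in for any byte-oriented matcher; it
is not PCRE itself, and the names target, other, byte_class and
char_class are made up for the illustration. A character class built
from the UTF-8 bytes of one character becomes a set of single bytes, so
it can match fragments of a different character:

```python
import re

# Two distinct characters whose UTF-8 encodings share their first two bytes.
target = "\u65e5"   # U+65E5, encoded in UTF-8 as E6 97 A5
other = "\u65cf"    # U+65CF, encoded in UTF-8 as E6 97 8F

# Byte-oriented class: effectively the set of single bytes {E6, 97, A5}.
byte_class = re.compile(b"[" + target.encode("utf-8") + b"]")

# Character-oriented class: contains exactly one character, U+65E5.
char_class = re.compile("[" + target + "]")

# The byte class "finds" two stray bytes inside a character it should not match.
byte_hits = byte_class.findall(other.encode("utf-8"))

# The character class correctly finds nothing.
char_hits = char_class.findall(other)

print(byte_hits)
print(char_hits)
```

The byte-level class reports matches on the individual bytes E6 and 97
even though U+65E5 never occurs in the input, while the character-level
class reports none.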

Just


