[Python-Dev] unicode regex quickie: should a newline be the same thing as a linebreak?
Fredrik Lundh <effbot@telia.com>
Tue, 30 May 2000 12:26:29 +0200
I wrote:
> what's the best way to deal with this? I see three
> alternatives:
>
> a) stick to the old definition, and use chr(10) also for
> unicode strings
>
> b) use different definitions for 8-bit strings and unicode
> strings; if given an 8-bit string, use chr(10); if given
> a 16-bit string, use the LINEBREAK predicate.
>
> c) use LINEBREAK in either case.
>
> I think (c) is the "right thing", but it's the only one that may
> break existing code...
I'm probably getting old, but I don't remember if anyone followed
up on this, and I don't have time to check the archives right now.
so for the upcoming "feature complete" release, I've decided to
stick to (a).
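
to make the difference between (a) and the LINEBREAK variants concrete, here is a minimal sketch in present-day Python 3. it is not the sre code discussed in this message. it assumes that str.splitlines() can stand in for the LINEBREAK predicate, and it relies on the re module treating only chr(10) as a newline, which is the behaviour (a) describes.

```python
import re

# U+2028 LINE SEPARATOR is a Unicode line break, but it is not chr(10).
text = "one\u2028two\nthree"

# Option (a) style: only chr(10) ends a line, so "." runs across U+2028
# and "^" in MULTILINE mode only restarts after "\n".
dot_run = re.match(r".+", text).group()
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# LINEBREAK style (stand-in, an assumption of this sketch):
# str.splitlines() also splits at U+2028.
unicode_lines = text.splitlines()
```

under (c), "two" would presumably count as the start of a line as well, which is the kind of change that could break existing code.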
...
for the next release, I suggest implementing a fourth alternative:
d) add a new unicode flag. if set, use LINEBREAK. otherwise,
use chr(10).
background: in the current implementation, this decision has to
be made at compile time, and a compiled expression can be used
with either 8-bit strings or 16-bit strings.
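
a rough sketch of what (d) could look like, under the compile-time constraint just described. all names here (LINEBREAKS, compile_line_end_test, use_linebreak) are hypothetical and made up for illustration; they are not part of sre or any real API. the character set is an assumption too: it is the set that str.splitlines() splits on in current Python, used as a stand-in for the LINEBREAK predicate.

```python
# Hypothetical stand-in for the LINEBREAK predicate (assumption: the
# characters that str.splitlines() treats as line boundaries today).
LINEBREAKS = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


def compile_line_end_test(use_linebreak=False):
    """Pick the line-end test once, at 'compile' time (option d)."""
    if use_linebreak:
        # Flag set: any character in the stand-in LINEBREAK set ends a line.
        return lambda ch: ch in LINEBREAKS
    # Flag not set: only chr(10) ends a line, as in option (a).
    return lambda ch: ch == chr(10)


is_line_end_default = compile_line_end_test()
is_line_end_unicode = compile_line_end_test(use_linebreak=True)
```

because the test is chosen before any subject string is seen, the same compiled object gives the same answer for 8-bit and unicode input, which is why a flag fits the current design better than (b).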
a fifth alternative would be to use the locale flag to tell the
difference between unicode and 8-bit characters:
e) if locale is not set, use LINEBREAK. otherwise, use chr(10).
comments?
</F>
<project name="sre" phase=" complete="97.1%" />