
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
Thanks
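
For readers unfamiliar with the term, here is a minimal sketch of what "splitting on zero-width matches" means. The helper name split_with_empty_matches is made up for illustration and is not part of re or of the regex module. It builds the split by hand from finditer(), so it does not depend on how split() itself behaves. At the time of this thread re.split() returned the string unsplit for such a pattern; as far as I know, re.split() has split on empty matches since Python 3.7.

```python
import re

def split_with_empty_matches(pattern, text):
    """Hypothetical helper: split text at every match, including zero-width ones."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        # Keep the text between the previous match and this one.
        pieces.append(text[last:m.start()])
        last = m.end()
    # Keep whatever follows the final match.
    pieces.append(text[last:])
    return pieces

# A lookahead consumes no characters, so every match here is zero-width.
parts = split_with_empty_matches(r"(?=[A-Z])", "HelloWorldFoo")
print(parts)  # ['', 'Hello', 'World', 'Foo']
```

The leading empty string comes from the zero-width match at position 0, which is the kind of detail existing code might depend on either way.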

On 1/12/2010 5:10 PM, MRAB wrote:
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
Are you writing a new module with a new name? If so, do you expect it to replace or augment re? (This is the same question as for optparse vs. argparse, which I understand to not yet be decided.)
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
2.7 is in alpha with no plans for 2.8, so unless you finish real soon, 2.7 stdlib is already out. A new engine should get some community testing before going in the stdlib. Even 3.2 beta is not that far off (8-9 months?) Do *you* want to do the extra work for a 2.x release on PyPI?
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
What does re do with analogous cases?
Terry Jan Reedy

Terry Reedy wrote:
On 1/12/2010 5:10 PM, MRAB wrote:
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
Are you writing a new module with a new name? If so, do you expect it to replace or augment re? (This is the same question as for optparse vs. argparse, which I understand to not yet be decided.)
It's a module called 'regex'. It can be used in place of 're' by using "import regex as re", except for differences such as "\g<name>" being a legal group reference in pattern strings.
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
2.7 is in alpha with no plans for 2.8, so unless you finish real soon, 2.7 stdlib is already out. A new engine should get some community testing before going in the stdlib. Even 3.2 beta is not that far off (8-9 months?) Do *you* want to do the extra work for a 2.x release on PyPI?
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
What does re do with analogous cases?
The 're' module treats r"\g" as "g"; both 're' and 'regex' treat, say, r"\q" as "q". The closest analogue to what I'm asking about is that re treats the ill-formed repeat r"x{1," as a literal, which sort of suggests that r"\g" should be treated as "g", but r"\g<name>" is now a group reference (re would treat that as "g<name>"). Does that sound reasonable?
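
A small sketch of the cases mentioned above, using only the standard re module. Two assumptions to flag: the helper pattern_status is made up for illustration, and the treatment of unknown letter escapes such as r"\g" depends on the interpreter (as far as I know, recent Python 3 releases reject them in patterns rather than treating them as literals), so the code reports what it finds instead of assuming either outcome. Note also that in re, \g<name> is a group reference only in replacement templates, not in patterns.

```python
import re

# The ill-formed repeat "x{1," has no closing brace, so re matches it literally.
literal_repeat = re.fullmatch(r"x{1,", "x{1,") is not None

# In a replacement template, \g<name> refers to a named group.
swapped = re.sub(r"(?P<first>\w+) (?P<second>\w+)",
                 r"\g<second> \g<first>",
                 "hello world")

def pattern_status(pattern):
    """Hypothetical helper: report whether this interpreter's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return "rejected"
    return "accepted"

print(literal_repeat)          # True
print(swapped)                 # world hello
print(pattern_status(r"\g"))   # depends on the Python version
```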

MRAB wrote:
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
I've just noticed something odd about the re module: the sub() method doesn't take 'pos' or 'endpos' arguments. search() does; match() does; findall() does; finditer() does; but sub() doesn't. Maybe there has never been a demand for it. (Nor split(), for that matter.)
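
For what it's worth, a sub() restricted to a 'pos'/'endpos' window can be approximated on top of finditer(), which does accept those arguments on a compiled pattern. This is only a sketch: the name sub_within is made up, and it ignores the 'count' argument that the real sub() supports.

```python
import re

def sub_within(pattern, repl, text, pos=0, endpos=None):
    """Hypothetical helper: replace only the matches found between pos and endpos."""
    compiled = re.compile(pattern)
    if endpos is None:
        endpos = len(text)
    pieces = []
    last = 0
    for m in compiled.finditer(text, pos, endpos):
        # Copy the untouched text before this match.
        pieces.append(text[last:m.start()])
        # Accept either a callable or a template string, as sub() does.
        pieces.append(repl(m) if callable(repl) else m.expand(repl))
        last = m.end()
    # Copy everything after the last match, including text beyond endpos.
    pieces.append(text[last:])
    return "".join(pieces)

print(sub_within(r"a", "X", "aaaa", 1, 3))  # aXXa
```

Using finditer() rather than slicing the string first keeps the usual 'pos'/'endpos' semantics, so the text outside the window is left as it was.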

On Tue, Jan 12, 2010 at 14:10, MRAB <python@mrabarnett.plus.com> wrote:
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
If it is a separate module under a different name it can do the proper thing. People will just need to be aware of the difference when they import the module.
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
That's totally up to you. There is practically no chance of it getting into the 2.x stdlib at this point since 2.7b1 is coming up and this module has not been out in the wild for a year (to my knowledge). If you want to support 2.x that's fine and I am sure users would appreciate it, but it isn't necessary to get into the Python 3 stdlib.
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
If you want to minimize the differences then it should probably match. As I said, since it is a different name to import under it can deviate where reasonable, just make sure to clearly document the deviations.
-Brett
Thanks
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org

Memories of days past... Python had several regular expression implementations before, one of which was called "regex". But I would rather not have a new module -- I would much rather have a flag specifying the new (backwards incompatible) syntax/semantics. The flag would have a long name (e.g. re.NEW_SYNTAX), a short name (e.g. re.N) and an inline syntax, "(?n)...".
--Guido
On Tue, Jan 12, 2010 at 7:58 PM, Brett Cannon <brett@python.org> wrote:
On Tue, Jan 12, 2010 at 14:10, MRAB <python@mrabarnett.plus.com> wrote:
Hi all,
I'm back on the regex module after doing other things and I'd like your opinion on a number of matters:
Firstly, the current re module has a bug whereby it doesn't split on zero-width matches. The BDFL has said that this behaviour should be retained by default in case any existing software depends on it. My question is: should my regex module still do this for Python 3? Speaking personally, I'd like it to behave correctly, and Python 3 is the version where backwards-compatibility is allowed to be broken.
If it is a separate module under a different name it can do the proper thing. People will just need to be aware of the difference when they import the module.
Secondly, Python 2 is reaching the end of the line and Python 3 is the future. Should I still release a version that works with Python 2? I'm thinking that it could be confusing if new regex module did zero-width splits correctly in Python 3 but not in Python 2. And also, should I release it only for Python 3 as a 'carrot'?
That's totally up to you. There is practically no chance of it getting into the 2.x stdlib at this point since 2.7b1 is coming up and this module has not been out in the wild for a year (to my knowledge). If you want to support 2.x that's fine and I am sure users would appreciate it, but it isn't necessary to get into the Python 3 stdlib.
Finally, the module allows some extra backslash escapes, eg \g<name>, in the pattern. Should it treat ill-formed escapes, eg \g, as it would have treated them in the re module?
If you want to minimize the differences then it should probably match. As I said, since it is a different name to import under it can deviate where reasonable, just make sure to clearly document the deviations.
-Brett
Thanks
--
--Guido van Rossum (python.org/~guido)
participants (4)
- Brett Cannon
- Guido van Rossum
- MRAB
- Terry Reedy