Python 2's re module should take longs

This works:

    re.search('(abc)', 'abc').group(1)

but this doesn't:

    re.search('(abc)', 'abc').group(1L)

The latter raises "IndexError: no such group". Shouldn't that technically work?

-- Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated."
Personal reality distortion fields are immune to contradictory evidence. - srean
Check out my website: http://kirbyfan64.github.io/

On 9/30/2014 1:16 PM, Ryan Gonzalez wrote:
If groups were stored in a list, then technically, yes; but not if groups are stored in a dict to support named groups with just one structure. Since the number of groups is limited to 99 or 100 in 2.7 (just changed for 3.5), there is no technical reason to use longs. Even if the exception were considered a bug, I would not change it, since using longs would restrict code to 2.7.9+ and make it less portable to 3.x.

-- Terry Jan Reedy

On Tue, Sep 30, 2014, at 15:31, Terry Reedy wrote:
If groups were stored in a list, then technically, yes, not if groups are stored in a dict to support named groups with just one structure.
Longs work fine interchangeably with ints in a dict. And even if they didn't, the group function _could_ convert a small-valued long argument to an int. This is an error raised by a function implemented in C that forces static type checking on its arguments. The core problem is that the PyInt_AsLong function does not check for (and handle) the case where its argument is a small-valued PyLong.
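As a rough illustration of the two points above (dict lookups and converting a small long to an int), here is a minimal sketch. It assumes nothing beyond the standard library. The expression (1 << 100) >> 100 is only a trick to obtain the value 1 as a long on Python 2; on Python 3 it is a plain int, so the distinction disappears there. The int() call is a caller-side workaround, not what the re module itself does.

```python
import re

# Equal to 1. On Python 2 the shifts produce a long (1L);
# on Python 3 there is only one integer type, so this is just int 1.
one = (1 << 100) >> 100

# Dict lookup does not care whether the key is an int or a long,
# because the two compare equal and hash equal.
lookup = {1: 'b'}[one]

# Caller-side workaround: convert to a plain int before calling group().
match = re.search('(abc)', 'abc')
first = match.group(int(one))
```

On Python 2, int() applied to a small long returns a plain int, so the group() call succeeds regardless of whether the interpreter accepts longs directly.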

On Tue, Sep 30, 2014, at 16:06, random832@fastmail.us wrote:
Disregard my last message, I was looking at the wrong code. But looking at what I think is the right code (https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules), I am confused, since this error is raised after the index has already been converted to a Py_ssize_t.

-- Random832

On Tue, Sep 30, 2014 at 1:12 PM, <random832@fastmail.us> wrote:
According to my quick look at the code[1], it looks like the problem is in match_getindex (~line 3304). If the line "if (PyInt_Check(index))" read "if (PyInt_Check(index) || PyLong_Check(index))" instead, it appears that it would properly handle longs as well as ints (at least based on what is happening a little farther down, near line 3312). It may be that the conditions need to be separated so that the long case calls PyLong_AsSsize_t rather than PyInt_AsSsize_t, but that may not be needed.

It appears that when an index is passed in, the re module just converts it to a C size_t; otherwise it looks the argument up in the group name dictionary to get the index. I suspect the indexes don't exist as keys in the mapping, only the group names. As the initial conversion checks for int specifically, and ignores longs, longs are treated differently than ints.

As a side note, it appears the documentation at https://docs.python.org/2/c-api/long.html is slightly incorrect: there appear to be two instances of a few functions, with slightly different documentation, but the same return, arguments, and name. The ones I can see are "PyLong_FromSsize_t" and "PyLong_AsSsize_t". Perhaps I am just missing some subtle difference in the names or arguments?

[1] https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules/_sre.c
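The lookup logic described above can be modelled in a few lines of Python. This is only a simplified sketch of the behaviour as described in this message, not a transcription of _sre.c. The names get_index, FakeLong, and integer_types are hypothetical. FakeLong stands in for a Python 2 long (equal to an int, hashing like one, but not an instance of int) so that the sketch also runs on Python 3.

```python
class FakeLong(object):
    """Hypothetical stand-in for a Python 2 long: equal to an int, but not an int."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    def __eq__(self, other):
        return self.value == other

    def __hash__(self):
        return hash(self.value)


def get_index(index, groupindex, ngroups, integer_types=(int,)):
    """Simplified model: return a group number or raise IndexError."""
    if isinstance(index, integer_types):
        # Integer argument: use it directly as the group number.
        i = int(index)
    else:
        # Anything else is treated as a group name and looked up.
        try:
            i = groupindex[index]
        except (KeyError, TypeError):
            i = -1
    if i < 0 or i > ngroups:
        raise IndexError("no such group")
    return i


# The mapping holds names only; group numbers are not keys.
groupindex = {'word': 1}

by_number = get_index(1, groupindex, 1)
by_name = get_index('word', groupindex, 1)

# With the int-only check, the long falls through to the name lookup and fails.
try:
    get_index(FakeLong(1), groupindex, 1)
    long_rejected = False
except IndexError:
    long_rejected = True

# Widening the type check, as proposed, lets the long through.
fixed = get_index(FakeLong(1), groupindex, 1, integer_types=(int, FakeLong))
```

The point of the model is that the failure comes from the type check alone: the value is a valid group number, but it is routed to the name dictionary, where no numeric keys exist.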

On Tue, Sep 30, 2014 at 09:57:44PM +0200, Antoine Pitrou wrote:
I'm not so sure that it's a bug. Should .group(1.0) work? That also is numerically equal to 1.

MatchObject.group does not necessarily have to obey the semantics of list.__getitem__ or dict.__getitem__. Just because ['a', 'b'][1L] and {1: 'b'}[1.0] both return 'b' doesn't force .group() to do the same.

The documentation for .group is underspecified, and perhaps TypeError would be a more appropriate error than IndexError, but IndexError is consistent with other bad arguments:

py> re.search('(abc)', 'abc').group([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group

So I don't think this is a bug, I think it is just an unfortunate choice of misleading exception type.

The re module goes back to at least 1.5, and as far as I can tell, .group has never accepted longs. (I have tested it on 2.4 through 2.7, and 1.5, and it fails with all of them.) So this is a long-established restriction on the argument. (Another restriction is that a maximum of 99 groups are supported, so there are no cases where a long is needed.)

Allowing longs is a new feature, not a bug fix. Since 2.7 is bug-fix only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.

-- Steven
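The contrast drawn above between dict lookup and .group() can be checked with a short sketch. It uses a float rather than a long so that it runs unchanged on both Python 2 and Python 3. The variable names are illustrative only.

```python
import re

match = re.search('(abc)', 'abc')

# A dict treats 1 and 1.0 as the same key: they compare equal and hash equal.
dict_lookup = {1: 'b'}[1.0]

# A plain int selects the group as expected.
int_group = match.group(1)

# A float that is numerically equal to 1 is not accepted as a group number.
try:
    match.group(1.0)
    float_error = None
except IndexError as exc:
    float_error = str(exc)
```

Here the float is rejected with the same IndexError message quoted in the traceback above, which is why the exception type says little about whether the argument's type or its value was the problem.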

In article <20141001005156.GU19757@ando.pearwood.info>, Steven D'Aprano <steve@pearwood.info> wrote:
Since 2.7 is bug-fix only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.
It's a moot point now:

http://bugs.python.org/issue22530
https://hg.python.org/cpython/rev/30f72ed73c3b

-- Ned Deily, nad@acm.org

On Wed, 1 Oct 2014 10:51:57 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
It's not about them being numerically equal, it's about them being integers (interchangeable, as Guido points out). We have been fixing many such bugs over the years.
So this is a long-established restriction on the argument.
Please don't try to second-guess the documentation when deciding what is a "long-established restriction". Regards Antoine.

participants (9)
- Antoine Pitrou
- Chris Kaynor
- Georg Brandl
- Ned Deily
- random832@fastmail.us
- Rob Cliffe
- Ryan Gonzalez
- Steven D'Aprano
- Terry Reedy