Python 2's re module should take longs
This works:
re.search('(abc)', 'abc').group(1)
but this doesn't:
re.search('(abc)', 'abc').group(1L)
The latter raises "IndexError: no such group". Shouldn't that technically work?
-- Ryan
If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean
Check out my website: http://kirbyfan64.github.io/
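Until a patched release is available, a caller can sidestep the problem by normalising the index before it reaches group(). The helper below is hypothetical (not part of the re module) and is written so it should run on Python 3 as well as 2.7. operator.index() rejects non-integers such as 1.0 with a TypeError, and int() then turns an integer-valued result (on 2.7, a small long such as 1L) into a plain int.

```python
import operator
import re


def group_by_number(match, n):
    """Hypothetical helper: return match.group(n) for any integer-like n.

    operator.index() raises TypeError for non-integers (e.g. 1.0);
    int() converts an integer-valued result (e.g. a 2.7 long) to a plain int.
    """
    return match.group(int(operator.index(n)))


m = re.search('(abc)', 'abc')
print(group_by_number(m, 1))  # abc
```

The helper only covers numbered groups; group names can still be passed to match.group() directly.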
On 9/30/2014 1:16 PM, Ryan Gonzalez wrote:
This works:
re.search('(abc)', 'abc').group(1)
but this doesn't:
re.search('(abc)', 'abc').group(1L)
The latter raises "IndexError: no such group". Shouldn't that technically work?
If groups were stored in a list, then technically, yes; but not if groups are stored in a dict to support named groups with just one structure. Since the number of groups is limited to 99 or 100 in 2.7 (just changed for 3.5), there is no technical reason to use longs. Even if the exception were considered a bug, I would not change it, since using longs would restrict code to 2.7.9+ and make it less portable to 3.x.
-- Terry Jan Reedy
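For reference, the Python 3 re API exposes the bookkeeping involved here: Pattern.groups counts the capturing groups, and Pattern.groupindex maps group names to group numbers. A minimal check, using a made-up pattern for illustration, suggests that numbered groups do not appear as keys in that name mapping:

```python
import re

# Illustrative pattern: one named group and one unnamed group.
p = re.compile(r'(?P<word>abc)(d)')

print(p.groups)            # 2: number of capturing groups
print(dict(p.groupindex))  # {'word': 1}: only named groups are listed

m = p.search('abcd')
print(m.group(1), m.group('word'), m.group(2))  # abc abc d
```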
On Tue, Sep 30, 2014, at 15:31, Terry Reedy wrote:
If groups were stored in a list, then technically, yes; but not if groups are stored in a dict to support named groups with just one structure.
Longs work fine interchangeably with ints in a dict. And even if they didn't, the group function _could_ convert a small-valued long argument to an int. This is an error raised by a function implemented in C that enforces static type checking on its arguments. The core problem is that the PyInt_AsLong function does not check for (and handle) the case where its argument is a small-valued PyLong.
Disregard my last message; I was looking at the wrong code. But looking at what I think is the right code (https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules), I am confused, since this error is raised after the index has already been converted to a Py_ssize_t.
On Tue, Sep 30, 2014, at 16:06, random832@fastmail.us wrote:
On Tue, Sep 30, 2014, at 15:31, Terry Reedy wrote:
If groups were stored in a list, then technically, yes; but not if groups are stored in a dict to support named groups with just one structure.
Longs work fine interchangeably with ints in a dict. And even if they didn't, the group function _could_ convert a small-valued long argument to an int. This is an error raised by a function implemented in C that enforces static type checking on its arguments. The core problem is that the PyInt_AsLong function does not check for (and handle) the case where its argument is a small-valued PyLong.
-- Random832
On Tue, Sep 30, 2014 at 1:12 PM,
Disregard my last message; I was looking at the wrong code.
But looking at what I think is the right code (https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules), I am confused, since this error is raised after the index has already been converted to a Py_ssize_t.
According to my quick look at the code[1], it looks like the problem is in match_getindex (~line 3304). If the line "if (PyInt_Check(index))" read "if (PyInt_Check(index) || PyLong_Check(index))" instead, it appears that it would properly handle longs as well as ints (at least based on what is happening a little farther down, near line 3312). It may be that the conditions need to be separated so that the long case calls PyLong_AsSsize_t rather than PyInt_AsSsize_t, but that may not be needed. It appears that when an integer index is passed in, the re module just converts it to a C size_t; otherwise it looks the argument up in the group name dictionary to get the index. I suspect the indexes don't exist as keys in the mapping, only the group names. As the initial conversion checks for int specifically and ignores longs, longs are treated differently from ints.
As a side note, it appears the documentation at https://docs.python.org/2/c-api/long.html is slightly incorrect: there appear to be two instances of a few functions, with slightly different documentation, but the same return type, arguments, and name. The ones I can see are "PyLong_FromSsize_t" and "PyLong_AsSsize_t". Perhaps I am just missing some subtle difference in the names or arguments?
[1] https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules/_sre.c
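To make the two branches described above concrete, here is a pure-Python model. It is a simplified sketch written for Python 3, not a transcription of _sre.c: it folds the caller's range check into the function, FakeLong is a hypothetical stand-in for Python 2's long (integer-valued, but not an int subclass), and widening integer_types plays the role of changing "PyInt_Check(index)" to "PyInt_Check(index) || PyLong_Check(index)".

```python
class FakeLong:
    """Hypothetical stand-in for Python 2's `long`: integer-valued, hashes and
    compares like the equal int, but is not a subclass of int."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

    def __int__(self):
        return self.value

    def __eq__(self, other):
        return self.value == other

    def __hash__(self):
        return hash(self.value)


def match_getindex(index, groupindex, groups, integer_types=(int,)):
    """Simplified model of match_getindex plus the caller's range check.

    groupindex: mapping of group *names* to numbers (like Pattern.groupindex).
    groups: number of groups *including* group 0, the whole match.
    integer_types: argument types that take the numeric branch; (int,)
    models the unpatched 2.7 check.
    """
    if isinstance(index, integer_types):
        # Numeric branch: use the argument directly as a group number.
        i = int(index)
    else:
        # Name branch: look the argument up as a group name; a missing or
        # unhashable key falls through to "no such group".
        try:
            i = groupindex[index]
        except (KeyError, TypeError):
            i = -1
    if i < 0 or i >= groups:
        raise IndexError("no such group")
    return i


groupindex = {'word': 1}  # as for the pattern '(?P<word>abc)'
print(match_getindex(1, groupindex, 2))       # 1
print(match_getindex('word', groupindex, 2))  # 1
# match_getindex(FakeLong(1), groupindex, 2) raises IndexError: no such group
print(match_getindex(FakeLong(1), groupindex, 2, integer_types=(int, FakeLong)))  # 1
```

In the unpatched model, FakeLong(1) skips the numeric branch and is looked up as a name; since the mapping only holds names, the lookup fails and the result is "no such group", matching the reported symptom.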
On 09/30/2014 10:32 PM, Chris Kaynor wrote:
As a side note, it appears the documentation at https://docs.python.org/2/c-api/long.html is slightly incorrect: there appear to be two instances of a few functions, with slightly different documentation, but the same return type, arguments, and name. The ones I can see are "PyLong_FromSsize_t" and "PyLong_AsSsize_t". Perhaps I am just missing some subtle difference in the names or arguments?
It just looks like a duplication, maybe from editing a merge conflict. Fixed.
Georg
On Tue, 30 Sep 2014 12:16:13 -0500, Ryan Gonzalez wrote:
This works:
re.search('(abc)', 'abc').group(1)
but this doesn't:
re.search('(abc)', 'abc').group(1L)
The latter raises "IndexError: no such group". Shouldn't that technically work?
Yes, it's a bug. Feel free to open an issue.
Regards,
Antoine.
On Tue, Sep 30, 2014 at 09:57:44PM +0200, Antoine Pitrou wrote:
On Tue, 30 Sep 2014 12:16:13 -0500 Ryan Gonzalez wrote:
This works:
re.search('(abc)', 'abc').group(1)
but this doesn't:
re.search('(abc)', 'abc').group(1L)
The latter raises "IndexError: no such group". Shouldn't that technically work?
Yes, it's a bug. Feel free to open an issue.
I'm not so sure that it's a bug. Should .group(1.0) work? That also is numerically equal to 1.
MatchObject.group does not necessarily have to obey the semantics of list.__getitem__ or dict.__getitem__. Just because ['a', 'b'][1L] and {1: 'b'}[1.0] both return 'b' doesn't force .group() to do the same.
The documentation for .group is underspecified, and perhaps TypeError would be a more appropriate error than IndexError, but IndexError is consistent with other bad arguments:
py> re.search('(abc)', 'abc').group([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
So I don't think this is a bug; I think it is just an unfortunate choice of misleading exception type.
The re module goes back to at least 1.5, and as far as I can tell, .group has never accepted longs. (I have tested it on 2.4 through 2.7, and 1.5, and it fails with all of them.) So this is a long-established restriction on the argument. (Another restriction is that a maximum of 99 groups are supported, so there are no cases where a long is needed.)
Allowing longs is a new feature, not a bug fix. Since 2.7 is in bug-fix-only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.
-- Steven
In article <20141001005156.GU19757@ando.pearwood.info>, Steven D'Aprano wrote:
Since 2.7 is in bug-fix-only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.
It's a moot point now:
http://bugs.python.org/issue22530
https://hg.python.org/cpython/rev/30f72ed73c3b
-- Ned Deily, nad@acm.org
On Tue, Sep 30, 2014 at 07:31:24PM -0700, Ned Deily wrote:
In article <20141001005156.GU19757@ando.pearwood.info>, Steven D'Aprano wrote:
Since 2.7 is in bug-fix-only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.
It's a moot point now:
Ack; Guido has spoken, and his logic is impecable.
-- Steven
On 01/10/2014 03:35, Steven D'Aprano wrote:
On Tue, Sep 30, 2014 at 07:31:24PM -0700, Ned Deily wrote:
In article <20141001005156.GU19757@ando.pearwood.info>, Steven D'Aprano wrote:
Since 2.7 is in bug-fix-only mode, and this fix is unneeded in 3.x, there is no point in raising an issue to the tracker.
It's a moot point now:
Ack; Guido has spoken, and his logic is impecable.
Unlike your spelling.
On Wed, 1 Oct 2014 10:51:57 +1000, Steven D'Aprano wrote:
On Tue, Sep 30, 2014 at 09:57:44PM +0200, Antoine Pitrou wrote:
On Tue, 30 Sep 2014 12:16:13 -0500 Ryan Gonzalez wrote:
This works:
re.search('(abc)', 'abc').group(1)
but this doesn't:
re.search('(abc)', 'abc').group(1L)
The latter raises "IndexError: no such group". Shouldn't that technically work?
Yes, it's a bug. Feel free to open an issue.
I'm not so sure that it's a bug. Should .group(1.0) work? That also is numerically equal to 1.
It's not about them being numerically equal; it's about them both being integers (interchangeable, as Guido points out). We have been fixing many such bugs over the years.
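The distinction between "numerically equal" and "is an integer" can be illustrated in Python 3, where the usual integer test is operator.index(), which relies on the __index__ protocol that floats do not implement. The is_integer helper below is a hypothetical illustration, not something the re module exposes; it shows that dict lookup only needs equal hashes and equality, while sequence indexing needs a genuine integer.

```python
import operator


def is_integer(obj):
    """Hypothetical helper: True if obj is an integer in the operator.index() sense."""
    try:
        operator.index(obj)
    except TypeError:
        return False
    return True


print(1 == 1.0)                        # True: numerically equal
print({1: 'b'}[1.0])                   # b: dict lookup only needs equal hash and ==
print(is_integer(1), is_integer(1.0))  # True False
print(['a', 'b'][1])                   # b
# ['a', 'b'][1.0] would raise TypeError: a float is not a valid list index
```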
So this is a long-established restriction on the argument.
Please don't try to second-guess the documentation when deciding what is a "long-established restriction".
Regards,
Antoine.
participants (9)
- Antoine Pitrou
- Chris Kaynor
- Georg Brandl
- Ned Deily
- random832@fastmail.us
- Rob Cliffe
- Ryan Gonzalez
- Steven D'Aprano
- Terry Reedy