[Python-Dev] Misc re.match() complaint

MRAB python at mrabarnett.plus.com
Tue Jul 16 02:10:51 CEST 2013


On 16/07/2013 00:30, Gregory P. Smith wrote:
>
> On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <guido at python.org> wrote:
>
>     In a discussion about mypy I discovered that the Python 3 version of
>     the re module's Match object behaves subtly differently from the Python
>     2 version when the target string (i.e. the haystack, not the needle)
>     is a buffer object.
>
>     In Python 2, the type of the return value of group() is always either
>     a Unicode string or an 8-bit string, and the type is determined by
>     looking at the target string -- if the target is unicode, group()
>     returns a unicode string, otherwise, group() returns an 8-bit string.
>     In particular, if the target is a buffer object, group() returns an
>     8-bit string. I think this is the appropriate behavior: otherwise
>     using regular expression matching to extract a small substring from a
>     large target string would unnecessarily keep the large target string
>     alive as long as the substring is alive.
>
>     But in Python 3, the behavior of group() has changed so that its
>     return type always matches that of the target string. I think this is
>     bad -- apart from the lifetime concern, it means that if your target
>     happens to be a bytearray, the return value isn't even hashable!
>
>     Does anyone remember whether this was a conscious decision? Is it too
>     late to fix?
>
>
> Hmm, that is not what I'd expect either. I would never expect it to
> return a bytearray; I'd normally assume that .group() returned a bytes
> object if the input was binary data and a str object if the input was
> unicode data (str) regardless of specific types containing the input
> target data.
>
> I'm going to hazard a guess that not much, if anything, would be
> depending on getting a bytearray out of that. Fix this in 3.4? 3.3 and
> earlier users are stuck with an extra bytes() call and data copy in
> these cases I guess.
>
I'm not sure I understand the complaint.

I get this for Python 2.7:

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> import re
>>> re.match(r"a", array.array("b", "a")).group()
array('b', [97])

It's the same even in Python 2.4.
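
For comparison, here is a minimal Python 3 sketch of the "extra bytes() call" workaround mentioned above. The helper name group_as_bytes is hypothetical and not part of the re module; it only assumes that group() may hand back a mutable buffer type such as bytearray. On versions where group() already returns bytes or str, the value passes through unchanged.

```python
import re


def group_as_bytes(match, group=0):
    """Return a matched group as an immutable, hashable object.

    Hypothetical helper, not part of the re module: str and bytes
    results (and None for an unmatched group) are returned as-is;
    any other buffer-like result, e.g. a bytearray, is copied into
    a new bytes object.
    """
    value = match.group(group)
    if value is None or isinstance(value, (str, bytes)):
        return value
    return bytes(value)  # copies the data, dropping the reference to the target


target = bytearray(b"abc")   # mutable, unhashable target buffer
m = re.match(b"a", target)   # bytes pattern against a bytearray target
result = group_as_bytes(m)

# The result can be used as a dict key whatever type group() returned.
seen = {result: True}
```

The copy costs one allocation per extracted group, but the result no longer keeps the (possibly large) target buffer alive and can be hashed.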


