<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <span dir="ltr">&lt;<a href="mailto:guido@python.org" target="_blank">guido@python.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In a discussion about mypy I discovered that the Python 3 version of<br>
the re module's Match object behaves subtly differently from the Python<br>
2 version when the target string (i.e. the haystack, not the needle)<br>
is a buffer object.<br>
<br>
In Python 2, the type of the return value of group() is always either<br>
a Unicode string or an 8-bit string, and the type is determined by<br>
looking at the target string -- if the target is unicode, group()<br>
returns a unicode string, otherwise, group() returns an 8-bit string.<br>
In particular, if the target is a buffer object, group() returns an<br>
8-bit string. I think this is the appropriate behavior: otherwise<br>
using regular expression matching to extract a small substring from a<br>
large target string would unnecessarily keep the large target string<br>
alive as long as the substring is alive.<br>
<br>
But in Python 3, the behavior of group() has changed so that its<br>
return type always matches that of the target string. I think this is<br>
bad -- apart from the lifetime concern, it means that if your target<br>
happens to be a bytearray, the return value isn't even hashable!<br>
<br>
Does anyone remember whether this was a conscious decision? Is it too<br>
late to fix?</blockquote><div><br></div><div>Hmm, that is not what I'd expect either. I would never expect it to return a bytearray; I'd normally assume that .group() returns a bytes object if the input is binary data and a str object if the input is unicode data (str), regardless of the specific type holding the target data.</div>
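<div><br></div><div>To make the hashability point concrete, here is a small Python 3 sketch (is_hashable is a made-up helper for illustration). It relies only on the fact that bytes is immutable and hashable while bytearray is mutable and is not. It deliberately does not assert which type group() returns for a bytearray target, since that is exactly the behavior under discussion; it just copies the result with bytes() before using it as a dict key.</div>
```python
import re


def is_hashable(obj):
    """Return True if obj can be used as a dict key or set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# bytes is immutable and therefore hashable; bytearray is mutable and is not.
print(is_hashable(b"user"))             # True
print(is_hashable(bytearray(b"user")))  # False

# Matching against a bytearray target. The type group() hands back here is
# the behavior in question, so the result is copied with bytes() before it
# is used as a dict key.
target = bytearray(b"user=alice")
m = re.match(rb"(\w+)=", target)
seen = {bytes(m.group(1)): True}
```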
<div><br></div><div>I'd hazard a guess that little, if anything, depends on getting a bytearray back from group(). Fix this in 3.4? Users of 3.3 and earlier are stuck with an extra bytes() call and data copy in these cases, I guess.</div>
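<div><br></div><div>For code that has to run on 3.3 and earlier, that extra bytes() call could be wrapped in a small helper. This is only a sketch: group_as_bytes is a hypothetical name, not part of the re module. It normalizes whatever group() returns to str for a text target and to immutable bytes for any binary target, and passes None through for groups that did not participate in the match.</div>
```python
import re


def group_as_bytes(match, *groups):
    """Return match.group(*groups), copying any binary result into bytes.

    Hypothetical helper (not part of the re module). str results and None
    (a group that did not participate in the match) pass through unchanged.
    """

    def convert(value):
        if value is None or isinstance(value, str):
            return value
        # bytes() turns a bytearray or memoryview result into an immutable,
        # hashable copy; for a memoryview it also drops the view on the target.
        return bytes(value)

    result = match.group(*groups)
    if isinstance(result, tuple):  # group(1, 2, ...) returns a tuple
        return tuple(convert(v) for v in result)
    return convert(result)


m = re.search(rb"(\d+)-(\d+)", bytearray(b"range 10-20"))
print(group_as_bytes(m, 1))     # b'10'
print(group_as_bytes(m, 1, 2))  # (b'10', b'20')
```
<div>Doing the conversion in one helper keeps the type rule in a single place, at the cost of the data copy.</div>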
<div><br></div><div>-gps</div><div><br></div></div></div></div>