[Python-Dev] Misc re.match() complaint

Guido van Rossum guido at python.org
Tue Jul 16 04:20:29 CEST 2013


On Mon, Jul 15, 2013 at 7:03 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Guido van Rossum writes:
>
>  > And I still think that any return type for group() except bytes or str
>  > is wrong. (Except possibly a subclass of these.)
>
> I'm not sure I understand.  Do you mean in the context of the match
> object API, where constructing "(target, match.start(), match.end())"
> to get a group-like object that refers to the target rather than
> copying the text is simple?  (Such objects are very useful in the
> restricted application of constructing a programmable text editor.)

I'm not sure I understand you. :-(

The group() method on the match object returned by re.match() and
re.search() returns a string-ish object representing the matched
substring. (I'm using "string-ish" to allow for both unicode and
bytes, which are exactly the two matching modes supported by the re
module.) In most contexts (text editors excluded) the program will use
this string just as it would use any other string, perhaps using it to
open a file, perhaps as a key into some cache, and so on.
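
To make the two matching modes concrete, here is a minimal sketch. The
pattern and the sample text are invented for illustration and are not
taken from this thread. It only shows that a str pattern on a str target
yields str, that a bytes pattern on a bytes target yields bytes, and
that either result can be used like any other string, for example as a
dict key.

```python
import re

# str mode: str pattern against a str target, group() returns str.
text_match = re.match(r"(\w+)\.txt", "notes.txt is the file")
text_group = text_match.group(1)

# bytes mode: bytes pattern against a bytes target, group() returns bytes.
bytes_match = re.match(rb"(\w+)\.txt", b"notes.txt is the file")
bytes_group = bytes_match.group(1)

# Both results are immutable and hashable, so they can serve as cache keys.
text_cache = {text_group: "seen in str mode"}
bytes_cache = {bytes_group: "seen in bytes mode"}

# The two modes cannot be mixed: a str pattern on a bytes target is rejected.
try:
    re.match(r"\w+", b"notes")
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

print(type(text_group).__name__, type(bytes_group).__name__, mixed_allowed)
```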

I can clearly see the reasons why you want the target string to allow
other types besides str and bytes, in particular other things that are
known to represent sequences of bytes, such as bytearray and
memoryview. These reasons primarily have to do with optimizing the
representation of the target string in case it takes up a large amount
of memory, or other situations where we'd like to reduce the number of
times each byte is copied before we see it.

But I don't see as much of a use case for group() returning an object
of the same type as the target string. In particular, in the case of a
target string that is a bytearray, group() has to copy the bytes
regardless of whether it creates a bytes or a bytearray instance. And
I do see a use case for group() returning an immutable object.
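
A small sketch of the copying point, assuming CPython 3.4 or later,
where group() returns bytes for a bytearray target (the release
discussed in this thread behaved differently). The sample data is
invented. The group is an independent copy, so mutating the target
afterwards does not change it. A memoryview slice built from
match.start() and match.end() is the zero-copy alternative and does
track the target.

```python
import re

target = bytearray(b"header: value")
match = re.match(rb"\w+", target)

# group() copies the matched bytes out of the mutable target.
group = match.group()

# A zero-copy span over the same region, built from the match offsets.
view = memoryview(target)[match.start():match.end()]

# Mutate the target in place (single-item assignment never resizes it).
target[0] = ord("H")

group_after = bytes(group)   # the copy is unaffected by the mutation
view_after = bytes(view)     # the view reflects the mutation
view.release()

# An immutable result is hashable, so it can be used as a dict key.
cache = {group: "parsed"}

print(group_after, view_after)
```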

> Or is this something deeper, that a group *is* a new object in
> principle?

No, I just think of it as returning "a string" and I think it's most
useful if that is always an immutable object, even if the target
string is some other bytes buffer.

FWIW, it feels as if the change in behavior is probably just due to
how slices work.
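
As a sketch of what "how slices work" means here: slicing a bytes-like
object gives back an object of the same type as the thing being sliced.
This only illustrates slicing on the built-in types. It makes no claim
about how the re module is implemented internally.

```python
data = b"spam eggs"

# Slice each kind of buffer and record the type of the result.
slice_types = {}
for buf in (data, bytearray(data), memoryview(data)):
    slice_types[type(buf).__name__] = type(buf[0:4]).__name__

# A bytearray slice is mutable and therefore unhashable,
# so it cannot be used as a dict key the way bytes can.
try:
    hash(bytearray(data)[0:4])
    bytearray_slice_hashable = True
except TypeError:
    bytearray_slice_hashable = False

print(slice_types, bytearray_slice_hashable)
```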

-- 
--Guido van Rossum (python.org/~guido)

