[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

6 Dec 2019

On 2019-12-06 18:24, Andrew Barnert via Python-ideas wrote:
...
On Dec 6, 2019, at 09:51, Random832 wrote:
...
If match objects are too hard to use, maybe they should be made more user-friendly? What about adding str and iterable semantics to match objects so it can be used as str(re.search(...)); tuple(re.search(...)); a, b = re.search(...)?
That’s a clever idea, and it might work.
1. Match objects are also returned by re.match, and you wouldn't 
expect that to look for more matches.

2. What would tuple(re.search(...)) do? Wouldn't it do the same as 
tuple(re.findall(...))?

3. a, b = re.search(...) would fail if it didn't return exactly 2 
matches, and it would keep looking after the second match for a third 
match because that's how assigning from an iterator currently works - 
it's iterated until it's exhausted.
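Point 3 can be checked with today's iterators. This is only a sketch: match objects are not iterable at present, so a generator over re.finditer stands in for the proposed behaviour, and the helper name logged_finditer and the sample strings are made up for illustration. It records which matches the unpacking actually asks for.

```python
import re

def logged_finditer(pattern, text, log):
    # Hypothetical stand-in for an "iterable match": yields each matched
    # string and records it in `log` so we can see how far iteration went.
    for m in re.finditer(pattern, text):
        log.append(m.group())
        yield m.group()

# Exactly two matches: unpacking succeeds, but only after asking for a
# third item and finding the iterator exhausted.
log_two = []
a, b = logged_finditer(r"\d+", "1 22", log_two)

# More than two matches: the third match is searched for and found
# before the assignment fails with ValueError.
log_many = []
error = None
try:
    c, d = logged_finditer(r"\d+", "1 22 333 4444", log_many)
except ValueError as exc:
    error = str(exc)

print(a, b)       # the two values from the successful unpacking
print(log_many)   # matches consumed before the failure
print(error)
```

In CPython the unpacking requests one item more than there are targets, so the search always continues past the second match, either to the end of the string or to a third match.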
...
For iteration, the only question is what it returns when there’s only one capture group. If you do that with the findall entries you’ll get a tuple of the characters in the string, rather than a single-element tuple. I don’t think that’s behavior anyone would actually want for tuple(match) if we were designing the whole re module API from scratch. But would it be too inconsistent if you didn’t do it that way?
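A short sketch of the one-group inconsistency, using an arbitrary sample string and patterns chosen for illustration. With a single capture group, re.findall returns plain strings, so tuple() of an entry splits it into characters; Match.groups() is shown for comparison.

```python
import re

text = "key=alpha"

# One capture group: findall returns a list of strings.
one_group = re.findall(r"=(\w+)", text)
# Two capture groups: findall returns a list of tuples.
two_groups = re.findall(r"(\w+)=(\w+)", text)

# tuple() of a one-group entry iterates over the string's characters.
print(tuple(one_group[0]))    # ('a', 'l', 'p', 'h', 'a')
print(tuple(two_groups[0]))   # ('key', 'alpha')

# Match.groups() always returns a tuple, even for a single group.
print(re.search(r"=(\w+)", text).groups())   # ('alpha',)
```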
For string, str(match) already works, and sometimes provides useful debugging info. At the REPL this is probably no big deal (it’s easier to dump the repr than the str anyway), but what about logs? For example, I’ve got a parse error on a request, and my logs tell me the last successful match was <_sre.SRE_Match object; span=(21137, 21142), match='alpha'>, so I know to look around 21137 characters into the request to find the problem. After upgrading Python, the logs would just say alpha, which wouldn’t help me. I’d have to go change the code to log %r instead of %s (or, maybe, stop being so hacky and explicitly log the span and groups, and also log where the failed search started rather than guessing from the previous one, and make the parser give useful errors in the first place, etc.) before I could debug future requests. You’re not supposed to even rely on repr being consistent across Python implementations and versions, much less on str being developer- rather than user-friendly, but sometimes people do, and sometimes we all have to deal with their code. I don’t think this is a huge objection, but it is worth figuring out how often and how badly people would be affected.
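To make the logging concern concrete, here is a minimal sketch with a made-up input string. It shows that str() of a match object currently falls back to the repr, and what an explicit log message that does not depend on either might look like. The exact repr text differs between Python versions, so it is not reproduced here.

```python
import re

m = re.search(r"alpha", "xx alpha yy")

# Match objects define no __str__ of their own, so %s and %r currently
# produce the same text in a log line.
same_today = str(m) == repr(m)

# Logging the span and matched text explicitly would keep working even
# if str(match) were changed to return only the matched string.
explicit = "matched %r at span %s" % (m.group(), m.span())

print(same_today)
print(explicit)   # matched 'alpha' at span (3, 8)
```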

MRAB