Serhiy Storchaka wrote:

> My concern is that this will add complexity to the module documentation,
> which is already too complex. re.findfirst() has more complex semantics
> (with no capture groups it returns one thing, with one capture group
> another, and in other cases something of a different type) than
> re.search(), which just returns a match object or None. This will
> increase the chance that users miss the appropriate function and use
> suboptimal constructs like findall()[0].

> re.finditer() is a more modern and powerful function than re.findall().
> The latter may even be deprecated in the future.

Hmm, in that case another option would be to improve the existing documentation, particularly by adding some code examples or expanding the docs for re.finditer() to make its usage clearer. Personally, it took me quite a while to understand its role in the module (as someone who does not use it frequently). Code examples should of course be used sparingly, but I think re.finditer() could benefit from at least one, especially considering that far less complex functions in the module have several. See https://docs.python.org/3.8/library/re.html#re.finditer.
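
For instance, something along these lines might work as a starting point. This is only a rough sketch, not proposed wording for the docs; the sample text, pattern, and variable names are made up for illustration:

```python
import re

# Illustrative input and pattern (not taken from the docs)
text = "alpha=1, beta=22, gamma=333"
pattern = re.compile(r"(\w+)=(\d+)")

# finditer() lazily yields one match object per non-overlapping match,
# so groups and positions are available for every hit
pairs = [(m.group(1), int(m.group(2))) for m in pattern.finditer(text)]

# Only the first match: next() pulls a single item from the iterator,
# and the default of None covers the "no match" case
first = next(pattern.finditer(text), None)
first_name = first.group(1) if first is not None else None
```

The second half is roughly what re.search() already does, which may be part of why finditer()'s role is not obvious at first.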

Serhiy Storchaka wrote:
> > Another option to consider might be adding a boolean parameter to
> > re.search() that changes the behavior to directly return a string
> > instead of a match object, similar to re.findall() when there are not
> > multiple subgroups.

> Oh, no, this is the worst idea!

Yeah, after having had some time to reflect on that idea a bit more, I don't think it would work. It would just end up adding confusion to re.search(), ultimately defeating the purpose of the parameter in the first place. It would be too drastic a change in behavior for a single parameter to make.

Thanks for the honesty though, not all of my ideas are good ones. But if I can come up with something half-decent every once in a while, I think it's worth throwing them out there. (:

On Sat, Dec 7, 2019 at 2:56 AM Serhiy Storchaka <storchaka@gmail.com> wrote:

06.12.19 23:20, Kyle Stanley wrote:
> Serhiy Storchaka wrote:
> > It seems that in most cases the author just does not know about
> > re.search(). Adding re.findfirst() will not fix this.
>
> That's definitely possible, but it might be just as likely that they saw
> re.findall() as being simpler to use than re.search().
> Although it performs substantially worse when parsing
> decent amounts of text (assuming the first match isn't at the end),
> ``re.findall()[0]`` /consistently/ returns the first string that was
> matched, as long as no subgroups were used. This allows them to
> circumvent the usage of match objects entirely, which makes it a bit
> easier to learn, especially for those who are less familiar with OOP or
> are already familiar with other popular flavors of regex (such as JS).
>
> I'll admit this is mostly speculation, but I think there's an especially
> large number of re users (compared to other modules) who aren't
> necessarily developers, and might just want to write a
> script to quickly parse some documents. These types of users are the
> ones who would likely benefit the most from the proposed re.findfirst(),
> particularly if it directly returns a string as Guido is suggesting.
>
> I think at the end of the day, the critical question to answer is this:
>
> *Do we want to add a new helper function that's easy to use, consistent,
> and provides good performance for finding the first match, even if the
> functionality already exists within the module?*

My concern is that this will add complexity to the module documentation,
which is already too complex. re.findfirst() has more complex semantics
(with no capture groups it returns one thing, with one capture group
another, and in other cases something of a different type) than
re.search(), which just returns a match object or None. This will
increase the chance that users miss the appropriate function and use
suboptimal constructs like findall()[0].
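
To illustrate the point about return types, here is a small sketch of how re.findall() already behaves depending on the number of capture groups, next to re.search(). The sample text and patterns are made up for illustration; the proposed re.findfirst() does not exist in the module, so only existing functions are shown:

```python
import re

text = "x=1, y=2"  # illustrative input

# No capture groups: list of whole-match strings
no_groups = re.findall(r"\w=\d", text)

# One capture group: list of that group's strings
one_group = re.findall(r"\w=(\d)", text)

# Two or more capture groups: list of tuples of strings
two_groups = re.findall(r"(\w)=(\d)", text)

# re.search() has a single shape: a match object, or None if nothing matches
m = re.search(r"(\w)=(\d)", text)
missing = re.search(r"z=\d", text)
```

A function returning only the first element of each of these lists would presumably inherit the same three result shapes.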

re.finditer() is a more modern and powerful function than re.findall().
The latter may even be deprecated in the future.

In the future we may add a few more functions/methods: re.rmatch() (like
re.match(), but matches at the end of the string instead of the start),
re.rsearch() (searches from the end), re.rfinditer() (iterates in
reverse order). Unlike findfirst(), they will implement features that
cannot be easily expressed using existing functions.
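
As a rough sketch of why this is awkward today: the closest approximation of a search from the end using existing functions seems to be exhausting re.finditer() and keeping the last match. The helper name last_match below is hypothetical, and this is only an approximation. It scans the whole string from left to right and returns the last of the non-overlapping matches, which is not necessarily what a true reverse search would find:

```python
import re
from collections import deque

def last_match(pattern, string):
    """Hypothetical helper: last non-overlapping match, or None.

    Not equivalent to a real reverse search; it walks every match
    from the start of the string and keeps only the final one.
    """
    # A deque with maxlen=1 discards all but the most recent item
    tail = deque(re.finditer(pattern, string), maxlen=1)
    return tail[0] if tail else None

m = last_match(r"\d+", "a1 b22 c333")
last_number = m.group() if m is not None else None
```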

> Another option to consider might be adding a boolean parameter to
> re.search() that changes the behavior to directly return a string
> instead of a match object, similar to re.findall() when there are not
> multiple subgroups.

Oh, no, this is the worst idea!
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/C4VUEDFVLRJ5G7KTDI5G5RNC3MMP7X6V/