[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

6 Dec 2019

05.12.19 23:47, Kyle Stanley wrote:
...
Serhiy Storchaka wrote:
...
We still do not know a use case for findfirst. If the OP showed his
code and several examples in others' code, this could be an argument for
the usefulness of this feature.
I'm not sure about the OP's exact use case, but using GitHub's code 
search for .py files that match with "first re.findall" shows a decent 
amount of code that uses the format ``re.findall()[0]``. It would be 
nice if GitHub's search properly supported symbols and regular 
expressions, but this presents a decent number of examples. See 
https://github.com/search?l=Python&q=first+re.findall&type=Code.
I also spent some time looking for a few specific examples, since there 
were a number of false positives in the above results. Note that I 
didn't look much into the actual purpose of the code or judge it based 
on quality, I was just looking for anything that seemed remotely 
practical and contained something along the lines of 
``re.findall()[0]``. Several of the links below contain multiple lines 
where findfirst would likely be a better alternative, but I only 
included one permalink per code file.
Thank you Kyle for your investigation!
...
https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d393...
It is easy to rewrite it using re.search().

-         input_processor=MapCompose(lambda x: re.findall(r'pointDRI = ([0-9]+)', x)[0], eval),
+         input_processor=MapCompose(lambda x: re.search(r'pointDRI = ([0-9]+)', x).group(1), eval),

I also wonder whether it is worth replacing eval with the more efficient
and safer int.
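
As a rough illustration of the eval remark, here is a minimal sketch. The sample string is made up (the scraped page text is not shown in this thread); only the pattern comes from the code above.

```python
import re

# Hypothetical sample input; the real scraped text is not shown in the thread.
text = "var pointDRI = 87;"

m = re.search(r'pointDRI = ([0-9]+)', text)

# group(1) is the captured digit string. int() parses it directly and
# cannot execute arbitrary code, unlike eval().
value = int(m.group(1))
print(value)
```

Since the pattern only captures [0-9]+, int() accepts everything that the group can contain.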
...
https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc6...
It is the same code, formatted differently.
...
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab...
- 	clerk_name = name_re.findall(clerk)[0]
+ 	clerk_name = name_re.search(clerk).group(1)
...
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab...
-     official_name = name_re.findall(town)[0].title()
+     official_name = name_re.search(town).group().title()
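
The two rewrites above use group(1) in one case and group() in the other because what findall() returns depends on the number of capturing groups in the pattern. A small sketch of that correspondence, with made-up patterns and input (the real name_re patterns are not shown here):

```python
import re

s = "Clerk: Jane Doe"  # made-up sample text

# No capturing groups: findall() yields whole matches, like Match.group().
no_group = (re.findall(r'\w+ \w+$', s)[0],
            re.search(r'\w+ \w+$', s).group())

# One capturing group: findall() yields that group, like Match.group(1).
one_group = (re.findall(r'Clerk: (\w+ \w+)', s)[0],
             re.search(r'Clerk: (\w+ \w+)', s).group(1))

# Several capturing groups: findall() yields tuples, like Match.groups().
many_groups = (re.findall(r'(\w+) (\w+)$', s)[0],
               re.search(r'(\w+) (\w+)$', s).groups())

print(no_group, one_group, many_groups)
```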
...
https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6...
-             first_1_results = re.findall(first_1,all_list9)[0]
+             first_1_results = re.search(first_1,all_list9).group(1)
...
https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784c...
It is a complex example which performs multiple searches with different
regular expressions. All of it can be replaced with a single, more
efficient regular expression.

-   if re.search('^(\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) & (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &: (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) & (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) & (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
-   elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)', parcel.owner )[0]

+   m = re.fullmatch(r'(\w+) (\w+)(?: (\w+))?(?: &:?(?: \w+){1,3})?', parcel.owner)
+   if m:
+     last, first, middle = m.groups()
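
A quick way to check what such a combined pattern does is to run it on a few owner strings. The sketch below is only illustrative: the names are invented, split_owner is a hypothetical helper, and the optional ":" after "&" is my assumption to cover the "&:" variants in the removed lines.

```python
import re

# Assumption: ":?" is added so that both "&" and "&:" separators match.
OWNER_RE = re.compile(r'(\w+) (\w+)(?: (\w+))?(?: &:?(?: \w+){1,3})?')

def split_owner(owner):
    """Return (last, first, middle) or None if the string has another shape."""
    m = OWNER_RE.fullmatch(owner)
    return m.groups() if m else None

for owner in ["SMITH JOHN", "SMITH JOHN A", "SMITH JOHN & MARY",
              "SMITH JOHN A &: MARY JANE", "SMITH"]:
    print(owner, "->", split_owner(owner))
```

One difference from the removed code: when there is no middle name, middle is set to None rather than left unassigned.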
...
https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e...
This is the only example which checks whether findall() returns an empty
list. It calls findall() twice! Fortunately, it can be easily optimized
using the fact that Match objects support subscripting. I used group()
above because it is more explicit and works in older Python versions.

-             self.item.first_tutor_name = REGPX_A.findall(value)[0] if REGPX_A.findall(value) else ''
+             self.item.first_tutor_name = (REGPX_A.search(value) or [''])[0]
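
The "or" idiom works because search() returns None when nothing matches and a Match object (always truthy) otherwise. A minimal sketch, assuming Python 3.6+ for Match subscripting; the pattern is a hypothetical stand-in for REGPX_A and first_match is an illustrative helper, not an existing function:

```python
import re

# Hypothetical stand-in; the real REGPX_A is not shown in the thread.
REGPX_A = re.compile(r'[A-Z][a-z]+')

def first_match(pattern, value, default=''):
    """Return the first whole match, or default when there is none."""
    m = pattern.search(value)
    return m[0] if m else default  # m[0] is the same as m.group(0)

found = (REGPX_A.search("tutor: Alice, Bob") or [''])[0]
missing = (REGPX_A.search("1234") or [''])[0]
print(repr(found), repr(missing))
```

The helper form does the same single search and may read more clearly when a default other than '' is needed.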

It seems that in most cases the authors simply do not know about
re.search(). Adding re.findfirst() will not fix this.