05.12.19 23:47, Kyle Stanley wrote:
Serhiy Storchaka wrote:
We still do not know a use case for findfirst. If the OP showed his code and several examples from others' code, this could be an argument for the usefulness of this feature.
I'm not sure about the OP's exact use case, but using GitHub's code search for .py files that match with "first re.findall" shows a decent amount of code that uses the format ``re.findall()[0]``. It would be nice if GitHub's search properly supported symbols and regular expressions, but this presents a decent number of examples. See https://github.com/search?l=Python&q=first+re.findall&type=Code.
I also spent some time looking for a few specific examples, since there were a number of false positives in the above results. Note that I didn't look much into the actual purpose of the code or judge it based on quality; I was just looking for anything that seemed remotely practical and contained something along the lines of ``re.findall()[0]``. Several of the links below contain multiple lines where findfirst would likely be a better alternative, but I only included one permalink per code file.
Thank you Kyle for your investigation!
https://github.com/MohamedAl-Hussein/my_projects/blob/15feca5254fe1b2936d393...
It is easy to rewrite it using re.search().

- input_processor=MapCompose(lambda x: re.findall(r'pointDRI = ([0-9]+)', x)[0], eval),
+ input_processor=MapCompose(lambda x: re.search(r'pointDRI = ([0-9]+)', x).group(1), eval),

I also wonder if it is worth replacing eval with the more efficient and safer int.
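A minimal sketch of why the two spellings agree for a pattern with exactly one group. The input string is invented for illustration and is not taken from the linked project.

```python
import re

PATTERN = r'pointDRI = ([0-9]+)'

# Hypothetical input, made up for this example.
text = 'var pointDRI = 87; var pointDRI = 12;'

# findall() scans the whole string and builds a list of every group value.
via_findall = re.findall(PATTERN, text)[0]

# search() stops at the first match; group(1) is the first parenthesized group.
via_search = re.search(PATTERN, text).group(1)

# int() parses the digits without evaluating arbitrary code, unlike eval().
value = int(via_search)
```

The failure modes differ when nothing matches: findall()[0] raises IndexError, while search() returns None, so calling .group(1) on it raises AttributeError.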
https://github.com/MohamedAl-Hussein/FIFA/blob/2b1390fe46f94648e5b0bcfd28bc6...
It is the same code differently formatted.
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab...
- clerk_name = name_re.findall(clerk)[0]
+ clerk_name = name_re.search(clerk).group(1)
https://github.com/democracyworks/dog-catcher/blob/9f6200084d4505091399d36ab...
- official_name = name_re.findall(town)[0].title()
+ official_name = name_re.search(town).group().title()
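The two rewrites above use group(1) in one file and group() in the other, presumably because the patterns differ in how many groups they contain. The sketch below shows how findall() changes its return shape with the number of groups; the patterns and text are hypothetical, not the ones from the linked repository.

```python
import re

# Hypothetical text and patterns, invented for illustration.
text = 'Clerk: Jane Doe, Deputy: John Roe'

no_group = re.compile(r'[A-Z][a-z]+ [A-Z][a-z]+')
one_group = re.compile(r'Clerk: ([A-Z][a-z]+ [A-Z][a-z]+)')
two_groups = re.compile(r'([A-Z][a-z]+) ([A-Z][a-z]+)')

# No groups: findall() yields whole matches, so group() is the counterpart.
a_findall = no_group.findall(text)[0]
a_search = no_group.search(text).group()

# One group: findall() yields that group, so group(1) is the counterpart.
b_findall = one_group.findall(text)[0]
b_search = one_group.search(text).group(1)

# Several groups: findall() yields tuples, so groups() is the counterpart.
c_findall = two_groups.findall(text)[0]
c_search = two_groups.search(text).groups()
```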
https://github.com/jessyL6/CQUPTHUB-spiders_task1/blob/db73c47c0703ed01eb2a6...
- first_1_results = re.findall(first_1,all_list9)[0]
+ first_1_results = re.search(first_1,all_list9).group(1)
https://github.com/kerinin/giscrape/blob/d398206ed4a7e48e1ef6afbf37b4f98784c...
It is a complex example which performs multiple searches with different regular expressions. It can all be replaced with a single, more efficient regular expression.

- if re.search('^(\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) & (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first = re.findall( '(\w+) (\w+)',parcel.owner )[0]
- elif re.search('^(\w+) (\w+) (\w+) &: (\w+) (\w+) (\w+)$', parcel.owner):
-     last, first, middle = re.findall( '(\w+) (\w+) (\w+)', parcel.owner )[0]
+ m = re.fullmatch('(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?', parcel.owner)
+ if m:
+     last, first, middle = m.groups()
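A rough sketch of how the combined pattern behaves, wrapped in a hypothetical helper named split_owner and run on made-up owner strings. The pattern is the one proposed above, written as a raw string so that \w is not treated as a string escape.

```python
import re

# Same pattern as in the proposed rewrite, as a raw string.
OWNER_RE = re.compile(r'(\w+) (\w+)(?: (\w+))?(?: &(?: \w+){1,3})?')


def split_owner(owner):
    """Return (last, first, middle), or None if the string does not fit.

    Hypothetical helper for illustration; middle is None when absent.
    """
    m = OWNER_RE.fullmatch(owner)
    return m.groups() if m else None


# Made-up sample inputs.
plain = split_owner('SMITH JOHN')
with_middle = split_owner('SMITH JOHN A')
with_partner = split_owner('SMITH JOHN & MARY')
both = split_owner('SMITH JOHN A & MARY B')
no_fit = split_owner('ACME')
```

Because the optional middle-name group still counts as a group, m.groups() always has three items, so the three-name unpacking works in every branch and middle is simply None when no middle name is present.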
https://github.com/songweifun/parsebook/blob/529a86739208e9dc07abbb31363462e...
This is the only example which checks whether findall() returns an empty list. It calls findall() twice! Fortunately, it can be easily optimized using the fact that Match objects support subscripting. I used group() above because it is more explicit and works in older Python versions.

- self.item.first_tutor_name = REGPX_A.findall(value)[0] if REGPX_A.findall(value) else ''
+ self.item.first_tutor_name = (REGPX_A.search(value) or [''])[0]

It seems that in most cases the authors simply do not know about re.search(). Adding re.findfirst() will not fix this.
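A small sketch of the fallback idiom, assuming Python 3.6 or later (where Match objects support subscripting). REGPX_A here is a hypothetical stand-in pattern without groups, and first_name is an invented helper; the real pattern from the linked project is not shown in this thread.

```python
import re

# Hypothetical stand-in: a pattern with no groups, invented for illustration.
REGPX_A = re.compile(r'[A-Z][a-z]+')


def first_name(value):
    # search() returns a Match or None. When it is None, the one-element
    # list [''] takes its place, so [0] yields '' instead of raising.
    return (REGPX_A.search(value) or [''])[0]


found = first_name('tutor Ada')
missing = first_name('no names here')
```

Note that subscripting a Match with 0 gives the whole match, so this is a drop-in replacement for findall()[0] only when the pattern has no groups; with one group, the index would need to be 1 and the fallback list would need a matching second element.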