[Python-ideas] Re: A string function idea

30 Mar 2022

      On Tue, Mar 29, 2022 at 11:00:41AM +0300, Serhiy Storchaka wrote:
...
28.03.22 15:13, StrikerOmega wrote:
...
And I want to grab some kind of value from it.
There is a powerful tool designed for solving such problems. It is 
called regular expressions.
...
sample.grab(start="fruit:", end="\n")
...
...
'apple'
re.search(r'fruit:(.*?)\n', sample)[1]
Now do grab(start="*", end=".").

Of course you know how to do it, but a naive solution:

    re.search(r'*(.*?).', sample)[1]

will fail. So now we have to learn about escaping characters in order to 
do a simple find-and-extract. And you need to memorise what characters 
have to be escaped, and if your start and end parameters are expressions 
or parameters rather than literals, the complexity goes up a lot:

    # Untested, so probably wrong.
    re.search(re.escape(start) + "(.*?)" + re.escape(end), sample)[1]

and we both know that many people won't bother with the escapes until 
they get bitten by bugs in their production code. And even then, regexes 
are a leading source of serious software vulnerabilities.

https://cwe.mitre.org/data/definitions/185.html
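
To make the escaping point concrete, here is a minimal sketch of what a regex-based version needs once the markers are arbitrary strings. The name grab_regex and the sample text are made up for illustration; only re.escape, re.search and re.error come from the standard library.

```python
import re

def grab_regex(text, start, end):
    """Return the text between the literal markers start and end, using re."""
    # re.escape neutralises metacharacters such as "*" and "." in the markers.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no text found between {start!r} and {end!r}")
    return match[1]

sample = "Total: *42 widgets. Done"  # hypothetical sample text

# The naive pattern does not even compile: the leading "*" has nothing to repeat.
try:
    re.search(r'*(.*?).', sample)
    naive_error = None
except re.error as exc:
    naive_error = exc

print(naive_error)
print(grab_regex(sample, "*", "."))
```

Note that "." in the capture group does not match a newline unless re.DOTALL is passed, so this sketch still behaves differently from a plain substring search when the wanted text spans several lines.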

Yes, regular expressions can be used. We know that regexes can be used 
to solve most problems, for some definition of "solve". Including 
finding prime numbers:

https://iluxonchik.github.io/regular-expression-check-if-number-is-prime/

A method can raise a useful, self-explanatory error message on failure. 
Your regex raises "TypeError: 'NoneType' object is not subscriptable".
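
As a rough illustration of the error-message point, a string method could report which marker was missing. This is only a sketch: the function name grab follows the thread, but the wording of the messages and the choice of ValueError are assumptions, not part of any proposal.

```python
def grab(text, start, end):
    """Return the text between the first start marker and the next end marker."""
    a = text.find(start)
    if a == -1:
        raise ValueError(f"start marker {start!r} not found")
    a += len(start)  # skip past the start marker itself
    b = text.find(end, a)
    if b == -1:
        raise ValueError(f"end marker {end!r} not found after {start!r}")
    return text[a:b]

sample = 'Hello world fruit: apple\n'
print(repr(grab(sample, 'fruit:', '\n')))

try:
    grab(sample, 'colour:', '\n')
except ValueError as exc:
    print(exc)
```

Compare that with the regex one-liner, where a failed search returns None and the subscript then fails with a TypeError that says nothing about the input.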

A method can be written to parse nested brackets correctly. A regular 
expression cannot.
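
A minimal sketch of the nested-bracket case, assuming single-character delimiters and a hypothetical helper called grab_nested (nothing by that name has been proposed in the thread). It tracks nesting depth with a counter, which the standard re module cannot do because it has no recursive patterns.

```python
import re

def grab_nested(text, open_ch="(", close_ch=")"):
    """Return the contents of the first balanced open_ch ... close_ch group."""
    start = text.index(open_ch)  # raises ValueError if there is no opener
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # Found the closer that balances the first opener.
                return text[start + 1:i]
    raise ValueError(f"unbalanced {open_ch!r} at position {start}")

expr = "f(a, (b, c), d) + g(x)"
print(grab_nested(expr))                    # balanced group
print(re.search(r'\((.*?)\)', expr)[1])     # non-greedy regex stops too early
print(re.search(r'\((.*)\)', expr)[1])      # greedy regex runs too far
```

Neither the greedy nor the non-greedy pattern lands on the matching bracket; the counter does.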

And then regexes are significantly slower:
...
...
...
>>> sample = 'Hello world fruit: apple\n'
>>> setup = "from __main__ import grab, sample; import re"
>>> t_grab = Timer("grab(sample, 'fruit', '\\n')", setup=setup)
>>> t_regex = Timer("re.search(r'fruit:(.*?)\\n', sample)[1]", setup=setup)
>>> min(t_grab.repeat())
0.47571489959955215
>>> min(t_regex.repeat())
0.8434272557497025
Here's the version of grab I used:

def grab(text, start, end):
    a = text.index(start)
    b = text.index(end, a+len(start))
    return text[a+len(start):b]
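
For anyone who wants to repeat the comparison, here is a self-contained sketch of the same measurement as a script. The compare function and its default arguments are assumptions made for illustration, and both statements use the 'fruit:' marker so that they extract the same text. Timings depend on the machine and the Python version, so none are shown.

```python
import re
import timeit

def grab(text, start, end):
    a = text.index(start) + len(start)
    b = text.index(end, a)
    return text[a:b]

SAMPLE = 'Hello world fruit: apple\n'

def compare(number=100_000, repeat=5):
    """Return (best grab time, best regex time) in seconds for `number` calls."""
    namespace = {"grab": grab, "re": re, "sample": SAMPLE}
    t_grab = min(timeit.repeat("grab(sample, 'fruit:', '\\n')",
                               globals=namespace, number=number, repeat=repeat))
    t_regex = min(timeit.repeat("re.search(r'fruit:(.*?)\\n', sample)[1]",
                                globals=namespace, number=number, repeat=repeat))
    return t_grab, t_regex

if __name__ == "__main__":
    t_grab, t_regex = compare()
    print(f"grab:  {t_grab:.4f} s")
    print(f"regex: {t_regex:.4f} s")
```

Taking the minimum of several repeats, as in the session above, is the usual way to reduce noise from other processes.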

I have no strong opinion on whether this simple function should be built 
into the string class, but I do have a strong opinion about re-writing 
it into a slower, more fragile, harder to understand, less user-friendly 
regex.

Don't make me quote Jamie Zawinski again.

-- 
Steve

Steven D'Aprano