On Tue, Mar 29, 2022 at 11:00:41AM +0300, Serhiy Storchaka wrote:
28.03.22 15:13, StrikerOmega wrote:
And I want to grab some kind of value from it.
There is a powerful tool designed for solving such problems. It is called regular expressions.
sample.grab(start="fruit:", end="\n")
'apple'
re.search(r'fruit:(.*?)\n', sample)[1]
Now do grab(start="*", end=".").

Of course you know how to do it, but a naive solution:

    re.search(r'*(.*?).', sample)[1]

will fail. So now we have to learn about escaping characters in order to do a simple find-and-extract. And you need to memorise what characters have to be escaped, and if your start and end parameters are expressions or parameters rather than literals, the complexity goes up a lot:

    # Untested, so probably wrong.
    re.search(re.escape(start) + "(.*?)" + re.escape(end), sample)[1]

and we both know that many people won't bother with the escapes until they get bitten by bugs in their production code.

And even then, regexes are a leading source of serious software vulnerabilities.

https://cwe.mitre.org/data/definitions/185.html

Yes, regular expressions can be used. We know that regexes can be used to solve most problems, for some definition of "solve". Including finding prime numbers:

https://iluxonchik.github.io/regular-expression-check-if-number-is-prime/

A method can raise a useful, self-explanatory error message on failure. Your regex raises "TypeError: 'NoneType' object is not subscriptable".

A method can be written to parse nested brackets correctly. A regular expression cannot.

And then regexes are significantly slower:
    >>> from timeit import Timer
    >>> sample = 'Hello world fruit: apple\n'
    >>> setup = "from __main__ import grab, sample; import re"
    >>> t_grab = Timer("grab(sample, 'fruit', '\\n')", setup=setup)
    >>> t_regex = Timer("re.search(r'fruit:(.*?)\\n', sample)[1]", setup=setup)
    >>> min(t_grab.repeat())
    0.47571489959955215
    >>> min(t_regex.repeat())
    0.8434272557497025
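For anyone who wants to see the regex failure modes concretely, here is a minimal sketch. The sample string and the "#" delimiter are made up for illustration; only re.search, re.escape and re.error come from the standard library.

```python
import re

# Hypothetical sample text, chosen so the delimiters are regex metacharacters.
sample = "list: *first item. second item.\n"
start, end = "*", "."

# Naive pattern: a leading '*' is a quantifier with nothing to repeat,
# so the pattern does not even compile.
try:
    re.search(r'*(.*?).', sample)
    naive_error = None
except re.error as exc:
    naive_error = exc

# Escaped pattern: both delimiters are now matched literally.
pattern = re.escape(start) + "(.*?)" + re.escape(end)
escaped_result = re.search(pattern, sample)[1]

# No match: re.search returns None, and subscripting None raises TypeError,
# which says nothing about which delimiter was missing.
try:
    re.search(re.escape("#") + "(.*?)" + re.escape(end), sample)[1]
    missing_error = None
except TypeError as exc:
    missing_error = exc

print(type(naive_error).__name__, escaped_result, type(missing_error).__name__)
```

The escaped form works, but only once the caller remembers re.escape on both delimiters and guards against a None result.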
Here's the version of grab I used:

    def grab(text, start, end):
        a = text.index(start)
        b = text.index(end, a+len(start))
        return text[a+len(start):b]

I have no strong opinion on whether this simple function should be built into the string class, but I do have a strong opinion about re-writing it into a slower, more fragile, harder to understand, less user-friendly regex.

Don't make me quote Jamie Zawinski again.

--
Steve
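As a rough sketch of the "useful, self-explanatory error message" point, a variant of the function above might look like the following. The name grab_checked and the wording of the messages are assumptions for illustration, not part of any proposal in this thread.

```python
def grab_checked(text, start, end):
    """Hypothetical variant of grab() that reports which delimiter is missing."""
    a = text.find(start)
    if a == -1:
        raise ValueError(f"start marker {start!r} not found")
    a += len(start)
    # Only look for the end marker after the start marker.
    b = text.find(end, a)
    if b == -1:
        raise ValueError(f"end marker {end!r} not found after {start!r}")
    return text[a:b]


sample = 'Hello world fruit: apple\n'
result = grab_checked(sample, "fruit:", "\n")

# Delimiters that are regex metacharacters need no escaping here.
starred = grab_checked("a *b. c", "*", ".")

try:
    grab_checked(sample, "vegetable:", "\n")
    message = None
except ValueError as exc:
    message = str(exc)

print(repr(result), repr(starred), message)
```

Note that the extracted value keeps the space after the colon; stripping whitespace is left to the caller in this sketch.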