Code improvement question
Thomas Passin
list1 at tompassin.net
Fri Nov 17 07:48:41 EST 2023
On 11/17/2023 6:17 AM, Peter J. Holzer via Python-list wrote:
> On 2023-11-16 11:34:16 +1300, Rimu Atkinson via Python-list wrote:
>>>> Why don't you use re.findall?
>>>>
>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>>>
>>> I think I can see what you did there but it won't make sense to me - or
>>> whoever looks at the code - in future.
>>>
>>> That answers your specific question. However, I am in awe of people who
>>> can just "do" regular expressions and I thank you very much for what
>>> would have been a monumental effort had I tried it.
>>
>> I feel the same way about regex. If I can find a way to write something
>> without regex I very much prefer to as regex usually adds complexity and
>> hurts readability.
>
> I find "straight" regexps very easy to write. There are only a handful
> of constructs which are all very simple and you just string them
> together. But then I've used regexps for 30+ years, so of course they
> feel natural to me.
>
> (Reading regexps may be a bit harder, exactly because they are too
> simple: There is no abstraction, so a complicated pattern results in a
> long regexp.)
>
> There are some extensions to regexps which are conceptually harder, like
> lookahead and lookbehind or nested contexts in Perl. I may need the
> manual for those (especially because they are new(ish) and every
> language uses a different syntax for them) or avoid them altogether.
>
> Oh, and Python (just like Perl) allows you to embed whitespace and
> comments into regexps, which helps readability a lot if you have to
> write long regexps.
>
>
>> You might find https://regex101.com/ to be useful for testing your regex.
>> You can enter in sample data and see if it matches.
>>
>> If I understood what your regex was trying to do I might be able to suggest
>> some python to do the same thing. Is it just removing numbers from text?
>
> Not "removing" them (as I understood it), but extracting them (i.e. find
> and collect them).
>
>>>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> \b - a word boundary.
> [0-9]{2,7} - 2 to 7 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> - - a hyphen-minus
> [0-9]{2} - exactly 2 digits
> \b - a word boundary.
>
> Seems quite straightforward to me. I'll be impressed if you can write
> that in Python in a way which is easier to read.
And the re.VERBOSE (also re.X) flag can always be used, so the entire
expression can be written line by line, with comments, nearly the same as
the example above.
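Here is a rough sketch of what that might look like. It is the same pattern from earlier in the thread, compiled with re.VERBOSE, with Peter's piece-by-piece description carried over as comments. The names NUMBER_PATTERN and find_numbers and the sample text are illustrative only; they are not from the original question.

```python
import re

# Same pattern as re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt),
# written with re.VERBOSE: unescaped whitespace in the pattern is ignored
# and "#" starts a comment (outside character classes).
NUMBER_PATTERN = re.compile(r"""
    \b            # a word boundary
    [0-9]{2,7}    # 2 to 7 digits
    -             # a hyphen-minus
    [0-9]{2}      # exactly 2 digits
    -             # a hyphen-minus
    [0-9]{2}      # exactly 2 digits
    \b            # a word boundary
    """, re.VERBOSE)


def find_numbers(txt):
    """Return every substring of txt that matches NUMBER_PATTERN."""
    return NUMBER_PATTERN.findall(txt)


sample = "Refs 1234-56-78 and 12-34-56, but not 1-23-45 or 12345678-90-12."
print(find_numbers(sample))  # → ['1234-56-78', '12-34-56']
```

"1-23-45" is skipped because the first group needs at least 2 digits, and "12345678-90-12" is skipped because 8 digits exceed the {2,7} limit and the \b anchors prevent matching just the tail of that run.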