[Python-ideas] Give regex operations more sugar
kenlhilton at gmail.com
Wed Jun 13 07:06:09 EDT 2018
Regexes are really useful in many places, and to me it's sad that the
built-in "re" module requires the source string to be passed as an
argument. It would be much more elegant to write "s.search(pattern)"
than "re.search(pattern, s)".
I suggest building all regex operations into the str class itself, as well
as a new syntax for regular expressions.
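The method-style half of this idea can be approximated today without any new syntax. The following is only a rough sketch: the class name RegexStr and its method signatures are hypothetical, and it simply delegates to the existing "re" module rather than changing str itself.

```python
import re


class RegexStr(str):
    """Hypothetical str subclass; not part of the proposal or the stdlib."""

    def search(self, pattern, flags=0):
        # Same as re.search(pattern, s), with the string as the receiver.
        return re.search(pattern, self, flags)

    def findall(self, pattern, flags=0):
        return re.findall(pattern, self, flags)

    def sub(self, pattern, repl, count=0, flags=0):
        # Wrap the result so further regex methods can be chained.
        return RegexStr(re.sub(pattern, repl, self, count=count, flags=flags))


s = RegexStr("1a3c5e7g9i")
print(s.findall(r"[a-z]"))
print(s.sub(r"[a-z]", "@"))
```

A subclass only helps for strings that are explicitly wrapped; string literals would still be plain str objects, which is why the proposal asks for the methods on str itself.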
Thus a "findall" for any lowercase letter in a string would look like this:
['a', 'c', 'e', 'g', 'i']
A "findall" for any letter, case insensitive:
['A', 'c', 'E', 'g', 'I']
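For comparison, here is how the same two results can be produced with the current "re" module. The first input string is the one used in the substitution example in this message; the mixed-case input "1A3c5E7g9I" is an assumption chosen to reproduce the output shown above.

```python
import re

# Any lowercase letter.
lowercase = re.findall(r"[a-z]", "1a3c5e7g9i")
print(lowercase)

# Any letter, case insensitive (the input string here is assumed).
any_case = re.findall(r"[a-z]", "1A3c5E7g9I", re.IGNORECASE)
print(any_case)
```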
A substitution of every lowercase letter with the string " WOOF WOOF ":
>>> "1a3c5e7g9i".sub(!%[a-z]% WOOF WOOF %)
'1 WOOF WOOF 3 WOOF WOOF 5 WOOF WOOF 7 WOOF WOOF 9 WOOF WOOF '
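The equivalent call with the current "re" module, shown here with a precompiled pattern, looks like this. It is just the existing API for comparison, not the proposed syntax.

```python
import re

# Compile once, then replace every lowercase letter.
letter = re.compile(r"[a-z]")
result = letter.sub(" WOOF WOOF ", "1a3c5e7g9i")
print(repr(result))
```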
A substitution of every letter, case insensitive, with the string "hovercraft":
You may wonder why I chose the regex delimiters as "!%" ... "%" [ ... "%" ].
The choice of "%" was purely arbitrary; I just thought of it since there
seems to be a convention to use "%" in PHP regex patterns. The "!" is in
front to disambiguate it from the "%" modulo operator or the "%" string
formatting operator, and because "!" is currently not used on its own in
Python (it appears only as part of "!=" and in format conversions such
as "!r").
Another potential idea is to simply use "!" to denote the start of a regex,
and use the character immediately following it to delimit the regex. Thus
all of the following would be regexes matching a single lowercase letter:
And all of the following would be substitution regexes replacing a single
case-insensitive letter with "@":
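As a rough sketch of how a parser might split such a literal into its parts: the helper below takes "!" plus an arbitrary delimiter character, then a pattern and an optional replacement. The names RegexLiteral and parse_literal are hypothetical, the sample literals are illustrative, and the sketch assumes no flags and no way to escape the delimiter inside the pattern.

```python
from typing import NamedTuple, Optional


class RegexLiteral(NamedTuple):
    pattern: str
    replacement: Optional[str]  # None for a plain (non-substitution) regex


def parse_literal(source: str) -> RegexLiteral:
    """Split '!<d>pattern<d>' or '!<d>pattern<d>replacement<d>'."""
    if len(source) < 3 or source[0] != "!":
        raise ValueError("literal must start with '!' and a delimiter")
    delim = source[1]
    body = source[2:]
    if not body.endswith(delim):
        raise ValueError("literal must end with its delimiter")
    # Assumption: the delimiter never appears inside pattern or replacement.
    parts = body[:-1].split(delim)
    if len(parts) == 1:
        return RegexLiteral(parts[0], None)
    if len(parts) == 2:
        return RegexLiteral(parts[0], parts[1])
    raise ValueError("too many delimiters in literal")


print(parse_literal("!%[a-z]%"))
print(parse_literal("!/[a-z]/"))
print(parse_literal("!%[a-z]% WOOF WOOF %"))
```

One consequence of letting the character after "!" act as the delimiter is that the author can pick whichever character does not occur in the pattern, so the lack of escaping matters less.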
Some examples of how to use this:
['eu', 'o', 'ou', 'a', 'i', 'o', 'o', 'i', 'i', 'i', 'o', 'o', 'a',
'o', 'o', 'io', 'i']
<regex_match; span=(11, 20); match='qIQNlQSLi'>
>>> "My name is Joanne.".findall(!%[A-Z][a-z]+%)
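For reference, the same search written against the current "re" module. This shows only the existing API, not the proposed literal syntax.

```python
import re

# Words that start with one capital letter followed by lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]+", "My name is Joanne.")
print(capitalized)
```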