Regex driving me crazy...
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Wed Apr 7 22:51:53 EDT 2010
On Wed, 07 Apr 2010 18:03:47 -0700, Patrick Maupin wrote:
> BTW, although I find it annoying when people say "don't do that" when
> "that" is a perfectly good thing to do, and although I also find it
> annoying when people tell you what not to do without telling you what
> *to* do,
Grant did give a perfectly good solution.
> and although I find the regex solution to this problem to be
> quite clean, the equivalent non-regex solution is not terrible, so I
> will present it as well, for your viewing pleasure:
>
> >>> [x for x in '# 1  Short offline   Completed without error   00%'.split('  ') if x.strip()]
> ['# 1', 'Short offline', ' Completed without error', ' 00%']
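A Python 3 sketch of the two approaches side by side, written as standalone functions. The names split_regex and split_plain and the two sample lines are hypothetical, not from the thread. It assumes the fields are separated by runs of two or more spaces, which is what the regex ' {2,}' expresses. It shows that the comprehension only reproduces re.split(' {2,}', ...) when every run has an even length. An odd run leaves a leading space on the next field, as in the output quoted above.

```python
import re


def split_regex(line):
    # Split on any run of two or more spaces.
    return re.split(r" {2,}", line)


def split_plain(line):
    # Split on each pair of spaces, then drop empty or all-space pieces.
    return [x for x in line.split("  ") if x.strip()]


# Hypothetical sample lines: field gaps of 2/4 spaces vs. 2/3 spaces.
even = "# 1  Short offline    Completed without error    00%"
odd = "# 1  Short offline   Completed without error   00%"

print(split_regex(even) == split_plain(even))  # True
print(split_plain(odd))
# ['# 1', 'Short offline', ' Completed without error', ' 00%']
print(split_regex(odd))
# ['# 1', 'Short offline', 'Completed without error', '00%']
```

If the leading spaces matter, stripping each kept piece (`[x.strip() for x in line.split('  ') if x.strip()]`) makes the two agree on lines like these. Other edge cases still differ. For example, a trailing run of spaces gives the regex version an extra empty string at the end.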
This is one of the reasons we're so often suspicious of re solutions:
>>> import re
>>> from timeit import Timer
>>> s = '# 1  Short offline    Completed without error    00%'
>>> tre = Timer("re.split(' {2,}', s)",
... "import re; from __main__ import s")
>>> tsplit = Timer("[x for x in s.split('  ') if x.strip()]",
... "from __main__ import s")
>>>
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
True
>>>
>>>
>>> min(tre.repeat(repeat=5))
6.1224789619445801
>>> min(tsplit.repeat(repeat=5))
1.8338048458099365
Even when they are correct and not unreadable line-noise, regexes tend to
be slow. And the gap widens as the input grows: the regex was about 3 times
slower on the short string, and is about 5 times slower on the longer ones:
>>> s *= 1000
>>> min(tre.repeat(repeat=5, number=1000))
2.3496899604797363
>>> min(tsplit.repeat(repeat=5, number=1000))
0.41538596153259277
>>>
>>> s *= 10
>>> min(tre.repeat(repeat=5, number=1000))
23.739185094833374
>>> min(tsplit.repeat(repeat=5, number=1000))
4.6444299221038818
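For anyone who wants to rerun the comparison, here is a rough Python 3 port of the session above using the timeit module. It is a sketch, not a rigorous benchmark. The sample line, the function names with_regex, with_split and best_times, and the multipliers are assumptions. The multipliers are kept much smaller than the session's s *= 1000 and s *= 10 so the script finishes in about a second. Both versions are wrapped in a lambda, so they pay the same call overhead.

```python
import re
import timeit

# Hypothetical sample line; every field gap is an even run of spaces,
# so both splitting approaches return identical lists.
LINE = "# 1  Short offline    Completed without error    00%"
# Input multipliers (the session above went up to 10000x).
FACTORS = (1, 10, 100)


def with_regex(s):
    # Split on any run of two or more spaces.
    return re.split(r" {2,}", s)


def with_split(s):
    # Split on each pair of spaces, then drop empty or all-space pieces.
    return [x for x in s.split("  ") if x.strip()]


def best_times(s, number=1000, repeat=5):
    """Best of `repeat` runs, in seconds, of `number` calls to each version."""
    t_re = min(timeit.repeat(lambda: with_regex(s), number=number, repeat=repeat))
    t_split = min(timeit.repeat(lambda: with_split(s), number=number, repeat=repeat))
    return t_re, t_split


if __name__ == "__main__":
    for factor in FACTORS:
        s = LINE * factor
        # Sanity check: only time the two versions if they agree.
        assert with_regex(s) == with_split(s)
        t_re, t_split = best_times(s)
        print(f"x{factor:<4} regex {t_re:.4f}s  split {t_split:.4f}s  "
              f"ratio {t_re / t_split:.1f}")
```

Precompiling the pattern with re.compile(r" {2,}") and calling its .split method would trim a little per-call overhead. Whether that closes the gap is something to measure rather than assume. Absolute numbers will differ from the 2010 figures above.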
And this isn't even one of the pathological O(N**2) or O(2**N) regexes.
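To make the "pathological" remark concrete, here is a small Python 3 sketch of catastrophic backtracking. The pattern (a+)+$ is a textbook example chosen for illustration and does not come from the thread. The trailing 'b' forces the match to fail. Before giving up, a backtracking engine such as Python's re may try every way of dividing the a's between the inner and outer +. The nested pattern's time should therefore grow roughly exponentially with n, while the flat a+$, which matches the same strings, stays roughly linear. Exact timings depend on the machine and Python version.

```python
import re
import time

# Nested quantifier: "(a+)+" can carve a run of a's into exponentially many
# groupings, and a backtracking engine may try them all before failing at "$".
EVIL = re.compile(r"(a+)+$")
# Matches the same strings, but with no nested repetition.
SAFE = re.compile(r"a+$")


def time_match(pattern, n):
    """Seconds taken to (fail to) match 'a' * n + 'b' against `pattern`."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' means neither pattern can match
    return elapsed


if __name__ == "__main__":
    for n in range(12, 19, 2):
        print(f"n={n}: nested {time_match(EVIL, n):.4f}s  "
              f"flat {time_match(SAFE, n):.6f}s")
```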
Don't get me wrong -- regexes are a useful tool. But if your first
instinct is to write a regex, you're doing it wrong.
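A few everyday cases where a plain string method does the job a regex is often reached for first. The sample line and the pairings are illustrative and do not come from the thread. Each pair gives the same result on this input, but edge cases can differ. For instance, when ": " is absent, str.partition returns the whole line plus two empty strings, while the regex match returns None.

```python
import re

line = "Status: Completed without error"  # hypothetical sample input

# 1. Prefix check: re.match vs str.startswith
print(bool(re.match(r"Status:", line)), line.startswith("Status:"))  # True True

# 2. Split once on the first ": ": regex groups vs str.partition
m = re.match(r"([^:]*): (.*)", line)
key, sep, value = line.partition(": ")
print(m.groups(), (key, value))
# ('Status', 'Completed without error') ('Status', 'Completed without error')

# 3. Split on runs of whitespace: re.split vs str.split() with no argument
print(re.split(r"\s+", line) == line.split())  # True
```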
[quote]
A related problem is Perl's over-reliance on regular expressions
that is exaggerated by advocating regex-based solutions in almost
all O'Reilly books. The latter until recently were the most
authoritative source of published information about Perl.
While a simple regular expression is a beautiful thing and can
simplify operations with strings considerably, overcomplexity in
regular expressions is extremely dangerous: it cannot serve as a basis
for serious, professional programming, it is fraught with pitfalls,
and it is a big semantic mess as a result of outgrowing its primary purpose.
Diagnostics for errors in regular expressions are even weaker than
for the language itself, and here many things just go unnoticed.
[end quote]
http://www.softpanorama.org/Scripting/Perlbook/Ch01/place_of_perl_among_other_lang.shtml
Even Larry Wall has criticised Perl's regex culture:
http://dev.perl.org/perl6/doc/design/apo/A05.html
--
Steven