Regex driving me crazy...

Patrick Maupin pmaupin at gmail.com
Thu Apr 8 00:57:31 EDT 2010


On Apr 7, 9:51 pm, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:

BTW, I don't know how you got 'True' here.

> >>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
> True

You must not have s set up to be the string given by the OP.  I just
realized there was an error in my non-regexp example that actually
manifests itself with the test data:

>>> import re
>>> s = '# 1  Short offline       Completed without error       00%'
>>> re.split(' {2,}', s)
['# 1', 'Short offline', 'Completed without error', '00%']
>>> [x for x in s.split('  ') if x.strip()]
['# 1', 'Short offline', ' Completed without error', ' 00%']
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
False

Fixing it requires something like:

[x.strip() for x in s.split('  ') if x.strip()]

or:

[x for x in [x.strip() for x in s.split('  ')] if x]
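
As a quick sanity check, here is a minimal sketch comparing the broken version and both fixes against the regexp result. The string is rebuilt with explicit space counts (2, 7 and 7, assumed from the session above) so the odd-length runs can't get lost in transit, and the names expected, broken, fix_a and fix_b are just made up for illustration:

```python
import re

# Rebuild the OP's test string with explicit space counts.  The counts are
# an assumption taken from the session above; any odd-length run of 3 or
# more spaces is enough to trigger the bug.
s = ('# 1' + ' ' * 2 + 'Short offline' + ' ' * 7 +
     'Completed without error' + ' ' * 7 + '00%')

expected = re.split(' {2,}', s)

# Broken: an odd run of spaces leaves one leading space on the next field.
broken = [x for x in s.split('  ') if x.strip()]

# The two proposed fixes: strip each field, then drop the empty ones.
fix_a = [x.strip() for x in s.split('  ') if x.strip()]
fix_b = [x for x in [x.strip() for x in s.split('  ')] if x]

print(broken == expected)
print(fix_a == expected)
print(fix_b == expected)
```

Note that this only shows agreement on this particular string.  With leading or trailing runs of spaces, re.split() returns empty strings at the ends while the list comprehensions drop them, so the two approaches are not interchangeable in general.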

I haven't timed either of these, but given that the broken
original was slower than the simpler:

splitter = re.compile(' {2,}').split
splitter(s)

on strings of "normal" length, and given that nobody noticed this bug
right away (even though it was in the printout on my first message,
heh), I think that this shows that (here, let me qualify this
carefully), at least in some cases, the first regexp that comes to my
mind can be prettier, shorter, faster, less bug-prone, etc. than the
first non-regexp that comes to my mind...
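
For anyone who wants to time the fixed version themselves, something along these lines should do as a rough sketch.  The helper names with_regex and with_strip and the repeat count are arbitrary choices of mine, and the numbers will depend on the machine, the Python version and the input length, so I'm not claiming any particular result:

```python
import re
import timeit

# Same test string as above, rebuilt with explicit (assumed) space counts.
s = ('# 1' + ' ' * 2 + 'Short offline' + ' ' * 7 +
     'Completed without error' + ' ' * 7 + '00%')

# Precompile once and keep the bound method, as in the post.
splitter = re.compile(' {2,}').split

def with_regex():
    return splitter(s)

def with_strip():
    # The first of the two corrected non-regexp versions.
    return [x.strip() for x in s.split('  ') if x.strip()]

n = 100000  # arbitrary repeat count
t_regex = timeit.timeit(with_regex, number=n)
t_strip = timeit.timeit(with_strip, number=n)

print('regex: %.3fs  strip: %.3fs' % (t_regex, t_strip))
```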

Regards,
Pat


