Regex driving me crazy...

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Thu Apr 8 09:07:26 CEST 2010


On Wed, 07 Apr 2010 21:57:31 -0700, Patrick Maupin wrote:

> On Apr 7, 9:51 pm, Steven D'Aprano
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
> 
> BTW, I don't know how you got 'True' here.
> 
>> >>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
>> True


It was a copy and paste from the interactive interpreter. Here it is, in 
a fresh session:

[steve at wow-wow ~]$ python
Python 2.5 (r25:51908, Nov  6 2007, 16:54:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = '# 1  Short offline       Completed without error       00%'
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
True
>>>


Now I copy-and-paste from your latest post to do it again:

>>> s = '# 1  Short offline       Completed without error       00%'
>>> re.split(' {2,}', s) == [x for x in s.split('  ') if x.strip()]
False


Weird, huh?

And here's the answer: somewhere along the line, something changed some of 
the spaces in the string into non-breaking spaces (UTF-8 bytes \xc2\xa0):

>>> s
'# 1 \xc2\xa0Short offline \xc2\xa0 \xc2\xa0 \xc2\xa0 Completed without error \xc2\xa0 \xc2\xa0 \xc2\xa0 00%'
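
A string like that contains no run of two ordinary spaces, so a split on
' {2,}' finds nothing to split on. As a rough sketch (written for Python 3,
where the bytes \xc2\xa0 decode to the single character U+00A0; the names
NBSP, mangled and normalise_spaces are made up for this illustration), one
way to detect and undo the damage might be:

```python
import re

# U+00A0 is the no-break space; encoded as UTF-8 it is the bytes \xc2\xa0.
NBSP = '\u00a0'

# The mangled string from the session above, as decoded text.
mangled = ('# 1 \u00a0Short offline \u00a0 \u00a0 \u00a0 '
           'Completed without error \u00a0 \u00a0 \u00a0 00%')

def normalise_spaces(text):
    """Replace every no-break space with an ordinary space."""
    return text.replace(NBSP, ' ')

# No two ordinary spaces are adjacent, so the regex leaves the string whole.
unsplit = re.split(' {2,}', mangled)

# After normalising, runs of two or more spaces separate the fields again.
clean = normalise_spaces(mangled)
fields = re.split(' {2,}', clean)
```

Checking `NBSP in s` before splitting is a cheap way to spot text that has
been through a web interface on its way to you.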


I blame Google. I don't know how they did it, but I'm sure it was them!
*wink*


By the way, let's not forget that the string could be fixed-width fields 
padded with spaces, in which case the right solution almost certainly 
will be:

s = '# 1  Short offline       Completed without error       00%'
result = s[25:55].rstrip()

Even in 2010, there are plenty of programs that export data using 
fixed-width fields.
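
If the data really is fixed-width, the slice above generalises to every
column. The offsets below are only guesses inferred from the one sample
line (the third pair matches the 25:55 slice); COLUMNS and split_fixed_width
are hypothetical names, and real offsets would have to come from whatever
documents the exporting program's format:

```python
s = '# 1  Short offline       Completed without error       00%'

# Assumed column layout as (start, end) offsets; None means "to end of line".
COLUMNS = [(0, 5), (5, 25), (25, 55), (55, None)]

def split_fixed_width(line, columns=COLUMNS):
    """Slice each column out of the line and drop its trailing padding."""
    return [line[start:end].rstrip() for start, end in columns]

fields = split_fixed_width(s)
```

Unlike splitting on runs of spaces, this keeps working when a field is
empty or when a field's text happens to contain two spaces in a row.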


-- 
Steven


