Regex driving me crazy...
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Thu Apr 8 03:07:26 EDT 2010
On Wed, 07 Apr 2010 21:57:31 -0700, Patrick Maupin wrote:
> On Apr 7, 9:51 pm, Steven D'Aprano
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>
> BTW, I don't know how you got 'True' here.
>
>> >>> re.split(' {2,}', s) == [x for x in s.split(' ') if x.strip()]
>> True
It was a copy and paste from the interactive interpreter. Here it is, in
a fresh session:
[steve at wow-wow ~]$ python
Python 2.5 (r25:51908, Nov 6 2007, 16:54:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = '# 1 Short offline Completed without error 00%'
>>> re.split(' {2,}', s) == [x for x in s.split(' ') if x.strip()]
True
>>>
Now I copy-and-paste from your latest post to do it again:
>>> s = '# 1 Short offline Completed without error 00%'
>>> re.split(' {2,}', s) == [x for x in s.split(' ') if x.strip()]
False
Weird, huh?
And here's the answer: somewhere along the line, something replaced some
of the spaces in the string with non-breaking spaces (U+00A0, which shows
up below as the UTF-8 bytes \xc2\xa0):
>>> s
'# 1 \xc2\xa0Short offline \xc2\xa0 \xc2\xa0 \xc2\xa0 Completed without error \xc2\xa0 \xc2\xa0 \xc2\xa0 00%'
I blame Google. I don't know how they did it, but I'm sure it was them!
*wink*
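For anyone who wants to reproduce the effect without relying on a mail client, here is a minimal sketch. It assumes Python 3 (the session above is Python 2.5, where the non-breaking space appears as the two bytes \xc2\xa0), and the sample strings and their spacing are made up for illustration rather than copied from the thread:

```python
import re

# Hypothetical sample line: fields separated by runs of plain spaces.
clean = ("# 1" + " " * 2 + "Short offline" + " " * 6 +
         "Completed without error" + " " * 6 + "00%")

# Simulate the mangling: every pair of spaces becomes space + U+00A0,
# so the text looks the same but has no two plain spaces in a row.
mangled = re.sub("  ", " \u00a0", clean)

# U+00A0 encodes to the two bytes seen in the Python 2 repr.
nbsp_bytes = "\u00a0".encode("utf-8")

# The pattern only knows about plain spaces, so it splits the clean
# string into fields and leaves the mangled one in a single piece.
clean_fields = re.split(" {2,}", clean)
mangled_fields = re.split(" {2,}", mangled)

# Two possible repairs: normalise the string first, or widen the pattern.
normalised = mangled.replace("\u00a0", " ")
tolerant_fields = re.split("[ \u00a0]{2,}", mangled)

print(clean_fields)
print(len(mangled_fields))
print(tolerant_fields)
```

Printing repr(s), as done above, is usually the quickest way to spot this kind of invisible difference.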
By the way, let's not forget that the string could be fixed-width fields
padded with spaces, in which case the right solution almost certainly
will be:
s = '# 1 Short offline Completed without error 00%'
result = s[25:55].rstrip()
Even in 2010, there are plenty of programs that export data using
fixed-width fields.
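If the data really is fixed-width, slicing can be generalised to a whole record. The following is only a sketch: the column names and offsets in FIELDS are assumptions invented for the example (the 25:55 range is borrowed from the slice above), and a real file would need offsets taken from its own format description.

```python
# Hypothetical column layout: (name, start, end) character offsets.
FIELDS = [
    ("num", 0, 5),
    ("test", 5, 25),
    ("status", 25, 55),
    ("remaining", 55, 60),
]


def parse_fixed_width(line, fields=FIELDS):
    """Slice each column out of the line and strip the space padding."""
    return {name: line[start:end].rstrip() for name, start, end in fields}


# Build a sample line by padding each value to its column width.
line = ("# 1".ljust(5) + "Short offline".ljust(20) +
        "Completed without error".ljust(30) + "00%".ljust(5))
record = parse_fixed_width(line)

# An empty column is still reported as an empty field, which splitting
# on runs of spaces cannot do.
sparse = ("# 2".ljust(5) + "".ljust(20) +
          "Aborted by host".ljust(30) + "90%".ljust(5))
sparse_record = parse_fixed_width(sparse)

print(record)
print(sparse_record)
```

Slicing also copes with values that contain two or more consecutive spaces themselves, which would confuse a split on ' {2,}'.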
--
Steven