[Tutor] re question
Ertl, John
john.ertl at fnmoc.navy.mil
Mon Mar 28 16:16:16 CEST 2005
All,
Thanks. I love this list...great friendly advice. I had taken a slightly
longer approach than Kent's "re.findall(r'[\d\.]+', s)" but the simplicity is
just too good to pass up. Jacob, I too got the warning about encoding and
saved with the line added. It still would not strip out the character...I
may try a bit harder to make it work just because it bugs me that you made
it work.
Thanks again
John Ertl
-----Original Message-----
From: Kent Johnson
Cc: tutor at python.org
Sent: 3/27/05 2:31 PM
Subject: Re: [Tutor] re question
Jacob S. wrote:
> Kent -- when pulling out just the numbers, why go to the trouble of
> splitting by "," first?
Good question. It made sense at the time :-)
Here is another way using re.findall():
>>> import re
>>> s='Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts'
>>> re.findall(r'[\d\.]+', s)
['850', '1503', '16.8', '15.7', '205', '11']
Kent
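
A minimal Python 3 sketch of the findall() approach above, in case the values are needed as numbers rather than strings. The pattern \d+(?:\.\d+)? and the helper name extract_numbers are assumptions made for illustration; they are a slightly stricter variant of Kent's [\d\.]+, which would also accept a lone "." or a token such as "1.2.3".

```python
import re

# Sample line taken from the thread.
s = 'Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts'

# Assumed stricter variant of [\d\.]+ : digits, then at most one
# decimal part, so a bare "." is never returned as a match.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def extract_numbers(line):
    """Return every number found in the line as a float (hypothetical helper)."""
    return [float(token) for token in NUMBER.findall(line)]

values = extract_numbers(s)
print(values)  # [850.0, 1503.0, 16.8, 15.7, 205.0, 11.0]
```

Note that this flattens the line: once the commas are gone, there is no way to tell which field a value came from.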
>
> import re
> pat = re.compile(r"[^\d.]*")
>
> t = """SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
> SigWind: 850hPa±, , , , 205 @ 11kts
> Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts"""
>
> result = pat.split(t)
> print result
>
> yields
>
> ['', '857', '21.0', '20.1', '210', '9', '850', '205', '11', '850',
> '1503', '16.8', '15.7', '205', '11', '']
>
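
The split() idea above relies on how older Pythons treated a pattern that can match the empty string. From Python 3.7 on, re.split() also splits on empty matches, so [^\d.]* would break the numbers apart character by character. A hedged Python 3 sketch of the same technique, assuming + in place of * and filtering out the empty strings at both ends:

```python
import re

# Same sample text as in Jacob's message; \u00b1 is the plus-minus sign.
t = """SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
SigWind: 850hPa\u00b1, , , , 205 @ 11kts
Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts"""

# "+" instead of "*": the separator must be at least one character long,
# so the pattern can never match the empty string.
pat = re.compile(r"[^\d.]+")

# Drop the empty strings produced by separators at the start and end.
fields = [f for f in pat.split(t) if f]
print(fields)
```

This yields the same numbers as Jacob's output, without the leading and trailing ''.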
> IDLE pops up with a dialog that says Non-ASCII found, yet no encoding
> declared. Add a line like
> # -*- coding: cp1252 -*-
> to your file
> Choose OK to save this file as cp1252
> Edit your general options to silence this warning
>
> It has buttons: Ok, Edit my file
> Edit my file adds the commented line above to the top of the script.
>
> Could this possibly be causing his problem?
>
> HTH,
> Jacob
>
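
To illustrate why the coding line in that dialog matters: it tells the interpreter how to turn the bytes of the saved file back into characters. The sketch below is only an illustration in Python 3, not what IDLE does internally; the variable names are made up. It shows the same saved byte being read as two different characters depending on the codec assumed.

```python
# Bytes an editor would write for the literal '850hPa±' when the
# file is saved as cp1252 (\u00b1 is the plus-minus sign).
saved_bytes = '850hPa\u00b1'.encode('cp1252')

# Reading those bytes back with the matching codec recovers the sign.
as_cp1252 = saved_bytes.decode('cp1252')

# Reading them with a different codec silently gives another character:
# in cp437 the byte 0xB1 is a shade block, not the plus-minus sign.
as_cp437 = saved_bytes.decode('cp437')

print(saved_bytes)      # b'850hPa\xb1'
print(ascii(as_cp1252)) # '850hPa\xb1'
print(ascii(as_cp437))  # '850hPa\u2592'
```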
>> I don't know why this isn't working for you but this worked for me at
>> a DOS console:
>> >>> s='850hPa±'
>> >>> s
>> '850hPa\xf1'
>> >>> import re
>> >>> re.sub('\xf1', '*', s)
>> '850hPa*'
>> >>> import sys
>> >>> sys.stdout.encoding
>> 'cp437'
>>
>> and also in IDLE with a different encoding:
>> >>> s='850hPa±'
>> >>> s
>> '850hPa\xb1'
>> >>> import re
>> >>> re.sub('\xb1', '*', s)
>> '850hPa*'
>> >>> import sys
>> >>> sys.stdout.encoding
>> 'cp1252'
>>
>> So one guess is that the data is in a different encoding than what you
>> expect? When you print the string and get '\xb1', is that in the same
>> program that is doing the regex?
>>
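
The two sessions above show the same character stored as two different bytes. A rough Python 3 sketch of that point, assuming the data arrives as bytes and the encoding is known; strip_plus_minus is a hypothetical helper, not something from the thread. Decoding first means the regex can match the character itself rather than a codec-specific byte.

```python
import re

raw_cp437 = b'850hPa\xf1'   # the byte seen at the DOS console (cp437)
raw_cp1252 = b'850hPa\xb1'  # the byte seen in IDLE (cp1252)

def strip_plus_minus(raw, encoding):
    """Decode raw bytes, then replace the plus-minus sign with '*'."""
    text = raw.decode(encoding)
    return re.sub('\u00b1', '*', text)

print(strip_plus_minus(raw_cp437, 'cp437'))    # 850hPa*
print(strip_plus_minus(raw_cp1252, 'cp1252'))  # 850hPa*

# With the wrong codec the byte decodes to another character,
# so the substitution finds nothing to replace.
print(ascii(strip_plus_minus(raw_cp437, 'cp1252')))  # '850hPa\xf1'
```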
>> Another approach would be to just pull out the numbers and ignore
>> everything else:
>> >>> s='Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts'
>> >>> l=s.split(',')
>> >>> l
>> ['Std Lvl: 850hPa', ' 1503m', ' 16.8C', ' 15.7C', ' 205 @ 11kts']
>> >>> [ re.search(r'[\d\.]+', i).group() for i in l]
>> ['850', '1503', '16.8', '15.7', '205']
>>
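
One caveat with the split-then-search version above: re.search() returns None for a field with no digits, so calling .group() on the empty fields of the SigWind lines would raise AttributeError. A small Python 3 sketch that keeps the field positions and uses None for empty fields; parse_fields and the stricter number pattern are assumptions for illustration.

```python
import re

# Assumed number pattern: digits with an optional decimal part.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_fields(line):
    """Split on commas; return the first number in each field, or None."""
    result = []
    for field in line.split(','):
        match = NUMBER.search(field)
        result.append(match.group() if match else None)
    return result

# \u00b1 is the plus-minus sign from the problem line.
print(parse_fields('SigWind: 850hPa\u00b1, , , , 205 @ 11kts'))
# ['850', None, None, None, '205']
```

Like the original, this takes only the first number in each field, so the wind speed after the "@" is not returned.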
>> Kent
>>
>> Ertl, John wrote:
>>
>>> All
>>>
>>> I have a string that has a bunch of numbers with the units attached
>>> to them. I want to strip off the units. I am using a regular
>>> expression and sub to do this. This works great for almost all of the
>>> cases. These are the type of lines:
>>>
>>> SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
>>> SigWind: 850hPa±, , , , 205 @ 11kts
>>> Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts
>>>
>>> I am using the following: cleanstring = re.compile('(hPa|hPa\xb1|m|C|kts)'),
>>> and then cleanstring.sub("", line). I have tried using numerous \ to
>>> escape the \xb1.
>>>
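
One possible explanation for the pattern above, not confirmed anywhere in the thread: alternatives in a Python regex are tried left to right and the first one that matches wins, so 'hPa' matches before 'hPa\xb1' is ever tried and the trailing sign is left behind. A Python 3 sketch of the difference; the name reordered and the rewritten pattern are assumptions for illustration.

```python
import re

# \u00b1 (same character as \xb1) is the plus-minus sign.
line = 'SigWind: 850hPa\u00b1, , , , 205 @ 11kts'

# Pattern as posted: 'hPa' comes first, so it always wins over 'hPa\xb1'.
original = re.compile('(hPa|hPa\xb1|m|C|kts)')

# Assumed fix: make the sign an optional part of the hPa alternative.
reordered = re.compile('(hPa\xb1?|m|C|kts)')

print(ascii(original.sub('', line)))   # 'SigWind: 850\xb1, , , , 205 @ 11'
print(ascii(reordered.sub('', line)))  # 'SigWind: 850, , , , 205 @ 11'
```

Putting the longer alternative first, as in '(hPa\xb1|hPa|m|C|kts)', should have the same effect.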
>>> I also tried replacing all non-numeric characters that are part of a
>>> number-character string but I could not make that work. The idea was
>>> to replace all non-number characters in a "word" that is made up of
>>> numbers followed by letters.
>>>
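
The "strip whatever is glued to a number" idea can be sketched in Python 3 roughly as follows. The pattern and the name strip_units are assumptions for illustration: capture a number, then drop any run of characters that follows it directly and is not whitespace, a comma, or a digit. The commas survive, so the line can still be split afterwards.

```python
import re

# Group 1 is the number; the trailing class is the attached unit,
# including any stray sign such as the plus-minus character.
UNIT_SUFFIX = re.compile(r'(\d+(?:\.\d+)?)[^\s,\d]+')

def strip_units(line):
    """Remove unit text attached directly to a number; keep the rest."""
    return UNIT_SUFFIX.sub(r'\1', line)

print(strip_units('SigWind: 850hPa\u00b1, , , , 205 @ 11kts'))
# SigWind: 850, , , , 205 @ 11
print(strip_units('Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts'))
# Std Lvl: 850, 1503, 16.8, 15.7, 205 @ 11
```

Because the pattern never names the sign explicitly, it does not depend on which encoding the sign arrived in.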
>>> I then split the line at the commas, so in the current thinking I need
>>> the commas for the split. How do I deal with the hPa±? When I print it
>>> out it looks like it is a hexadecimal escape character (\xb1) but I am
>>> not sure how to deal with this.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> _______________________________________________
>>> Tutor maillist - Tutor at python.org
>>> http://mail.python.org/mailman/listinfo/tutor
>>>