# [Tutor] re question

Ertl, John john.ertl at fnmoc.navy.mil
Mon Mar 28 16:16:16 CEST 2005

```
All,

Thanks.  I love this list... great, friendly advice.  I had taken a slightly
longer approach to Kent's "re.findall(r'[\d\.]+', s)", but the simplicity is
just too good to pass up.  Jacob, I too got the warning about encoding and
saved the file with the line added.  It still would not strip out the
character... I may try a bit harder to make it work, just because it bugs me
that you made it work.

Thanks again

John Ertl

-----Original Message-----
From: Kent Johnson
Cc: tutor at python.org
Sent: 3/27/05 2:31 PM
Subject: Re: [Tutor] re question

Jacob S. wrote:
> Kent -- when pulling out just the numbers, why go to the trouble of
> splitting by "," first?

Good question. It made sense at the time :-)

Here is another way using re.findall():
>>> import re
>>> s='Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts'
>>> re.findall(r'[\d\.]+', s)
['850', '1503', '16.8', '15.7', '205', '11']

Kent
>
> import re
> pat = re.compile(r"[^\d.]*")
>
> t =  """SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
> SigWind:  850hPa±,         ,       ,       , 205 @ 11kts
> Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts"""
>
> result = pat.split(t)
> print result
>
> yields
>
> ['', '857', '21.0', '20.1', '210', '9', '850', '205', '11', '850',
> '1503', '16.8', '15.7', '205', '11', '']
>
> IDLE pops up with a dialog that says Non-ASCII found, yet no encoding
> declared. Add a line like
> # -*- coding: cp1252 -*-
> Choose OK to save this file as cp1252
> Edit your general options to silence this warning
>
> It has buttons: Ok, Edit my file
> Edit my file adds the commented line above to the top of the script.
>
> Could this possibly be causing his problem?
>
> HTH,
> Jacob
>
>> I don't know why this isn't working for you but this worked for me at
>> a DOS console:
>>  >>> s='850hPa±'
>>  >>> s
>> '850hPa\xf1'
>>  >>> import re
>>  >>> re.sub('\xf1', '*', s)
>> '850hPa*'
>>  >>> import sys
>>  >>> sys.stdout.encoding
>> 'cp437'
>>
>> and also in IDLE with a different encoding:
>> >>> s='850hPa±'
>> >>> s
>> '850hPa\xb1'
>> >>> import re
>> >>> re.sub('\xb1', '*', s)
>> '850hPa*'
>> >>> import sys
>> >>> sys.stdout.encoding
>> 'cp1252'
>>
>> So one guess is that the data is in a different encoding than what you
>> expect? When you print the string and get '\xb1', is that in the same
>> program that is doing the regex?
>>
>> Another approach would be to just pull out the numbers and ignore
>> everything else:
>>  >>> s='Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts'
>>  >>> l=s.split(',')
>>  >>> l
>> ['Std Lvl:  850hPa', '     1503m', '  16.8C', '  15.7C', ' 205 @ 11kts']
>>  >>> [ re.search(r'[\d\.]+', i).group() for i in l]
>> ['850', '1503', '16.8', '15.7', '205']
>>
>> Kent
>>
>> Ertl, John wrote:
>>
>>> All
>>>
>>> I have a string that has a bunch of numbers with the units attached
>>> to them.  I want to strip off the units.  I am using a regular expression
>>> and sub to do this.  This works great for almost all of the cases.  These
>>> are the type of lines:
>>>
>>> SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
>>> SigWind:  850hPa±,         ,       ,       , 205 @ 11kts
>>> Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts
>>>
>>> I am using the following: cleanstring = re.compile('(hPa|hPa\xb1|m|C|kts)').
>>> And then cleanstring.sub("", line).  I have tried using numerous
>>> backslashes to escape the \xb1.
>>>
>>> I also tried replacing all non-numeric characters that are part of a
>>> number-character string, but I could not make that work.  The idea was
>>> to replace all non-number characters in a "word" that is made up of
>>> numbers followed by characters.
>>>
>>> I then split the line at the commas, so in the current thinking I need
>>> the commas for the split.  How do I deal with the hPa±?  When I print it
>>> out, it looks like a hexadecimal escape character (\xb1), but I am not
>>> sure how to deal with this.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> _______________________________________________
>>> Tutor maillist  -  Tutor at python.org
>>> http://mail.python.org/mailman/listinfo/tutor
>>>
>>
```
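One possible explanation for the `±` surviving, based only on the pattern quoted in the thread: in `(hPa|hPa\xb1|m|C|kts)`, Python's `re` tries the alternatives from left to right and accepts the first one that matches at a given position, so `hPa` matches and the `\xb1` after it is never consumed. The sketch below assumes Python 3 (where `'\xb1'` in a `str` is the `±` character itself) and compares the quoted pattern with one that makes the `±` optional; the names `original`, `fixed`, and `line` are illustrative.

```python
import re

# Pattern as quoted in the thread: "hPa" comes before "hPa\xb1", and re
# takes the first alternative that matches at a position, so "hPa" wins
# and the "\xb1" (the "±" sign) is left in the string.
original = re.compile('(hPa|hPa\xb1|m|C|kts)')

# Making the "±" optional ("hPa\xb1?") lets the whole "hPa±" unit match.
# Listing "hPa\xb1" before "hPa" would work equally well.
fixed = re.compile('(hPa\xb1?|m|C|kts)')

line = 'SigWind:  850hPa\xb1,         ,       ,       , 205 @ 11kts'

print(original.sub('', line))  # the "±" is still there after 850
print(fixed.sub('', line))     # the "±" is removed along with "hPa"
```

The same trap applies to any alternation in which one branch is a prefix of a later one: put the longer branch first, or fold the difference into an optional suffix.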
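Kent's two sessions show the same `±` as `'\xf1'` at a DOS console (cp437) and as `'\xb1'` in IDLE (cp1252), which fits his guess about encodings: each code page stores the character as a different byte. A minimal Python 3 sketch, assuming the input arrives as bytes in a known encoding, decodes first and runs the regex on text, so the pattern no longer depends on which byte was used. The variable names are illustrative.

```python
import re

text = '850hPa\u00b1'  # "850hPa±" as a Python 3 str

# The same character encodes to a different byte in each code page.
as_cp437 = text.encode('cp437')    # b'850hPa\xf1', as at the DOS console
as_cp1252 = text.encode('cp1252')  # b'850hPa\xb1', as in IDLE

# Decoding with the matching codec gives back identical text, so one
# pattern written against the character "±" handles both sources.
units = re.compile('hPa\u00b1?|m|C|kts')
for raw, codec in [(as_cp437, 'cp437'), (as_cp1252, 'cp1252')]:
    decoded = raw.decode(codec)
    print(codec, units.sub('', decoded))
```

Decoding with the wrong codec fails quietly rather than raising an error: the cp437 byte 0xF1 read as cp1252 becomes `ñ`, which a `±`-based pattern will not match.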
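Jacob asked why Kent split on commas before searching. One practical reason, judging from the sample lines: some fields are blank (the second `SigWind` line has only a pressure and a wind), and a single `re.findall` over the whole line returns `['850', '205', '11']` with no way to tell which column each number came from. The sketch below keeps one slot per comma-separated field and uses `None` for blanks; the function name `parse_fields` and the `None` convention are illustrative choices, not something from the thread.

```python
import re

NUMBER = re.compile(r'[\d.]+')  # Kent's pattern from the thread

def parse_fields(line):
    """Return the numbers in a line, one slot per comma-separated field.

    Blank fields become None so later columns keep their positions.
    A field may hold more than one number (e.g. "205 @ 11kts").
    """
    values = []
    for field in line.split(','):
        found = NUMBER.findall(field)
        values.extend(found if found else [None])
    return values

lines = [
    'SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts',
    'SigWind:  850hPa\u00b1,         ,       ,       , 205 @ 11kts',
    'Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts',
]
for line in lines:
    print(parse_fields(line))
```

With this layout every row of the sample comes out the same length, so columns line up even when some readings are missing.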
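Running Jacob's `pat.split(t)` example on a current Python 3 will probably not reproduce the output quoted in the thread, which reflects Python 2 behavior. Since Python 3.7, `re.split` also splits on empty matches, and `[^\d.]*` matches the empty string between adjacent digits, so a number like `857` comes back in pieces. A Python 3 sketch of the same idea swaps `*` for `+` and filters out the empty strings at the ends.

```python
import re

# "+" instead of "*": the pattern must consume at least one character
# that is neither a digit nor a dot, so it never matches the empty string.
pat = re.compile(r'[^\d.]+')

t = """SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
SigWind:  850hPa\u00b1,         ,       ,       , 205 @ 11kts
Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts"""

# split() still yields '' where the text starts and ends with a
# separator; filter those out.
numbers = [piece for piece in pat.split(t) if piece]
print(numbers)
```

Like the original, this flattens all three lines into one list, so it also loses track of which field each number came from.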