[Tutor] re question

Jacob S. keridee at jayco.net
Sun Mar 27 22:22:16 CEST 2005


Kent -- when pulling out just the numbers, why go to the trouble of 
splitting by "," first?

import re
pat = re.compile(r"[^\d.]+")  # one or more characters that are not a digit or "."

t =  """SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
SigWind:  850hPa±,         ,       ,       , 205 @ 11kts
Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts"""

result = pat.split(t)
print(result)

yields

['', '857', '21.0', '20.1', '210', '9', '850', '205', '11', '850', '1503', 
'16.8', '15.7', '205', '11', '']
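
A variation, in case the empty strings at both ends are a nuisance: match the numbers directly with findall instead of splitting on everything else. This is only a sketch. The name "num" and the pattern are my own choices, and I wrote the ± as \u00b1 so the file needs no encoding declaration.

```python
import re

t = u"""SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
SigWind:  850hPa\u00b1,         ,       ,       , 205 @ 11kts
Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts"""

# One or more digits, optionally followed by a decimal part.
num = re.compile(r"\d+(?:\.\d+)?")

# findall returns only the matched numbers, so there are no
# empty strings to filter out afterwards.
numbers = num.findall(t)
print(numbers)
```

This should give the same fourteen numbers as the split version, in the same order. Like the split version, it loses track of which column each number came from, so it only helps if the position of the empty fields does not matter.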

When I save the script, IDLE pops up a dialog that says:

    Non-ASCII found, yet no encoding declared. Add a line like
    # -*- coding: cp1252 -*-
    to your file
    Choose OK to save this file as cp1252
    Edit your general options to silence this warning

It has two buttons: "Ok" and "Edit my file". "Edit my file" adds the
commented line above to the top of the script.

Could this possibly be causing his problem?
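
Here is a small sketch of why I suspect the encoding. It assumes the data might have been written under a different code page than the one the pattern was typed in, which is only a guess on my part. The variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
import re

s = u"850hPa\u00b1"  # \u00b1 is the plus-minus sign

# The same character becomes a different byte in each code page.
dos_bytes = s.encode("cp437")   # DOS console code page
win_bytes = s.encode("cp1252")  # Windows code page, as IDLE suggests
print(repr(dos_bytes))
print(repr(win_bytes))

# A pattern written for the cp1252 byte leaves the cp437 data untouched.
pat = re.compile(b"\xb1")
print(repr(pat.sub(b"*", win_bytes)))
print(repr(pat.sub(b"*", dos_bytes)))
```

If the last line still shows \xf1, the pattern and the data disagree about the encoding, and a regex containing \xb1 would never match.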

HTH,
Jacob

>I don't know why this isn't working for you but this worked for me at a DOS 
>console:
>  >>> s='850hPa±'
>  >>> s
> '850hPa\xf1'
>  >>> import re
>  >>> re.sub('\xf1', '*', s)
> '850hPa*'
>  >>> import sys
>  >>> sys.stdout.encoding
> 'cp437'
>
> and also in IDLE with a different encoding:
> >>> s='850hPa±'
> >>> s
> '850hPa\xb1'
> >>> import re
> >>> re.sub('\xb1', '*', s)
> '850hPa*'
> >>> import sys
> >>> sys.stdout.encoding
> 'cp1252'
>
> So one guess is that the data is in a different encoding than what you 
> expect? When you print the string and get '\xb1', is that in the same 
> program that is doing the regex?
>
> Another approach would be to just pull out the numbers and ignore 
> everything else:
>  >>> s='Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts'
>  >>> l=s.split(',')
>  >>> l
> ['Std Lvl:  850hPa', '     1503m', '  16.8C', '  15.7C', ' 205 @ 11kts']
>  >>> [ re.search(r'[\d\.]+', i).group() for i in l]
> ['850', '1503', '16.8', '15.7', '205']
>
> Kent
>
> Ertl, John wrote:
>> All
>>
>> I have a string that has a bunch of numbers with the units attached to
>> them.  I want to strip off the units.  I am using a regular expression
>> and sub to do this.  This works great for almost all of the cases.
>> These are the type of lines:
>>
>> SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
>> SigWind:  850hPa±,         ,       ,       , 205 @ 11kts
>> Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts
>>
>> I am using the following:
>> cleanstring = re.compile('(hPa|hPa\xb1|m|C|kts)')
>> and then cleanstring.sub("", line).  I have tried using numerous \ to
>> escape the \xb1.
>>
>> I also tried replacing all non-numeric characters that are part of a
>> number-character string but I could not make that work.  The idea was to
>> replace all non-number characters in a "word" that is made up of numbers
>> followed by letters.
>>
>> I then split the line at the commas, so in the current thinking I need the
>> commas for the split.  How do I deal with the hPa±?  When I print it out it
>> looks like it is a hexadecimal escape character (\xb1) but I am not sure
>> how to deal with this.
>>
>> Any ideas?
>>
>> Thanks
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>