[Tutor] re question
Jacob S.
keridee at jayco.net
Sun Mar 27 22:22:16 CEST 2005
Kent -- when pulling out just the numbers, why go to the trouble of
splitting by "," first?
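import re
# Split on runs of anything that is not a digit or a decimal point.
# Use "+" rather than "*": since Python 3.7, re.split also splits on
# empty matches, which would break every number into single characters.
pat = re.compile(r"[^\d.]+")
t = """SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
SigWind: 850hPa±, , , , 205 @ 11kts
Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts"""
result = pat.split(t)
print(result)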
yields
['', '857', '21.0', '20.1', '210', '9', '850', '205', '11', '850', '1503',
'16.8', '15.7', '205', '11', '']
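A minimal Python 3 sketch of the same idea using re.findall instead of split. findall returns only the matched runs of digits and dots, so the empty strings at the start and end of the split result do not appear. As with the split, all three lines collapse into one flat list, so the empty fields in the second line leave no placeholder.

```python
import re

t = """SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
SigWind: 850hPa\u00b1, , , , 205 @ 11kts
Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts"""

# findall returns each run of digits and dots; there are no '' entries
numbers = re.findall(r"[\d.]+", t)
print(numbers)
```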
IDLE pops up a dialog that says:

    Non-ASCII found, yet no encoding declared. Add a line like
    # -*- coding: cp1252 -*-
    to your file
    Choose OK to save this file as cp1252
    Edit your general options to silence this warning

It has two buttons, "Ok" and "Edit my file". "Edit my file" adds the
commented line above to the top of the script.
Could this possibly be causing his problem?
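A quick Python 3 check of why the declared encoding matters: the plus-minus sign is one character, but the byte it is stored as depends on the code page. This matches the \xb1 (cp1252) and \xf1 (cp437) values Kent sees below. Python 3 source files default to UTF-8, where the same character takes two bytes.

```python
# One character, three different byte encodings
sign = "\u00b1"  # U+00B1 PLUS-MINUS SIGN

encoded = {codec: sign.encode(codec) for codec in ("cp1252", "cp437", "utf-8")}
for codec, raw in encoded.items():
    print(codec, raw)
```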
HTH,
Jacob
>I don't know why this isn't working for you, but this worked for me at a DOS
>console:
> >>> s='850hPa±'
> >>> s
> '850hPa\xf1'
> >>> import re
> >>> re.sub('\xf1', '*', s)
> '850hPa*'
> >>> import sys
> >>> sys.stdout.encoding
> 'cp437'
>
> and also in IDLE with a different encoding:
> >>> s='850hPa±'
> >>> s
> '850hPa\xb1'
> >>> import re
> >>> re.sub('\xb1', '*', s)
> '850hPa*'
> >>> import sys
> >>> sys.stdout.encoding
> 'cp1252'
>
> So one guess is that the data is in a different encoding than what you
> expect? When you print the string and get '\xb1', is that in the same
> program that is doing the regex?
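To illustrate the kind of mismatch suspected here, a Python 3 sketch under a hypothetical assumption: the bytes were written by a cp1252 program but decoded as cp437. Byte 0xB1 is a shade block in cp437, so after the wrong decode there is no plus-minus sign left for the pattern to replace.

```python
import re

raw = b"850hPa\xb1"  # assumption: bytes written by a cp1252 program

right = raw.decode("cp1252")  # byte 0xB1 decodes to U+00B1, the plus-minus sign
wrong = raw.decode("cp437")   # byte 0xB1 decodes to U+2592, a shade block

# ascii() keeps the printed output ASCII-only whatever the console encoding
print(ascii(re.sub("\u00b1", "*", right)))  # '850hPa*'
print(ascii(re.sub("\u00b1", "*", wrong)))  # '850hPa\u2592' (nothing replaced)
```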
>
> Another approach would be to just pull out the numbers and ignore
> everything else:
> >>> s='Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts'
> >>> l=s.split(',')
> >>> l
> ['Std Lvl: 850hPa', ' 1503m', ' 16.8C', ' 15.7C', ' 205 @ 11kts']
> >>> [ re.search(r'[\d\.]+', i).group() for i in l]
> ['850', '1503', '16.8', '15.7', '205']
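The comprehension above calls .group() on every search result, but re.search returns None for a field with no digits, such as the empty fields in the two SigWind lines, so those lines would raise AttributeError. A Python 3 sketch that keeps one slot per comma-separated field (parse_fields is a hypothetical helper name):

```python
import re

NUMBER = re.compile(r"[\d.]+")

def parse_fields(line):
    """Return the first number in each comma-separated field, or None if it has none."""
    values = []
    for field in line.split(","):
        match = NUMBER.search(field)
        values.append(match.group() if match else None)
    return values

print(parse_fields("SigWind: 850hPa\u00b1, , , , 205 @ 11kts"))
print(parse_fields("Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts"))
```

Keeping None in place preserves the column positions that splitting on commas is meant to give. Like the one-liner, it keeps only 205 from "205 @ 11kts".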
>
> Kent
>
> Ertl, John wrote:
>> All
>>
>> I have a string that has a bunch of numbers with the units attached to
>> them. I want to strip off the units. I am using a regular expression and
>> sub to do this. This works great for almost all of the cases. These are
>> the types of lines:
>>
>> SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts
>> SigWind: 850hPa±, , , , 205 @ 11kts
>> Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts
>>
>> I am using the following:
>>
>>     cleanstring = re.compile('(hPa|hPa\xb1|m|C|kts)')
>>
>> and then cleanstring.sub("", line). I have tried using numerous
>> backslashes to escape the \xb1.
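One thing worth checking in this pattern, sketched in Python 3: regex alternatives are tried left to right, so at "hPa±" the plain "hPa" alternative matches first and "hPa\xb1" is never reached, which leaves the ± behind. Putting the longer alternative first, or making the sign optional with hPa\xb1? as below, removes it.

```python
import re

line = "SigWind: 850hPa\u00b1, , , , 205 @ 11kts"

# "hPa" is tried before "hPa\u00b1", succeeds, and the sign survives
buggy = re.compile("(hPa|hPa\u00b1|m|C|kts)")
# An optional sign lets one alternative consume "hPa" and "hPa\u00b1" alike
fixed = re.compile("(hPa\u00b1?|m|C|kts)")

print(ascii(buggy.sub("", line)))  # 'SigWind: 850\xb1, , , , 205 @ 11'
print(ascii(fixed.sub("", line)))  # 'SigWind: 850, , , , 205 @ 11'
```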
>>
>> I also tried replacing all non-numeric characters that are part of a
>> number-character string, but I could not make that work. The idea was to
>> replace all non-number characters in a "word" that is made up of numbers
>> followed by other characters.
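The idea described here, dropping the non-numeric tail of each number-plus-unit word while keeping the commas, can be sketched in Python 3 with a lookbehind. The character class is an assumption: it treats anything other than a digit, dot, comma, or whitespace that directly follows a digit as part of a unit.

```python
import re

# Delete a run of unit characters only when it directly follows a digit.
# Digits, dots, commas and whitespace are never removed.
UNIT_TAIL = re.compile(r"(?<=\d)[^\d.,\s]+")

lines = [
    "SigWind: 857hPa, , 21.0C, 20.1C, 210 @ 9kts",
    "SigWind: 850hPa\u00b1, , , , 205 @ 11kts",
    "Std Lvl: 850hPa, 1503m, 16.8C, 15.7C, 205 @ 11kts",
]
cleaned = [UNIT_TAIL.sub("", line) for line in lines]
for line in cleaned:
    print(line.split(","))
```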
>>
>> I then split the line at the commas, so with my current approach I need
>> to keep the commas for the split. How do I deal with the hPa±? When I
>> print it out, it looks like a hexadecimal escape character (\xb1), but I
>> am not sure how to deal with this.
>>
>> Any ideas?
>>
>> Thanks
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>