a simple unicode question

Scott David Daniels Scott.Daniels at Acm.Org
Tue Oct 20 11:03:20 EDT 2009


Mark Tolonen wrote:
>> Is there a better way of getting the degrees?
> 
> It seems your string is UTF-8.  \xc2\xb0 is UTF-8 for DEGREE SIGN.  If 
> you type non-ASCII characters in source code, make sure to declare the 
> encoding the file is *actually* saved in:
> 
> # coding: utf-8
> 
> s = '''48° 13' 16.80" N'''
> q = s.decode('utf-8')
> 
> # next line equivalent to previous two
> q = u'''48° 13' 16.80" N'''
> 
> # couple ways to find the degrees
> print int(q[:q.find(u'°')])
> import re
> print re.search(ur'(\d+)°',q).group(1)
> 

Mark is right about the source encoding, but you needn't put non-ASCII
characters in your source to process Unicode data.  Since nobody else
has mentioned my favorite way of writing Unicode in ASCII, the
\N{...} named-character escape, try:

IDLE 2.6.3
 >>> s = '''48\xc2\xb0 13' 16.80" N'''
 >>> q = s.decode('utf-8')
 >>> degrees, rest = q.split(u'\N{DEGREE SIGN}')
 >>> print degrees
48
 >>> print rest
  13' 16.80" N
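
For anyone reading this on Python 3, here is a rough sketch of the same
idea.  It assumes the data arrives as UTF-8 bytes; the names raw and text
are mine, and using partition() instead of split() is my own substitution
so that a missing degree sign can be detected rather than causing an
unpacking error.

```python
# Python 3 sketch: the UTF-8 data is written as a bytes literal (b'...').
raw = b'''48\xc2\xb0 13' 16.80" N'''
text = raw.decode('utf-8')

# partition() splits on the first DEGREE SIGN only; if the separator is
# absent, sep comes back as the empty string instead of raising.
degrees, sep, rest = text.partition('\N{DEGREE SIGN}')
if not sep:
    raise ValueError('no degree sign in %r' % text)

print(int(degrees))
print(rest.strip())
```

In Python 3 every str literal is Unicode, so \N{DEGREE SIGN} works without
the u prefix.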

And if you are unsure of the name to use:
 >>> import unicodedata
 >>> unicodedata.name(u'\xb0')
'DEGREE SIGN'
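
The lookup also works in the other direction.  A small sketch in Python 3
syntax (the variable names are only for illustration): unicodedata.lookup()
turns a name back into the character, and unicodedata.name() accepts a
default for characters that have no name.

```python
import unicodedata

degree = '\xb0'

# Character -> name, then name -> character again.
name = unicodedata.name(degree)
same = unicodedata.lookup(name)
assert same == degree == '\N{DEGREE SIGN}'

# name() raises ValueError for unnamed characters (such as control
# characters) unless a default is supplied as the second argument.
fallback = unicodedata.name('\x00', '<unnamed>')
print(name, fallback)
```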

--Scott David Daniels
Scott.Daniels at Acm.Org


