[Tutor] encoding question

spir denis.spir at gmail.com
Sun Jan 5 10:55:51 CET 2014


On 01/04/2014 08:26 PM, Alex Kleider wrote:
> Any suggestions as to a better way to handle the problem of encoding in the
> following context would be appreciated.  The problem arose because 'Bogota' is
> spelt with an acute accent on the 'a'.
>
> $ cat IP_info.py3
> #!/usr/bin/env python3
> # -*- coding: utf-8 -*-
> # file: 'IP_info.py3'  a module.
>
> import urllib.request
>
> url_format_str = \
>      'http://api.hostip.info/get_html.php?ip=%s&position=true'
>
> def ip_info(ip_address):
>      """
> Returns a dictionary keyed by Country, City, Lat, Long and IP.
>
> Depends on http://api.hostip.info (which returns the following:
> 'Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\nLatitude:
> 38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n'.)
> THIS COULD BREAK IF THE WEB SITE GOES AWAY!!!
> """
>      response =  urllib.request.urlopen(url_format_str %\
>                                     (ip_address, )).read()
>      sp = response.splitlines()
>      country = city = lat = lon = ip = ''
>      for item in sp:
>          if item.startswith(b"Country:"):
>              try:
>                  country = item[9:].decode('utf-8')
>              except:
>                  print("Exception raised.")
>                  country = item[9:]
>          elif item.startswith(b"City:"):
>              try:
>                  city = item[6:].decode('utf-8')
>              except:
>                  print("Exception raised.")
>                  city = item[6:]
>          elif item.startswith(b"Latitude:"):
>              try:
>                  lat = item[10:].decode('utf-8')
>              except:
>                  print("Exception raised.")
>                  lat = item[10:]
>          elif item.startswith(b"Longitude:"):
>              try:
>                  lon = item[11:].decode('utf-8')
>              except:
>                  print("Exception raised.")
>                  lon = item[11:]
>          elif item.startswith(b"IP:"):
>              try:
>                  ip = item[4:].decode('utf-8')
>              except:
>                  print("Exception raised.")
>                  ip = item[4:]
>      return {"Country" : country,
>              "City" : city,
>              "Lat" : lat,
>              "Long" : lon,
>              "IP" : ip            }
>
> if __name__ == "__main__":
>      addr =  "201.234.178.62"
>      print ("""    IP address is %(IP)s:
>          Country: %(Country)s;  City: %(City)s.
>          Lat/Long: %(Lat)s/%(Long)s""" % ip_info(addr))
> """
>
> The output I get on an Ubuntu 12.4LTS system is as follows:
> alex at x301:~/Python/Parse$ ./IP_info.py3
> Exception raised.
>      IP address is 201.234.178.62:
>          Country: COLOMBIA (CO);  City: b'Bogot\xe1'.
>          Lat/Long: 10.4/-75.2833
>
>
> I would have thought that utf-8 could handle the 'a-acute'.
>
> Thanks,
> alex

'á' does not encode to 0xe1 in UTF-8 (there it is the two bytes 0xc3 0xa1); what
you are reading is probably legacy data encoded in latin-1 (or another latin-*
encoding). In latin-1, 'á' is the single byte 0xe1.
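
To illustrate, here is a minimal sketch. The helper name decode_line and its fallback order are my own assumptions, not part of the original script; it tries UTF-8 first and falls back to latin-1 only if that fails:

```python
a_acute = '\u00e1'  # the character 'a' with an acute accent

# The same character is encoded differently in the two encodings.
assert a_acute.encode('utf-8') == b'\xc3\xa1'   # two bytes in UTF-8
assert a_acute.encode('latin-1') == b'\xe1'     # one byte in latin-1


def decode_line(raw, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper: return raw decoded with the first encoding that works."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Not reached while 'latin-1' is in the list, since latin-1 maps every
    # byte value; kept as a last resort if the list is changed.
    return raw.decode('utf-8', errors='replace')


# The bytes the server sent are not valid UTF-8, so the fallback is used.
assert decode_line(b'Bogot\xe1') == 'Bogot' + a_acute
# Well-formed UTF-8 input is still decoded as UTF-8.
assert decode_line(b'Bogot\xc3\xa1') == 'Bogot' + a_acute
```

Guessing like this is only a workaround: latin-1 never fails to decode, so a wrong guess yields wrong characters instead of an error. If the server declares a charset in its Content-Type header, using that is more reliable.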

Denis

