[Tutor] encoding question
Alex Kleider
akleider at sonic.net
Sat Jan 4 20:26:35 CET 2014
Any suggestions as to a better way to handle the problem of encoding in
the following context would be appreciated. The problem arose because
'Bogota' is spelt with an acute accent on the 'a'.
$ cat IP_info.py3
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# file: 'IP_info.py3'  a module.

import urllib.request

url_format_str = \
    'http://api.hostip.info/get_html.php?ip=%s&position=true'

def ip_info(ip_address):
    """
    Returns a dictionary keyed by Country, City, Lat, Long and IP.

    Depends on http://api.hostip.info (which returns the following:
    'Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\nLatitude:
    38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n'.)
    THIS COULD BREAK IF THE WEB SITE GOES AWAY!!!
    """
    # read() returns bytes, so each line below is a bytes object.
    response = urllib.request.urlopen(url_format_str %
                                      (ip_address, )).read()
    sp = response.splitlines()
    country = city = lat = lon = ip = ''
    for item in sp:
        # Each branch strips the label and tries to decode the value;
        # if decoding fails, the raw bytes are kept instead.
        if item.startswith(b"Country:"):
            try:
                country = item[9:].decode('utf-8')
            except UnicodeDecodeError:
                print("Exception raised.")
                country = item[9:]
        elif item.startswith(b"City:"):
            try:
                city = item[6:].decode('utf-8')
            except UnicodeDecodeError:
                print("Exception raised.")
                city = item[6:]
        elif item.startswith(b"Latitude:"):
            try:
                lat = item[10:].decode('utf-8')
            except UnicodeDecodeError:
                print("Exception raised.")
                lat = item[10:]
        elif item.startswith(b"Longitude:"):
            try:
                lon = item[11:].decode('utf-8')
            except UnicodeDecodeError:
                print("Exception raised.")
                lon = item[11:]
        elif item.startswith(b"IP:"):
            try:
                ip = item[4:].decode('utf-8')
            except UnicodeDecodeError:
                print("Exception raised.")
                ip = item[4:]
    return {"Country" : country,
            "City" : city,
            "Lat" : lat,
            "Long" : lon,
            "IP" : ip }

if __name__ == "__main__":
    addr = "201.234.178.62"
    print("""    IP address is %(IP)s:
        Country: %(Country)s;  City: %(City)s.
        Lat/Long: %(Lat)s/%(Long)s""" % ip_info(addr))
"""
The output I get on an Ubuntu 12.04 LTS system is as follows:
alex@x301:~/Python/Parse$ ./IP_info.py3
Exception raised.
IP address is 201.234.178.62:
Country: COLOMBIA (CO); City: b'Bogot\xe1'.
Lat/Long: 10.4/-75.2833
I would have thought that utf-8 could handle the 'a-acute'.
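The byte in question can be checked on its own, without the web service. What follows is only a minimal sketch that starts from the bytes printed in the output above; the guess that the service sends ISO-8859-1 (Latin-1) is an assumption, since nothing in the output states the encoding.

```python
raw = b'Bogot\xe1'  # the bytes exactly as printed in the output above

# In UTF-8, 0xE1 is the lead byte of a three-byte sequence, so a lone
# 0xE1 at the end of the data is incomplete and strict decoding fails.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the data is ISO-8859-1, where the single byte 0xE1 is
# U+00E1 (LATIN SMALL LETTER A WITH ACUTE).
latin1_text = raw.decode('iso-8859-1')

# For comparison, UTF-8 encodes the same character as two bytes.
utf8_bytes = '\u00e1'.encode('utf-8')

# ascii() keeps the printout safe on terminals that are not UTF-8.
print(utf8_ok, ascii(latin1_text), utf8_bytes)
```

So UTF-8 can represent the a-acute, but only as the two-byte sequence shown by utf8_bytes; the single byte 0xE1 is not valid UTF-8 by itself.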
Thanks,
alex
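One possible way to shorten the parsing is to decode the whole response once and then split each line on its first colon. This is a sketch, not a tested replacement: decode_with_fallback and parse_ip_info are hypothetical names, the fallback to ISO-8859-1 is an assumption about what the service sends, and the sample bytes are reconstructed from the output above rather than captured from the site.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'iso-8859-1')):
    """Try each encoding in turn and return the first successful decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding fails (ISO-8859-1 never does):
    # keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode('utf-8', errors='replace')


def parse_ip_info(raw):
    """Parse 'Label: value' lines into the dictionary ip_info() returns."""
    # Map the labels sent by the service to the keys used in the script.
    labels = {'Country': 'Country', 'City': 'City',
              'Latitude': 'Lat', 'Longitude': 'Long', 'IP': 'IP'}
    info = dict.fromkeys(labels.values(), '')
    for line in decode_with_fallback(raw).splitlines():
        label, sep, value = line.partition(':')
        if sep and label in labels:
            info[labels[label]] = value.strip()
    return info


# Hypothetical sample, reconstructed from the output shown above.
sample = (b'Country: COLOMBIA (CO)\nCity: Bogot\xe1\n\n'
          b'Latitude: 10.4\nLongitude: -75.2833\nIP: 201.234.178.62\n')
info = parse_ip_info(sample)
print(ascii(info['City']))
```

Inside ip_info() this would be called as parse_ip_info(response). If the server declares a charset in its Content-Type header, reading it from the response object returned by urlopen() with headers.get_content_charset() would be more reliable than guessing, but whether this service declares one is not something the output above shows.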