[Tutor] encoding question
Alex Kleider
akleider at sonic.net
Sun Jan 5 21:14:10 CET 2014
On 2014-01-05 08:02, eryksun wrote:
> On Sun, Jan 5, 2014 at 2:57 AM, Alex Kleider <akleider at sonic.net>
> wrote:
>> def ip_info(ip_address):
>>
>>     response = urllib2.urlopen(url_format_str %\
>>                                (ip_address, ))
>>     encoding = response.headers.getparam('charset')
>>     print "'encoding' is '%s'." % (encoding, )
>>     info = unicode(response.read().decode(encoding))
>
> decode() returns a unicode object.
>
>>     n = info.find('\n')
>>     print "location of first newline is %s." % (n, )
>>     xml = info[n+1:]
>>     print "'xml' is '%s'." % (xml, )
>>
>>     tree = ET.fromstring(xml)
>>     root = tree.getroot() # Here's where it blows up!!!
>>     print "'root' is '%s', with the following children:" % (root, )
>>     for child in root:
>>         print child.tag, child.attrib
>>     print "END of CHILDREN"
>>     return info
>
> Danny walked you through the XML. Note that he didn't decode the
> response. It includes an encoding on the first line:
>
> <?xml version="1.0" encoding="ISO-8859-1" ?>
>
> Leave it to ElementTree. Here's something to get you started:
>
> import urllib2
> import xml.etree.ElementTree as ET
> import collections
>
> url_format_str = 'http://api.hostip.info/?ip=%s&position=true'
> GML = 'http://www.opengis.net/gml'
> IPInfo = collections.namedtuple('IPInfo', '''
>     ip
>     city
>     country
>     latitude
>     longitude
> ''')
>
> def ip_info(ip_address):
>     response = urllib2.urlopen(url_format_str %
>                                ip_address)
>     tree = ET.fromstring(response.read())
>     hostip = tree.find('{%s}featureMember/Hostip' % GML)
>     ip = hostip.find('ip').text
>     city = hostip.find('{%s}name' % GML).text
>     country = hostip.find('countryName').text
>     coord = hostip.find('.//{%s}coordinates' % GML).text
>     lon, lat = coord.split(',')
>     return IPInfo(ip, city, country, lat, lon)
>
>
> >>> info = ip_info('201.234.178.62')
> >>> info.ip
> '201.234.178.62'
> >>> info.city, info.country
> (u'Bogot\xe1', 'COLOMBIA')
> >>> info.latitude, info.longitude
> ('10.4', '-75.2833')
>
> This assumes everything works perfectly. You have to decide how to fail
> gracefully for the service being unavailable or malformed XML
> (incomplete or corrupted response, etc).
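
One way to "fail gracefully", sketched under some assumptions: this is Python 3 rather than the Python 2 used in the thread, the names fetch, parse_ip_info and SAMPLE are made up for illustration, and SAMPLE is a hand-written, trimmed-down document shaped like what the code above expects, not a captured response from the service. The idea is simply to return None instead of raising when the service is unreachable, the XML does not parse, or an expected element is missing.

```python
import collections
import urllib.request
import xml.etree.ElementTree as ET

url_format_str = 'http://api.hostip.info/?ip=%s&position=true'
GML = 'http://www.opengis.net/gml'
IPInfo = collections.namedtuple('IPInfo', 'ip city country latitude longitude')


def fetch(ip_address, timeout=10):
    """Return the raw response bytes, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url_format_str % ip_address,
                                    timeout=timeout) as response:
            return response.read()
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        return None


def parse_ip_info(xml_bytes):
    """Return an IPInfo, or None if the XML is malformed or incomplete."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None  # truncated or corrupted response
    hostip = root.find('{%s}featureMember/Hostip' % GML)
    if hostip is None:
        return None  # well-formed XML, but not the expected layout

    def text(path):
        # Compare with None: an Element without children is falsy.
        node = hostip.find(path)
        return node.text if node is not None else None

    coord = text('.//{%s}coordinates' % GML)
    if coord is None or ',' not in coord:
        lat = lon = None  # position is optional, so do not fail on it
    else:
        lon, lat = coord.split(',', 1)
    return IPInfo(text('ip'), text('{%s}name' % GML),
                  text('countryName'), lat, lon)


# Hypothetical sample (assumption): ISO-8859-1 bytes, as the declaration says.
SAMPLE = (
    b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
    b'<HostipLookupResultSet xmlns:gml="http://www.opengis.net/gml">\n'
    b' <gml:featureMember>\n'
    b'  <Hostip>\n'
    b'   <ip>201.234.178.62</ip>\n'
    b'   <gml:name>Bogot\xe1</gml:name>\n'
    b'   <countryName>COLOMBIA</countryName>\n'
    b'   <ipLocation><gml:pointProperty><gml:Point>\n'
    b'    <gml:coordinates>-75.2833,10.4</gml:coordinates>\n'
    b'   </gml:Point></gml:pointProperty></ipLocation>\n'
    b'  </Hostip>\n'
    b' </gml:featureMember>\n'
    b'</HostipLookupResultSet>\n'
)

info = parse_ip_info(SAMPLE)
```

Against the live service the two pieces would be chained, with a None check after fetch() before parsing. If a dictionary is preferred over a named tuple, info._asdict() converts it.
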
Thanks again for the input.
You're using some ET syntax there that would probably make my code much
more readable but will require a bit more study on my part.
I was up all night trying to get this sorted out and was finally
successful.
(Re-) Reading 'joelonsoftware' and some of the Python docs helped.
Here's what I came up with (still needs modification to return a
dictionary, but that'll be trivial.)
alex at x301:~/Python/Parse$ cat ip_xml.py
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# -*- coding : utf-8 -*-
# file: 'ip_xml.py'

import urllib2
import xml.etree.ElementTree as ET

url_format_str = \
    u'http://api.hostip.info/?ip=%s&position=true'

def ip_info(ip_address):
    response = urllib2.urlopen(url_format_str %\
                               (ip_address, ))
    encoding = response.headers.getparam('charset')
    info = response.read().decode(encoding)
    # <info> comes in as <type 'unicode'>.
    n = info.find('\n')
    xml = info[n+1:]  # Get rid of a header line.
    # root = ET.fromstring(xml) # This causes error:
    # UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
    # in position 456: ordinal not in range(128)
    root = ET.fromstring(xml.encode("utf-8"))
    # This is the part I still don't fully understand but would
    # probably have to look at the library source to do so.
    info = []
    for i in range(4):
        info.append(root[3][0][i].text)
    info.append(root[3][0][4][0][0][0].text)
    return info

if __name__ == "__main__":
    info = ip_info("201.234.178.62")
    print info
    print info[1]
alex at x301:~/Python/Parse$ ./ip_xml.py
['201.234.178.62', u'Bogot\xe1', 'COLOMBIA', 'CO', '-75.2833,10.4']
Bogotá
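
On the part the script's comment says is not fully understood, a likely explanation: the parser underneath ElementTree works on bytes. In Python 2, handing it a unicode object appears to trigger an implicit encode with the default 'ascii' codec, which would account for the UnicodeEncodeError. Encoding to UTF-8 by hand works here because the "header line" that was stripped is the XML declaration naming ISO-8859-1; without a declaration, a parser assumes UTF-8. The sketch below illustrates this in Python 3 (an assumption, since the thread uses Python 2) on a tiny made-up document; DECLARATION and BODY are illustrative names, not part of the service's response.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document (assumption), served as ISO-8859-1 bytes.
DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
BODY = '<city>Bogot\xe1</city>'

# 1. Raw bytes with the declaration intact: the parser decodes them itself.
raw = DECLARATION + BODY.encode('iso-8859-1')
city_from_raw = ET.fromstring(raw).text

# 2. The route taken in ip_xml.py: decode, drop the first line,
#    then re-encode as UTF-8, the default when no declaration is present.
text = raw.decode('iso-8859-1')
stripped = text[text.find('\n') + 1:]
city_from_utf8 = ET.fromstring(stripped.encode('utf-8')).text

# 3. Mismatch: ISO-8859-1 bytes with no declaration are not valid UTF-8,
#    so the parser rejects them.
try:
    ET.fromstring(stripped.encode('iso-8859-1'))
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
```

Routes 1 and 2 give the same text for the city, which suggests the simplest option is the first one: pass response.read() to ET.fromstring() unchanged and skip the decode, strip and re-encode steps.
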
Thanks to all who helped.
ak