[Tutor] encoding question

Alex Kleider akleider at sonic.net
Sun Jan 5 21:14:10 CET 2014


On 2014-01-05 08:02, eryksun wrote:
> On Sun, Jan 5, 2014 at 2:57 AM, Alex Kleider <akleider at sonic.net> 
> wrote:
>> def ip_info(ip_address):
>> 
>>     response =  urllib2.urlopen(url_format_str %\
>>                                    (ip_address, ))
>>     encoding = response.headers.getparam('charset')
>>     print "'encoding' is '%s'." % (encoding, )
>>     info = unicode(response.read().decode(encoding))
> 
> decode() returns a unicode object.
> 
>>     n = info.find('\n')
>>     print "location of first newline is %s." % (n, )
>>     xml = info[n+1:]
>>     print "'xml' is '%s'." % (xml, )
>> 
>>     tree = ET.fromstring(xml)
>>     root = tree.getroot()   # Here's where it blows up!!!
>>     print "'root' is '%s', with the following children:" % (root, )
>>     for child in root:
>>         print child.tag, child.attrib
>>     print "END of CHILDREN"
>>     return info
> 
> Danny walked you through the XML. Note that he didn't decode the
> response. It includes an encoding on the first line:
> 
>     <?xml version="1.0" encoding="ISO-8859-1" ?>
> 
> Leave it to ElementTree. Here's something to get you started:
> 
>     import urllib2
>     import xml.etree.ElementTree as ET
>     import collections
> 
>     url_format_str = 'http://api.hostip.info/?ip=%s&position=true'
>     GML = 'http://www.opengis.net/gml'
>     IPInfo = collections.namedtuple('IPInfo', '''
>         ip
>         city
>         country
>         latitude
>         longitude
>     ''')
> 
>     def ip_info(ip_address):
>         response = urllib2.urlopen(url_format_str %
>                                    ip_address)
>         tree = ET.fromstring(response.read())
>         hostip = tree.find('{%s}featureMember/Hostip' % GML)
>         ip = hostip.find('ip').text
>         city = hostip.find('{%s}name' % GML).text
>         country = hostip.find('countryName').text
>         coord = hostip.find('.//{%s}coordinates' % GML).text
>         lon, lat = coord.split(',')
>         return IPInfo(ip, city, country, lat, lon)
> 
> 
>     >>> info = ip_info('201.234.178.62')
>     >>> info.ip
>     '201.234.178.62'
>     >>> info.city, info.country
>     (u'Bogot\xe1', 'COLOMBIA')
>     >>> info.latitude, info.longitude
>     ('10.4', '-75.2833')
> 
> This assumes everything works perfectly. You have to decide how to fail
> gracefully for the service being unavailable or malformed XML
> (incomplete or corrupted response, etc).
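
One way to handle the failure cases mentioned above is sketched below. It is
only an illustration under stated assumptions: parse_hostip, _text and SAMPLE
are hypothetical names that do not appear in the thread, and SAMPLE is a
hand-written stand-in for the service's reply rather than a captured response.
The network call is left out so the sketch runs offline; in the real function
the urllib2.urlopen() call would need its own try/except for
urllib2.URLError.

```python
import collections
import xml.etree.ElementTree as ET

GML = 'http://www.opengis.net/gml'
IPInfo = collections.namedtuple('IPInfo', 'ip city country latitude longitude')

# Hand-written stand-in for a reply (assumption, not a captured response).
SAMPLE = (
    u'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
    u'<HostipLookupResultSet xmlns:gml="http://www.opengis.net/gml">\n'
    u' <gml:featureMember>\n'
    u'  <Hostip>\n'
    u'   <ip>201.234.178.62</ip>\n'
    u'   <gml:name>Bogot\xe1</gml:name>\n'
    u'   <countryName>COLOMBIA</countryName>\n'
    u'   <ipLocation><gml:pointProperty><gml:Point>\n'
    u'    <gml:coordinates>-75.2833,10.4</gml:coordinates>\n'
    u'   </gml:Point></gml:pointProperty></ipLocation>\n'
    u'  </Hostip>\n'
    u' </gml:featureMember>\n'
    u'</HostipLookupResultSet>\n'
).encode('iso-8859-1')


def _text(parent, path):
    """Return the text of the first element matching path, or None."""
    node = parent.find(path)
    return node.text if node is not None else None


def parse_hostip(xml_bytes):
    """Return an IPInfo for the undecoded reply, or None if it is unusable."""
    try:
        tree = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None  # malformed or truncated XML
    hostip = tree.find('{%s}featureMember/Hostip' % GML)
    if hostip is None:
        return None  # well-formed XML, but not the expected layout
    coord = _text(hostip, './/{%s}coordinates' % GML)
    if coord and ',' in coord:
        lon, lat = coord.split(',', 1)
    else:
        lon = lat = None  # position missing from the reply
    return IPInfo(_text(hostip, 'ip'),
                  _text(hostip, '{%s}name' % GML),
                  _text(hostip, 'countryName'),
                  lat, lon)
```

Returning None is just one choice; raising a custom exception would work as
well if the caller needs to tell the failure cases apart.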

Thanks again for the input.
You're using some ET syntax there that would probably make my code much 
more readable but will require a bit more study on my part.

I was up all night trying to get this sorted out and was finally 
successful.
(Re-)reading 'joelonsoftware' and some of the Python docs helped.
Here's what I came up with (it still needs modification to return a 
dictionary, but that'll be trivial):

alex at x301:~/Python/Parse$ cat ip_xml.py
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
# -*- coding: utf-8 -*-
# file: 'ip_xml.py'

import urllib2
import xml.etree.ElementTree as ET


url_format_str = \
     u'http://api.hostip.info/?ip=%s&position=true'

def ip_info(ip_address):
     response = urllib2.urlopen(url_format_str % (ip_address, ))
     encoding = response.headers.getparam('charset')
     info = response.read().decode(encoding)
     # <info> comes in as <type 'unicode'>.
     n = info.find('\n')
     xml = info[n+1:]  # Get rid of a header line.
     # root = ET.fromstring(xml) # This causes error:
     # UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1'
     # in position 456: ordinal not in range(128)
     root = ET.fromstring(xml.encode("utf-8"))
     # This is the part I still don't fully understand; I would
     # probably have to look at the library source to do so.
     info = []
     for i in range(4):
         info.append(root[3][0][i].text)
     info.append(root[3][0][4][0][0][0].text)

     return info

if __name__ == "__main__":
     info = ip_info("201.234.178.62")
     print info
     print info[1]

alex at x301:~/Python/Parse$ ./ip_xml.py
['201.234.178.62', u'Bogot\xe1', 'COLOMBIA', 'CO', '-75.2833,10.4']
Bogotá
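
On the encode("utf-8") step: as far as I can tell, the Python 2 parser wants
a byte string, so a unicode object gets an implicit ASCII encode first, and
that fails on u'\xe1'. Encoding to UTF-8 by hand works here because the
header line that declared ISO-8859-1 has already been stripped, and XML
without a declaration is read as UTF-8. The sketch below compares that route
with handing the undecoded bytes straight to ElementTree. The sample document
is a hand-written stand-in (an assumption), not the service's actual reply.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the reply, encoded the way its declaration says.
raw = (u'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
       u'<city>Bogot\xe1</city>').encode('iso-8859-1')

# Route 1: pass the undecoded bytes; the parser reads the declaration itself.
direct = ET.fromstring(raw).text

# Route 2: decode, drop the declaration line, then re-encode as UTF-8.
text = raw.decode('iso-8859-1')
body = text[text.find('\n') + 1:]
roundtrip = ET.fromstring(body.encode('utf-8')).text

# Both routes should yield the same city name.
print(direct == roundtrip)
```

If the declaration line were kept while the bytes were re-encoded as UTF-8,
the declared and actual encodings would disagree, so Route 1 looks like the
safer default.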

Thanks to all who helped.
ak

