[Tutor] Using xml.etree
Sander Sweers
sander.sweers at gmail.com
Mon Sep 19 22:20:12 CEST 2011
On 17/09/11 13:08, lists wrote:
> I have been trying to learn how to parse XML with Python and learn how
> to use xml.etree. Lots of the tutorials seem to be very long-winded.
>
> I'm trying to access a UK postcode API at www.uk-postcodes.com to take
> a UK postcode and return the lat/lng of the postcode. This is what the
> XML looks like: http://www.uk-postcodes.com/postcode/HU11AA.xml
>
> The function below returns a dict with the xml tag as a key and the
> text as a value. Is this a correct way to use xml.etree?
Define "correct": does it give the desired result? If so, I would say yes, it
is correct. There may be other ways to get the same result, though.
> def ukpostcodesapi(postcode):
>     import urllib
Why do the import inside the function? For speed? You are reading an XML file
from the internet; guess where most of the time is spent in your function ;-).
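Python caches imported modules in sys.modules, so an import statement inside a function amounts to little more than a lookup on every call after the first. Here is a rough Python 3 sketch (the function names are made up for illustration) that times the two styles. The numbers vary by machine, so none are shown, but a single HTTP request in the real function will take far longer than either.

```python
import sys
import timeit
import xml.etree.ElementTree as ET  # module-level import: executed once

def parse_local_import(text):
    # The module is already cached in sys.modules, so this statement only
    # looks it up and binds the local name; it does not reload the module.
    import xml.etree.ElementTree as etree
    return etree.fromstring(text)

def parse_global_import(text):
    return ET.fromstring(text)

print('xml.etree.ElementTree' in sys.modules)  # True
for func in (parse_local_import, parse_global_import):
    # Time 1000 parses of a trivial document with each style.
    seconds = timeit.timeit(lambda: func('<a/>'), number=1000)
    print(func.__name__, seconds)
```

So putting the import at the top of the module is mostly a matter of style and readability, not speed.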
>     import xml.etree.ElementTree as etree
>
>     baseURL='http://www.uk-postcodes.com/'
>     geocodeRequest='postcode/'+postcode+'.xml'
You could use string formatting here:

    url = 'http://www.uk-postcodes.com/postcode/%s.xml' % postcode

Also, what would happen if postcode includes a space?
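A small Python 3 sketch covering both points. It assumes the service wants the postcode upper-cased with the space removed, as in the HU11AA example URL; that is a guess about the API, not something checked against its documentation. The helper name postcode_url is hypothetical.

```python
from urllib.parse import quote

BASE_URL = 'http://www.uk-postcodes.com/postcode/%s.xml'

def postcode_url(postcode):
    """Build the lookup URL for a postcode (hypothetical helper).

    Assumption: the API expects no whitespace and upper case,
    matching the example URL .../HU11AA.xml.
    """
    normalised = ''.join(postcode.split()).upper()
    # safe='' makes quote() percent-encode '/' too, so odd input
    # cannot change the URL path.
    return BASE_URL % quote(normalised, safe='')

print(postcode_url('hu1 1aa'))
# http://www.uk-postcodes.com/postcode/HU11AA.xml
```

If the service instead expects the space to be kept, you would drop the split/join step and let quote() encode the space as %20.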
>
>     #grab the xml
>     tree=etree.parse(urllib.urlopen(baseURL+geocodeRequest))
What happens if you get an error (a 404, perhaps)? You might want to wrap the
code that reads the XML from the internet in a try/except block.
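A Python 3 sketch of such a try/except. In Python 2, as used in this thread, urllib2 provides the corresponding HTTPError and URLError classes. The opener parameter is a hypothetical hook added only so the failure paths can be tried without a network connection.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

def fetch_tree(url, opener=urllib.request.urlopen):
    """Return the parsed root element, or None if fetching or parsing fails.

    `opener` is a hypothetical test hook; it defaults to urlopen.
    """
    try:
        with opener(url) as response:
            return ET.parse(response).getroot()
    except urllib.error.HTTPError as exc:
        # HTTPError subclasses URLError, so it must be caught first.
        # A 404 for an unknown postcode would end up here.
        print('HTTP error %d for %s' % (exc.code, url))
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, and so on.
        print('Could not reach %s: %s' % (url, exc.reason))
    except ET.ParseError as exc:
        # The server answered, but not with well-formed XML.
        print('Response was not valid XML: %s' % exc)
    return None
```

A caller would then check the result, e.g. `root = fetch_tree(url)` followed by `if root is None: ...`, instead of letting the exception end the program.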
>     root=tree.getroot()
>     results={}
>     for child in root[1]: #here's the geo tag
>         results.update({child.tag:child.text}) #build a dict containing the geocode data
>     return results
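As you only get one set of lat/lng tags in the XML, you could use find() to
get the geo element by name instead of relying on its position (root[1]).
For example:

    from xml.etree import ElementTree as ET
    import urllib2

    url = 'http://www.uk-postcodes.com/postcode/HU11AA.xml'
    xml = urllib2.urlopen(url).read()
    # ET.XML parses a string and returns the root element
    tree = ET.XML(xml)
    geo = {}
    # find() returns the first direct child named 'geo'
    for leaf in tree.find('geo'):
        geo[leaf.tag] = leaf.text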
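The same find() approach in Python 3 (where urllib2's urlopen lives in urllib.request), sketched against a small made-up XML string so it runs offline. The element names result, geo, lat and lng and the values are assumptions for illustration, not a copy of the real API response. It also checks for None, because find() returns None when there is no matching child, and iterating over None raises a TypeError.

```python
import xml.etree.ElementTree as ET

# Made-up sample response; the real document may contain other elements.
SAMPLE = b"""<result>
  <geo>
    <lat>53.0</lat>
    <lng>-0.3</lng>
  </geo>
</result>"""

def geo_from_xml(data):
    """Return the children of the first <geo> child as a {tag: text} dict."""
    root = ET.fromstring(data)
    geo = root.find('geo')  # first direct child named 'geo', or None
    if geo is None:
        return {}
    return {leaf.tag: leaf.text for leaf in geo}

print(geo_from_xml(SAMPLE))
# {'lat': '53.0', 'lng': '-0.3'}
```

Note that the values are strings, so you still need float() to do arithmetic with them. If you only want a single value, findtext() with a path such as 'geo/lat' skips the loop entirely.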
Greets
Sander