[Tutor] Using xml.etree

Mon Sep 19 22:20:12 CEST 2011

On 17/09/11 13:08, lists wrote:
> I have been trying to learn how to parse XML with Python and learn how
> to use xml.etree. Lots of the tutorials seem to be very long winded.
>
> I'm trying to access a UK postcode API at www.uk-postcodes.com to take
> a UK postcode and return the lat/lng of the postcode. This is what the
> XML looks like: http://www.uk-postcodes.com/postcode/HU11AA.xml
>
> The function below returns a dict with the xml tag as a key and the
> text as a value. Is this a correct way to use xml.etree?

Define correct. Does it give the desired result? Then I would say yes, it
is correct. There may be alternative ways to get to the same result though.

> def ukpostcodesapi(postcode):
> 	import urllib

Why do the import inside the function, for speed? You are reading an XML
file from the internet, guess where most of the time is spent in your
function ;-).

> 	import xml.etree.ElementTree as etree
>
> 	baseURL='http://www.uk-postcodes.com/'
>          geocodeRequest='postcode/'+postcode+'.xml'

You could use string formatting here.
   url = 'http://www.uk-postcodes.com/postcode/%s.xml' % postcode

Also what would happen if postcode includes a space?
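
One possible way to deal with that, as a rough sketch: strip the whitespace
and escape whatever is left before building the url. I am assuming here that
the API wants the postcode without the space (as in the HU11AA url above);
postcode_url is just a name I made up for the example, and the try/except
import is only there so it runs on Python 3 as well.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2, as used in this thread


def postcode_url(postcode):
    """Build the request url for a postcode (hypothetical helper)."""
    # Assumption: the API expects no space, e.g. 'HU1 1AA' -> 'HU11AA'.
    cleaned = ''.join(postcode.split()).upper()
    # Escape anything else that is not safe inside a url path segment.
    return 'http://www.uk-postcodes.com/postcode/%s.xml' % quote(cleaned, safe='')


print(postcode_url('hu1 1aa'))
```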

>
> 	#grab the xml
> 	tree=etree.parse(urllib.urlopen(baseURL+geocodeRequest))

What happens if you get an error (a 404 error perhaps)? You might want 
to add a try/except block around reading the xml from the internet.
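
Something along these lines might do, as a minimal sketch. fetch_tree and
its opener argument are names I invented for the example (the opener makes
it easy to try out without a network connection). Returning None on failure
is just one choice, you may prefer to let the exception propagate.

```python
import xml.etree.ElementTree as etree

try:
    from urllib.request import urlopen  # Python 3
    from urllib.error import URLError
except ImportError:
    from urllib2 import urlopen, URLError  # Python 2


def fetch_tree(url, opener=urlopen):
    """Return the parsed tree for url, or None if anything goes wrong."""
    try:
        response = opener(url)
    except URLError as err:
        # HTTPError (e.g. a 404) is a subclass of URLError.
        print('Could not fetch %s: %s' % (url, err))
        return None
    try:
        return etree.parse(response)
    except etree.ParseError as err:
        # The server answered, but not with well-formed XML.
        print('Could not parse %s: %s' % (url, err))
        return None
```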

> 	root=tree.getroot()
> 	results={}
> 	for child in root[1]: #here's the geo tag
> 		results.update({child.tag:child.text}) #build a dict containing the geocode data
> 	return results

As you only get 1 set of long/lat tags in the xml you could use find(),
which returns the first matching child element. See the example below.

from xml.etree import ElementTree as ET
import urllib2

url = 'http://www.uk-postcodes.com/postcode/HU11AA.xml'
xml = urllib2.urlopen(url).read()
tree = ET.XML(xml)

geo = {}

for leaf in tree.find('geo'):
    geo[leaf.tag] = leaf.text
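
Note that find() returns None when there is no matching tag, so the loop
above would fail on a response without a geo element. Below is a sketch of
the same idea wrapped in a function that guards against that. geo_from_xml
is a made-up name, and SAMPLE is a made-up document that only imitates the
layout I assume the API returns, not its real output.

```python
from xml.etree import ElementTree as ET

# Made-up sample, only meant to resemble the assumed layout of the response.
SAMPLE = ('<result><postcode>HU1 1AA</postcode>'
          '<geo><lat>53.7</lat><lng>-0.3</lng></geo></result>')


def geo_from_xml(xml):
    """Return a dict of tag -> text for the children of the geo element."""
    tree = ET.XML(xml)
    geo_element = tree.find('geo')
    if geo_element is None:
        # No geo tag in the document, nothing to report.
        return {}
    return dict((leaf.tag, leaf.text) for leaf in geo_element)


print(geo_from_xml(SAMPLE))
```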

Greets
Sander