[Tutor] Parsing an XML document using ElementTree
Stefan Behnel
stefan_ml at behnel.de
Wed May 25 15:10:06 CEST 2011
Sithembewena Lloyd Dube, 25.05.2011 14:40:
> Thanks for all your suggestions. I read up on gzip and urllib and also
> learned in the process that I could use urllib2 as it's the latest form of
> that library.
>
> Herewith my solution: I don't know how elegant it is, but it works just
> fine.
>
> def get_contests():
>     url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
>     req = urllib2.Request(url)
>     req.add_header('accept-encoding','gzip/deflate')
>     opener = urllib2.build_opener()
>     response = opener.open(req)
This is ok.
>     compressed_data = response.read()
>     compressed_stream = StringIO.StringIO(compressed_data)
>     gzipper = gzip.GzipFile(fileobj=compressed_stream)
>     data = gzipper.read()
This should be simplifiable to
uncompressed_stream = gzip.GzipFile(fileobj=response)
>     current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
>     data_file = open(current_path, 'w')
>     data_file.write(data)
>     data_file.close()
>     xml_data = ET.parse(open(current_path, 'r'))
And this subsequently becomes
xml_data = ET.parse(uncompressed_stream)
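As a minimal sketch of that streaming approach: the snippet below is written for Python 3 and uses an in-memory stream in place of the HTTP response so that it runs without network access. The sample XML and the names `sample_xml` and `fake_response` are made up for illustration; with a real response object the two lines in the middle are the only part that matters.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for the gzip-compressed feed.
sample_xml = b"<contests><contest><name>A v B</name></contest></contests>"
fake_response = io.BytesIO(gzip.compress(sample_xml))

# Wrap the file-like response directly: no temporary file, no StringIO copy.
uncompressed_stream = gzip.GzipFile(fileobj=fake_response)
xml_data = ET.parse(uncompressed_stream)

names = [node.findtext("name") for node in xml_data.iter("contest")]
print(names)
```

The parser pulls decompressed bytes from the stream as it needs them, so the document is never written to disk first.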
>     contest_list = []
>     for contest_parent_node in xml_data.getiterator('contest'):
Take a look at ET.iterparse().
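A rough sketch of what that could look like, assuming Python 3 and a made-up two-contest document held in memory (the `sample` data and the `found` list are illustrative only, not part of the feed):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; any file-like object works here.
sample = io.BytesIO(
    b"<contests>"
    b"<contest><name>A v B</name><league>X</league></contest>"
    b"<contest><name>C v D</name><league>Y</league></contest>"
    b"</contests>"
)

found = []
# An "end" event fires once an element and all of its children are parsed.
for event, elem in ET.iterparse(sample, events=("end",)):
    if elem.tag == "contest":
        found.append((elem.findtext("name"), elem.findtext("league")))
        elem.clear()  # drop the children we are done with to save memory

print(found)
```

The point of iterparse() is that each contest can be handled as soon as it has been read, instead of building the whole tree first and walking it afterwards.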
>         contest = Contest()
>         for contest_child_node in contest_parent_node:
>             if (contest_child_node.tag == "name" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.name = contest_child_node.text
>             if (contest_child_node.tag == "league" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.league = contest_child_node.text
>             if (contest_child_node.tag == "acro" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.acro = contest_child_node.text
>             if (contest_child_node.tag == "time" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.time = contest_child_node.text
>             if (contest_child_node.tag == "home" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.home = contest_child_node.text
>             if (contest_child_node.tag == "away" and
>                     contest_child_node.text is not None and contest_child_node.text != ""):
>                 contest.away = contest_child_node.text
This is screaming for a simplification, such as
for child in contest_parent_node:
    if child.tag in ('name', 'league', ...): # etc.
        if child.text:
            setattr(contest, child.tag, child.text)
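Spelled out as something runnable, a sketch might look like this. It assumes Python 3; the empty `Contest` class is a stand-in for the real one, the sample element is invented, and `FIELDS` simply lists the six tags checked in the quoted code.

```python
import xml.etree.ElementTree as ET


class Contest(object):
    """Hypothetical stand-in for the Contest class in the original code."""


# The six tags tested one by one in the quoted code.
FIELDS = ("name", "league", "acro", "time", "home", "away")

# Made-up sample: one filled tag, one empty tag, one unknown tag.
contest_parent_node = ET.fromstring(
    "<contest><name>A v B</name><league></league><odds>2.5</odds></contest>"
)

contest = Contest()
for child in contest_parent_node:
    # Copy only known tags; "if child.text" skips both None and "".
    if child.tag in FIELDS and child.text:
        setattr(contest, child.tag, child.text)

print(vars(contest))
```

Since `child.text` is falsy for both None and the empty string, the single truth test replaces the two explicit comparisons in the original.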
Stefan