[Tutor] Parsing an XML document using ElementTree

Sithembewena Lloyd Dube zebra05 at gmail.com
Fri Jun 10 16:59:52 CEST 2011


Hi Stefan,

Thanks for the code review :) Only just noticed this.

On Wed, May 25, 2011 at 3:10 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> Sithembewena Lloyd Dube, 25.05.2011 14:40:
>
>> Thanks for all your suggestions. I read up on gzip and urllib and also
>> learned in the process that I could use urllib2, as it's the latest form of
>> that library.
>>
>> Herewith my solution: I don't know how elegant it is, but it works just
>> fine.
>>
>> def get_contests():
>>      url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
>>      req = urllib2.Request(url)
>>      req.add_header('Accept-Encoding', 'gzip')
>>      opener = urllib2.build_opener()
>>      response = opener.open(req)
>>
>
> This is ok.
>
>
>
>>      compressed_data = response.read()
>>      compressed_stream = StringIO.StringIO(compressed_data)
>>      gzipper = gzip.GzipFile(fileobj=compressed_stream)
>>      data = gzipper.read()
>>
>
> This should be simplifiable to
>
>   uncompressed_stream = gzip.GzipFile(fileobj=response)
>
>
>
>>      current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
>>      data_file = open(current_path, 'w')
>>      data_file.write(data)
>>      data_file.close()
>>      xml_data = ET.parse(open(current_path, 'r'))
>>
>
> And this subsequently becomes
>
>   xml_data = ET.parse(uncompressed_stream)
>
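For the archive, here is a minimal sketch of the two simplifications above
combined: the gzip stream is handed straight to ET.parse(), with no temporary
file. It is written for Python 3, and the sample XML and the BytesIO object
standing in for the HTTP response are made up for illustration; the real
feed's structure may differ.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_gzipped_xml(fileobj):
    """Parse gzip-compressed XML from any binary file-like object."""
    # GzipFile decompresses as ET.parse() reads from it, so no
    # temporary file or intermediate string is needed.
    uncompressed_stream = gzip.GzipFile(fileobj=fileobj)
    return ET.parse(uncompressed_stream)


# Hypothetical stand-in for the HTTP response: a tiny feed gzipped in memory.
SAMPLE_XML = b"<contests><contest><name>A v B</name></contest></contests>"
fake_response = io.BytesIO(gzip.compress(SAMPLE_XML))

tree = parse_gzipped_xml(fake_response)
names = [node.text for node in tree.iter("name")]
```

As far as I can tell, GzipFile on Python 2 calls seek() and tell() on the
underlying file object, which urllib2 responses do not support, so on
Python 2 the StringIO buffer may still be needed.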
>
>
>>      contest_list = []
>>      for contest_parent_node in xml_data.getiterator('contest'):
>>
>
> Take a look at ET.iterparse().
>
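A rough sketch of what the iterparse() approach might look like, assuming
each <contest> element carries a <name> child as in the loop below. The
sample document and the function name iter_contest_names are invented for
illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data; the real feed may be structured differently.
SAMPLE_XML = b"""<contests>
  <contest><name>A v B</name><league>Test</league></contest>
  <contest><name>C v D</name><league>Test</league></contest>
</contests>"""


def iter_contest_names(source):
    """Yield the <name> text of each <contest> as it finishes parsing."""
    # By default iterparse() reports only "end" events, i.e. it yields
    # an element once its closing tag has been seen.
    for event, elem in ET.iterparse(source):
        if elem.tag == "contest":
            yield elem.findtext("name")
            # Discard the children we are done with to limit memory use.
            elem.clear()


names = list(iter_contest_names(io.BytesIO(SAMPLE_XML)))
```

The point of iterparse() is that each contest can be handled as soon as it
has been read, rather than after the whole tree has been built.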
>
>
>>           contest = Contest()
>>           for contest_child_node in contest_parent_node:
>>                if (contest_child_node.tag == "name" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                     contest.name = contest_child_node.text
>>                if (contest_child_node.tag == "league" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                    contest.league = contest_child_node.text
>>                if (contest_child_node.tag == "acro" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                    contest.acro = contest_child_node.text
>>                if (contest_child_node.tag == "time" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                    contest.time = contest_child_node.text
>>                if (contest_child_node.tag == "home" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                    contest.home = contest_child_node.text
>>                if (contest_child_node.tag == "away" and
>> contest_child_node.text is not None and contest_child_node.text != ""):
>>                    contest.away = contest_child_node.text
>>
>
> This is screaming for a simplification, such as
>
>   for child in contest_parent_node:
>       if child.tag in ('name', 'league', ...): # etc.
>           if child.text:
>               setattr(contest, child.tag, child.text)
>
>
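Spelled out in full, the simplification might look like the sketch below.
The tag list is taken from the original if-chain; the Contest class here is
only a placeholder for the real one, and build_contest and the sample
element are hypothetical names used for the demonstration.

```python
import xml.etree.ElementTree as ET

# The six tags tested one by one in the original loop.
WANTED_TAGS = ("name", "league", "acro", "time", "home", "away")


class Contest(object):
    """Placeholder for the real Contest class (an assumption)."""


def build_contest(contest_node):
    """Copy the non-empty text of wanted child tags onto a Contest."""
    contest = Contest()
    for child in contest_node:
        # "if child.text" covers both the None and the empty-string case.
        if child.tag in WANTED_TAGS and child.text:
            setattr(contest, child.tag, child.text)
    return contest


# Made-up element: one empty tag and one tag outside the wanted list.
node = ET.fromstring(
    "<contest><name>A v B</name><league/><home>A</home><odds>2</odds></contest>"
)
contest = build_contest(node)
```

Empty elements and tags outside the list are simply skipped, so the
resulting object only has the attributes that were actually present.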
>
> Stefan
>
>



-- 
Regards,
Sithembewena Lloyd Dube