[Tutor] Parsing an XML document using ElementTree

Wed May 25 14:40:56 CEST 2011

Hi Everyone,

Thanks for all your suggestions. I read up on gzip and urllib and also
learned in the process that I could use urllib2, as it's the newer, more
capable version of that library.

Herewith my solution: I don't know how elegant it is, but it works just
fine.

import gzip
import os
import StringIO
import urllib2
from xml.etree import ElementTree as ET

# MEDIA_ROOT and Contest are defined elsewhere in the project (not shown).

def get_contests():
    url = ('http://xml.matchbook.com/xmlfeed/feed'
           '?sport-id=&vendor=TEST&sport-name=&short-name=Po')
    req = urllib2.Request(url)
    # The server only answers if we ask for compressed data, using the
    # exact value from its error message: 'gzip,deflate'.
    req.add_header('Accept-Encoding', 'gzip,deflate')
    opener = urllib2.build_opener()
    response = opener.open(req)
    compressed_data = response.read()
    # GzipFile needs a file-like object, so wrap the raw bytes.
    compressed_stream = StringIO.StringIO(compressed_data)
    gzipper = gzip.GzipFile(fileobj=compressed_stream)
    data = gzipper.read()
    current_path = os.path.realpath(os.path.join(MEDIA_ROOT, 'xml-files', 'd.xml'))
    # Binary mode keeps the XML bytes exactly as received.
    data_file = open(current_path, 'wb')
    data_file.write(data)
    data_file.close()
    # Passing the path lets ElementTree open and close the file itself.
    xml_data = ET.parse(current_path)
    # Child elements whose text is copied onto each Contest.
    fields = ('name', 'league', 'acro', 'time', 'home', 'away')
    contest_list = []
    for contest_parent_node in xml_data.getiterator('contest'):
        contest = Contest()
        for contest_child_node in contest_parent_node:
            # .text is None for empty elements; skip those as well as ''.
            if contest_child_node.tag in fields and contest_child_node.text:
                setattr(contest, contest_child_node.tag, contest_child_node.text)
        contest_list.append(contest)
    try:
        os.remove(current_path)
    except OSError:
        pass
    return contest_list
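For comparison, here is a minimal sketch of the same idea in Python 3, where urllib2 became urllib.request and gzip.decompress works on bytes directly, so StringIO is not needed. It also parses the decompressed bytes in memory with ET.fromstring, which removes the temporary file under MEDIA_ROOT, and it only decompresses when the response's Content-Encoding header says the body is compressed. Assumptions: the names decompress_body, parse_contests, fetch_contests and CONTEST_FIELDS are hypothetical, plain dicts stand in for the Contest class, and 'deflate' is taken to mean zlib-wrapped data.

```python
import gzip
import urllib.request
import zlib
from xml.etree import ElementTree as ET

# Child elements read by get_contests() above (hypothetical constant name).
CONTEST_FIELDS = ('name', 'league', 'acro', 'time', 'home', 'away')


def decompress_body(body, content_encoding):
    """Undo the HTTP Content-Encoding, if any, and return the XML bytes."""
    encoding = (content_encoding or 'identity').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        # Assumes zlib-wrapped deflate data, as HTTP specifies.
        return zlib.decompress(body)
    if encoding == 'identity':
        return body
    raise ValueError('unsupported Content-Encoding: %r' % content_encoding)


def parse_contests(xml_bytes):
    """Return one dict per <contest> element, keyed by child tag."""
    root = ET.fromstring(xml_bytes)
    contests = []
    for contest_node in root.iter('contest'):
        contest = {}
        for child in contest_node:
            # Skip unknown tags and children whose text is None or ''.
            if child.tag in CONTEST_FIELDS and child.text:
                contest[child.tag] = child.text
        contests.append(contest)
    return contests


def fetch_contests(url):
    """Download the feed and parse it entirely in memory."""
    req = urllib.request.Request(url, headers={'Accept-Encoding': 'gzip,deflate'})
    with urllib.request.urlopen(req) as response:
        body = response.read()
        encoding = response.headers.get('Content-Encoding')
    return parse_contests(decompress_body(body, encoding))
```

Usage would be fetch_contests('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'). Splitting the network call from decompress_body and parse_contests means the parsing can be checked against a saved or hand-written XML sample without hitting the server.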

Many thanks!

On Tue, May 24, 2011 at 12:35 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:

> Sithembewena Lloyd Dube, 24.05.2011 11:59:
>
>> I am trying to parse an XML feed and display the text of each child node
>> without any success. My code in the python shell is as follows:
>>
>> >>> import urllib
>> >>> from xml.etree import ElementTree as ET
>>
>> >>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
>> >>> xml_content = ET.parse(content)
>>
>> I then check the xml_content object as follows:
>>
>> >>> xml_content
>> <xml.etree.ElementTree.ElementTree instance at 0x01DC14B8>
>>
>
> Well, yes, it does return an XML document, but not what you expect:
>
>  >>> urllib.urlopen('URL see above').read()
>  "<response>\r\n  <error-message>you must add 'accept-encoding' as
>  'gzip,deflate' to the header of your request</error-message>\r\n</response>"
>
> Meaning, the server forces you to pass an HTTP header to the request in
> order to receive gzip compressed data. Once you have that, you must
> decompress it before passing it into ElementTree's parser. See the
> documentation on the gzip and urllib modules in the standard library.
>
> Stefan
>
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
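Stefan's point also explains why my first attempt seemed to work: ET.parse happily parsed the server's error document, which simply contains no contest elements. A minimal Python 3 check for that case is sketched below; the function name feed_error is hypothetical, and it assumes error responses always use the <response><error-message> layout Stefan quoted.

```python
from xml.etree import ElementTree as ET


def feed_error(xml_bytes):
    """Return the server's error text if this is an error response, else None."""
    root = ET.fromstring(xml_bytes)
    # The error document has <error-message> as a direct child of the root.
    message = root.find('error-message')
    return message.text if message is not None else None
```

Calling it on the downloaded bytes before looking for contest elements turns a silently empty result into a clear error message.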

-- 
Regards,
Sithembewena Lloyd Dube