[Tutor] xml parsing from xml
jitendra gupta
jitu.icfai at gmail.com
Wed May 7 22:26:40 CEST 2014
@All, thanks.
I can't use etree/SAX because with them we can't get the complete line. Of
course we can get it by tag name, but we are not sure about the tags either.
All we know is that whatever children <country> has must go into a new file
named after the country.
Note: The file size is around 800MB. For other requirements (like converting
XML to CSV) I used lxml and others, but in my current scenario I don't know
which child tags will be there.
###### INPUT XML #######
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
.......
.......
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
######## output xml (Liechtenstein.xml) ######
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
.......
.......
</country>
</data>
##### output xml (Panama.xml) #####
<?xml version="1.0"?>
<data>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
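For comparison, here is a minimal sketch of how this split might look with ElementTree's streaming `iterparse`, which does not require knowing the child tags in advance: at each "end" event a `<country>` element is complete with all of its children, whatever they are, so it can be written out whole. It assumes Python 3's standard `xml.etree.ElementTree`, that every `<country>` has a `name` attribute usable as a file name, and that `<country>` is not nested; `split_countries` is a hypothetical helper name, not the poster's code. Calling `root.clear()` after each country should keep memory roughly bounded for a large file. Note that the written declaration will read `<?xml version='1.0' encoding='utf-8'?>` and whitespace between elements may differ from the input.

```python
import os
import xml.etree.ElementTree as ET


def split_countries(source, out_dir="."):
    """Write every <country> in `source` to <out_dir>/<name>.xml.

    Hypothetical helper: assumes each <country> has a `name` attribute
    that is safe to use as a file name, and that <country> is not nested.
    """
    written = []
    # Ask for "start" events too, so the very first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    root_tag = root.tag  # e.g. "data"
    for event, elem in context:
        if event != "end" or elem.tag != "country":
            continue
        # At the "end" event the element is complete, children included,
        # whatever their tag names are.
        wrapper = ET.Element(root_tag)
        wrapper.append(elem)
        path = os.path.join(out_dir, elem.get("name") + ".xml")
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
        # Drop finished countries from the in-memory tree so a large
        # input does not accumulate in memory.
        root.clear()
    return written
```

Usage would be something like `split_countries("input.xml", "out")`, where both paths are placeholders.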
On Thu, May 8, 2014 at 1:19 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Neil D. Cerutti, 07.05.2014 20:04:
> > On 5/7/2014 1:39 PM, Alan Gauld wrote:
> >> On 07/05/14 17:56, Stefan Behnel wrote:
> >>> Alan Gauld, 07.05.2014 18:11:
> >>>> and ElementTree (aka etree). The documentation gives examples of both.
> >>>> sax is easiest and fastest for simple XML in big files ...
> >>>
> >>> I wouldn't say that SAX qualifies as "easiest". Sure, if the task is
> >>> something like "count number of abc tags" or "find tag xyz and get an
> >>> attribute value from it", then SAX is relatively easy and also quite
> >>> fast.
> >>
> >> That's pretty much what I said: simple task, big file, SAX is easy.
> >>
> >> For anything else use etree.
> >>
> >>> BTW, ElementTree also has a SAX-like parsing mode, but comes with a
> >>> simpler interface and saner parser configuration defaults.
> >>
> >> My experience was different. Etree is powerful but for simple
> >> tasks I just found sax easier to grok. (And most of my XML parsing
> >> is limited to simple extraction of a field or two.)
> >
> > If I understand this task correctly, it seems like a good application for
> > SAX. As a state machine it could have a mere two states, assuming we
> > aren't troubled about the parent nodes of Country tags.
>
> Yep, that's the kind of thing I meant. You get started, just trying to get
> one little field out of the file, then notice that you need another
> one, and eventually end up writing a page full of code where a couple of
> lines would have done the job. Even just safely and correctly getting the
> text content of an element is surprisingly non-trivial in SAX.
>
> It's still unclear what the OP wanted exactly, though. To me, it read more
> like the task was to copy some content over from one XML file to another,
> in which case doing it in ET is just trivial thanks to the tree API, but
> SAX requires you to reconstruct the XML brick by brick here.
>
>
> > In my own personal case, I partly prefer xml.sax simply because it
> > ignores namespaces, a nice benefit in my cases. I wish I could make
> > ElementTree do that.
>
> The downside of namespace unaware parsing is that you never know what you
> get. It works for some input, but it may also just fail arbitrarily, for
> equally valid input.
>
> One cool thing about ET is that it makes namespace aware processing easy by
> using fully qualified tag names (one string says it all). Most other XML
> tools (including SAX) require some annoying prefix mapping setup that you
> have to carry around in order to tell the processor that you are really
> talking about the thing that it's showing to you.
>
> Stefan
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
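A minimal sketch of the two-state SAX approach described in the thread: the handler is either outside a `<country>` (depth 0) or inside one, and while inside it re-emits every event through `xml.sax.saxutils.XMLGenerator`. This also illustrates the point that SAX has to rebuild the output XML piece by piece. It assumes Python 3's standard `xml.sax`, a non-nested `<country>` with a file-name-safe `name` attribute, and a fixed `<data>` wrapper as in the example output; `CountrySplitter` is a hypothetical name.

```python
import os
import xml.sax
from xml.sax.saxutils import XMLGenerator


class CountrySplitter(xml.sax.ContentHandler):
    """Two states: outside a <country> (depth == 0) or inside one (depth > 0).

    While inside, every SAX event is re-emitted through an XMLGenerator,
    so the output file is rebuilt event by event.
    Assumes <country> is not nested and has a file-name-safe `name`.
    """

    def __init__(self, out_dir="."):
        super().__init__()
        self.out_dir = out_dir
        self.depth = 0     # nesting level inside the current <country>
        self.out = None    # open output file while inside a <country>
        self.gen = None    # XMLGenerator writing to self.out
        self.written = []  # paths of the files produced so far

    def startElement(self, name, attrs):
        if self.depth == 0:
            if name != "country":
                return  # outside: ignore <data> and anything else
            path = os.path.join(self.out_dir, attrs["name"] + ".xml")
            self.out = open(path, "w", encoding="utf-8")
            self.gen = XMLGenerator(self.out, encoding="utf-8",
                                    short_empty_elements=True)
            self.gen.startDocument()
            self.gen.startElement("data", {})  # fixed wrapper, as in the example
            self.written.append(path)
        self.depth += 1
        self.gen.startElement(name, attrs)

    def endElement(self, name):
        if self.depth == 0:
            return
        self.gen.endElement(name)
        self.depth -= 1
        if self.depth == 0:  # just closed </country>: finish this file
            self.gen.endElement("data")
            self.gen.endDocument()
            self.out.close()
            self.out = self.gen = None

    def characters(self, content):
        # SAX may split one text node into several calls; passing each chunk
        # straight through is fine for copying, but code that needs the
        # element's full text would have to buffer the chunks itself.
        if self.depth > 0:
            self.gen.characters(content)
```

It would be driven with something like `xml.sax.parse("input.xml", CountrySplitter("out"))` (placeholder paths). Compared with the ElementTree version, the extra bookkeeping (depth counter, open file, wrapper element, text chunks) is the "brick by brick" cost mentioned above.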
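A small sketch of the namespace point made in the thread: ElementTree reports namespaced tags as `{uri}local`, so one string matches the element regardless of which prefix a file uses, while namespace-unaware SAX (the `xml.sax` default) sees raw prefixed names, so a prefix-based match can miss equally valid input. The namespace URI, the two sample documents, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.sax

NS = "http://example.com/countries"  # hypothetical namespace URI

# Two equally valid documents: same namespace, different prefixes.
doc_a = f'<c:data xmlns:c="{NS}"><c:country name="Panama"/></c:data>'
doc_b = f'<data xmlns="{NS}"><country name="Panama"/></data>'


def country_names(text):
    root = ET.fromstring(text)
    # ElementTree spells namespaced tags as "{uri}local", so this one
    # string identifies <country> whatever prefix the document used.
    return [c.get("name") for c in root.iter(f"{{{NS}}}country")]


class PrefixMatcher(xml.sax.ContentHandler):
    """Namespace-unaware handler: sees raw qualified names like 'c:country'."""

    def __init__(self):
        super().__init__()
        self.hits = 0

    def startElement(self, name, attrs):
        if name == "c:country":  # matches on the prefix, not the namespace
            self.hits += 1


def sax_hits(text):
    handler = PrefixMatcher()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.hits


print(country_names(doc_a), country_names(doc_b))  # ['Panama'] ['Panama']
print(sax_hits(doc_a), sax_hits(doc_b))            # 1 0
```

The SAX handler finds the element in one document and silently misses it in the other, even though both describe the same data.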