[XML-SIG] Learning to use elementtree

Doran, Harold HDoran at air.org
Tue Apr 8 16:14:17 CEST 2008


Well, maybe this is what I should have done to start with to avoid the
name collision problem:

from xml.etree.ElementTree import ElementTree as ET
f = open('test.txt', 'w')
et = ET(file='out_g4r_b.xml')
for statentityref in et.findall(
        'admin/responseanalyses/analysis/analysisdata/statentityref'):
   for ss in statentityref.findall('statentityref'):
      for statval in ss.findall('statval'):
         print >> f, statentityref.attrib['id'], ss.attrib['id'], '\t', \
            statval.attrib['type'], '\t', statval.attrib['value']
f.close()

This works and formats output as desired. Just checking to see if this
is the way others would tackle this.
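
For anyone reading this in the archive, here is a minimal, self-contained sketch of the same nested-loop idea in Python 3 syntax. The sample XML is hypothetical: the real out_g4r_b.xml is not shown in this thread, so the `root` wrapper element and the trimmed set of statval entries are assumptions, and the names `extract_rows` and `format_rows` are made up for illustration. It gives each loop level its own variable name so that the outer id stays available inside the inner loops.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element and attribute names
# (statentityref, statval, id, type, value) come from the thread.
SAMPLE_XML = """\
<root>
  <admin>
    <responseanalyses>
      <analysis>
        <analysisdata>
          <statentityref id="13963">
            <statentityref id="0.000000">
              <statval type="ScorePtPct" value="0.018333"/>
              <statval type="ScorePtBiserial" value="-0.496309"/>
            </statentityref>
            <statentityref id="1.000000">
              <statval type="ScorePtPct" value="0.981667"/>
            </statentityref>
          </statentityref>
        </analysisdata>
      </analysis>
    </responseanalyses>
  </admin>
</root>
"""

# Path to the outer (item-level) statentityref, relative to the root element.
ITEM_PATH = 'admin/responseanalyses/analysis/analysisdata/statentityref'


def extract_rows(root):
    """Return (item id, nested id, statval type, statval value) tuples."""
    rows = []
    for item in root.findall(ITEM_PATH):
        # Search relative to the current item, under a different name,
        # so the outer element is not overwritten by the inner loop.
        for score_point in item.findall('statentityref'):
            for statval in score_point.findall('statval'):
                rows.append((item.get('id'), score_point.get('id'),
                             statval.get('type'), statval.get('value')))
    return rows


def format_rows(rows):
    """Join each row with tabs, one row per line."""
    return ''.join('\t'.join(row) + '\n' for row in rows)


root = ET.fromstring(SAMPLE_XML)
rows = extract_rows(root)
print(format_rows(rows), end='')
```

Collecting the rows in a list before formatting them keeps the tree traversal separate from the output format, which may make it easier to switch to the csv module or a different delimiter later.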


> -----Original Message-----
> From: xml-sig-bounces at python.org 
> [mailto:xml-sig-bounces at python.org] On Behalf Of Doran, Harold
> Sent: Tuesday, April 08, 2008 9:48 AM
> To: Stefan Behnel
> Cc: xml-sig at python.org; J. Cliff Dyer
> Subject: Re: [XML-SIG] Learning to use elementtree
> 
> Thanks. I'm piecing this together slowly, but I did get the 
> following to work.
> 
> Test.py
> from xml.etree.ElementTree import ElementTree as ET
> f = open('test.txt', 'w')
> et = ET(file='out_g4r_b.xml')
> for statentityref in et.findall(
>         'admin/responseanalyses/analysis/analysisdata/statentityref'):
>    print >> f, statentityref.attrib['id']
>    for statentityref in statentityref.findall('statentityref'):
>       for statval in statentityref.findall('statval'):
>          print >> f, statentityref.attrib['id'], '\t', \
>             statval.attrib['type'], '\t', statval.attrib['value']
> f.close()
> 
> And this gives output like:
> 
> 13963
> 0.000000 	UncollapsedMeanScore 	23.863636
> 0.000000 	ScorePtPct 	0.018333
> 0.000000 	ScorePtBiserial 	-0.496309
> 0.000000 	ScorePtAdjBiserial 	-0.452588
> 1.000000 	UncollapsedMeanScore 	34.941426
> 1.000000 	ScorePtPct 	0.981667
> 1.000000 	ScorePtBiserial 	0.496309
> 1.000000 	ScorePtAdjBiserial 	0.452588
> omit 	ScorePtPct 	0.000000
> omit 	ScorePtBiserial 	-99999.990000
> omit 	ScorePtAdjBiserial 	-99999.990000
> 13962
> 0.000000 	UncollapsedMeanScore 	29.305195
> 0.000000 	ScorePtPct 	0.256667
> 0.000000 	ScorePtBiserial 	-0.484469
> 0.000000 	ScorePtAdjBiserial 	-0.425165
> 1.000000 	UncollapsedMeanScore 	36.614350
> 1.000000 	ScorePtPct 	0.743333
> 1.000000 	ScorePtBiserial 	0.484469
> 1.000000 	ScorePtAdjBiserial 	0.425165
> omit 	ScorePtPct 	0.000000
> omit 	ScorePtBiserial 	-99999.990000
> omit 	ScorePtAdjBiserial 	-99999.990000
> 
> ...
> 
> This is almost exactly what I want, and I can live with it if needed.
> What would be most convenient, however, is to format the output as
> follows:
> 
> 13963	0.000000 	UncollapsedMeanScore 	23.863636
> 13963	0.000000 	ScorePtPct 	0.018333
> 13963	0.000000 	ScorePtBiserial 	-0.496309
> 13963	0.000000 	ScorePtAdjBiserial 	-0.452588
> 13963	1.000000 	UncollapsedMeanScore 	34.941426
> 13963	1.000000 	ScorePtPct 	0.981667
> 13963	1.000000 	ScorePtBiserial 	0.496309
> 13963	1.000000 	ScorePtAdjBiserial 	0.452588
> 
> I think this may be what Cliff meant by name collision. That
> is, the number 13963 comes from the ['id'] attribute of the outer
> statentityref, but 0.000 and 1.0 also come from the id attribute
> of the statentityref nested inside it. So, I'm a bit confused as
> to how to go about printing them out side by side.
> 
> 
> > -----Original Message-----
> > From: Stefan Behnel [mailto:stefan_ml at behnel.de]
> > Sent: Monday, April 07, 2008 8:32 AM
> > To: Doran, Harold
> > Cc: J. Cliff Dyer; xml-sig at python.org
> > Subject: Re: [XML-SIG] Learning to use elementtree
> > 
> > Hi,
> > 
> > Doran, Harold wrote:
> > > Well, I think I'm getting close. But, I think this is similar to the
> > > problem I had when I started. This seems to create a huge data file
> > > with all information under the first item, and then again all
> > > information under the second item and so forth.
> > >
> > > for statentityref in \
> > >       et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
> > >    print >> f, statentityref.attrib['id']
> > >    for statentityref in \
> > >          et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statentityref'):
> > >       for statval in statentityref.findall('statval'):
> > >          print >> f, statentityref.attrib['id'], '\t', \
> > >             statval.attrib['type'], '\t', statval.attrib['value']
> > 
> > I think you should read the previous post again. You are nesting three
> > loops here where two would do what you want.
> > 
> > Stefan
> > 
> > 
> > >> -----Original Message-----
> > >> From: J. Cliff Dyer [mailto:jcd at unc.edu]
> > >> Sent: Wednesday, April 02, 2008 3:36 PM
> > >> To: Doran, Harold
> > >> Cc: xml-sig at python.org
> > >> Subject: Re: [XML-SIG] Learning to use elementtree
> > >>
> > >> On Wed, 2008-04-02 at 15:28 -0400, Doran, Harold wrote:
> > >>> Indeed, navigating the xml is tough (for me). I have been able to get
> > >>> the following to work. I put in "Sub Element" to indicate the new
> > >>> section of data. But, from looking at the text output, one doesn't
> > >>> know which item these sub elements belong to. I think the solution is
> > >>> to create an index like 13965-0 to show that this is the
> > >>> subinformation from the item above it. That seems to be where I am
> > >>> getting stuck. Although, I am open to other suggestions on how to best
> > >>> represent the output.
> > >>>
> > >>> from xml.etree.ElementTree import ElementTree as ET
> > >>>
> > >>> filename = raw_input("Please enter the AM XML file: ")
> > >>> new_file = raw_input("Save this file as: ")
> > >>>
> > >>> # create a new file defined by the user
> > >>> f = open(new_file, 'w')
> > >>>
> > >>> et = ET(file=filename)
> > >>>
> > >>> for statentityref in \
> > >>>     et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
> > >>>     for statval in statentityref.findall('statval'):
> > >>>       print >> f, statentityref.attrib['id'], '\t', \
> > >>>         statval.attrib['type'], '\t', statval.attrib['value']
> > >>>
> > >>> f.write("\n\n")
> > >>> f.write("Sub Element\n\n")
> > >>>
> > >>> for statentityref in \
> > >>>     et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statentityref'):
> > >>>     for statval in statentityref.findall('statval'):
> > >>>       print >> f, statentityref.attrib['id'], '\t', \
> > >>>         statval.attrib['type'], '\t', statval.attrib['value']
> > >>> f.close()
> > >> Do you want your second statentityref loop to be based on its parent
> > >> statentityref?  If so, you need to nest it in the original loop, and
> > >> use an xpath relative to your outer statentityref (and watch for name
> > >> collisions).
> > 
> > 