stripping fields from xml file into a csv
stefan_ml at behnel.de
Mon Mar 1 08:46:04 CET 2010
Hal Styli, 01.03.2010 00:15:
> Stefan, I was happy to see such concise code.
> Your python worked with only very minor modifications.
> Hai's test xml data *without* the first and last line is close enough
> to the data I am using:
> <order customer="john" product="eggs" quantity="12" />
> <order customer="cindy" product="bread" quantity="1" />
> <order customer="larry" product="tea bags" quantity="100" />
> <order customer="john" product="butter" quantity="1" />
> <order product="chicken" quantity="2" customer="derek" />
> ... quirky.
> I get a large file given to me in this format. I believe it is
> created by something like:
> grep 'customer=' *.xml, where there are a large number of xml files.
Try to get this fixed at the source. Exporting non-XML that looks like XML
is not a good idea in general, and it means that everyone who wants to read
the data has to adapt, instead of fixing the source once and for all.
> I had to edit the data to include the first and last lines, <orders>
> and </orders>,
> to get the python code to work. It's not an arduous task(!), but can
> you recommend a way to get it to work without
> manually editing the data?
Iff this cannot be fixed at the source, you can write a file-like wrapper
around a file that simply returns the boundary tags before and after
reading from the file itself. All you need is a .read(n) method, see the
documentation of the file type.
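
A minimal sketch of such a wrapper, assuming Python 3, a file opened in binary mode, and xml.etree.ElementTree as the parser (the code from earlier in the thread may use a different one). The names RootWrapper and orders_to_rows are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


class RootWrapper:
    """File-like object that surrounds a binary file's content with root tags."""

    def __init__(self, f, tag=b"orders"):
        self._f = f
        self._pending = b"<" + tag + b">"  # bytes to hand out before touching the file
        self._tail = b"</" + tag + b">"    # bytes to hand out once the file is exhausted

    def read(self, n=-1):
        if n is None or n < 0:
            # Unbounded read: everything that is left, in one piece.
            data = self._pending + self._f.read() + self._tail
            self._pending = self._tail = b""
            return data
        if not self._pending:
            self._pending = self._f.read(n)
            if not self._pending:
                # Wrapped file is at EOF: queue the closing tag exactly once.
                self._pending, self._tail = self._tail, b""
        # Never return more than n bytes; keep the rest for the next call.
        chunk, self._pending = self._pending[:n], self._pending[n:]
        return chunk


def orders_to_rows(f):
    """Parse rootless <order .../> lines from binary file f into tuples."""
    root = ET.parse(RootWrapper(f)).getroot()
    return [(o.get("customer"), o.get("product"), o.get("quantity"))
            for o in root.iter("order")]


if __name__ == "__main__":
    import csv
    import sys

    # Two sample lines in the format quoted above, without a root element.
    sample = io.BytesIO(
        b'<order customer="john" product="eggs" quantity="12" />\n'
        b'<order product="chicken" quantity="2" customer="derek" />\n'
    )
    csv.writer(sys.stdout).writerows(orders_to_rows(sample))
```

With a real file, something like orders_to_rows(open("orders.txt", "rb")) should work the same way, where orders.txt is a placeholder name. The parser only ever calls .read(n), so nothing else needs to be implemented on the wrapper.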
> One other thing, what's the Roland Mueller post above about (I'm
> viewing this in google groups)? What would the test.xsl file look
This is the XSLT script he posted:
<?xml version="1.0" encoding="UTF-8"?>
<!-- text output because we want to have an CSV file -->
<!-- remove all whitespace coming with input XML -->
<!-- matches any <order> element and extracts the customer,product&quantity