stripping fields from xml file into a csv

Stefan Behnel stefan_ml at
Mon Mar 1 08:46:04 CET 2010

Hal Styli, 01.03.2010 00:15:
> Stefan, I was happy to see such concise code.
> Your python worked with only very minor modifications.
> Hai's test xml data *without* the first and last line is close enough
> to the data I am using:
> <order customer="john" product="eggs" quantity="12" />
> <order customer="cindy" product="bread" quantity="1" />
> <order customer="larry" product="tea bags" quantity="100" />
> <order customer="john" product="butter" quantity="1" />
> <order product="chicken" quantity="2" customer="derek" />
> ... quirky.
> I get a large file given to me in this format. I believe it is
> created by something like:
> grep 'customer=' *.xml, where there are a large number of xml files.

Try to get this fixed at the source. Exporting non-XML that looks like XML
is not a good idea in general, and it means that everyone who wants to read
the data has to adapt, instead of fixing the source once and for all.

> I had to edit the data to include the first and last lines, <orders>
> and </orders>,
> to get the python code to work. It's not an arduous task(!), but can
> you recommend a way to get it to work without
> manually editing the data?

If this really cannot be fixed at the source, you can write a file-like
wrapper around the file that returns the opening tag before the file's own
content and the closing tag after it. All the parser needs is a .read(n)
method; see the documentation of the file type.
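
A minimal sketch of such a wrapper, assuming Python 3 and the standard
library's xml.etree.ElementTree. The names RootWrapper and iter_orders are
made up for this example, and "orders" is simply the root tag used earlier
in this thread:

```python
import io
import xml.etree.ElementTree as ET


class RootWrapper:
    """File-like object that puts a root element around a fragment file."""

    def __init__(self, fileobj, tag="orders"):
        # Sources handed out in order: opening tag, the real file, closing tag.
        self._sources = [
            io.StringIO("<%s>" % tag),
            fileobj,
            io.StringIO("</%s>" % tag),
        ]

    def read(self, n=-1):
        if n is None or n < 0:
            # No size given: return everything that is left in one string.
            data = "".join(src.read() for src in self._sources)
            self._sources = []
            return data
        while self._sources:
            data = self._sources[0].read(n)
            if data:
                return data
            # Current source is exhausted, move on to the next one.
            self._sources.pop(0)
        return ""  # empty string signals end of file


def iter_orders(fileobj):
    """Yield the attributes of each <order> element as a dict."""
    for _event, elem in ET.iterparse(RootWrapper(fileobj)):
        if elem.tag == "order":
            yield dict(elem.attrib)
            elem.clear()  # keep memory use flat on large inputs
```

read(n) may return fewer than n characters at the boundary between two
sources. That is fine for an incremental parser, which only stops when it
gets an empty string back. Open the real file in text mode so that all
three sources return str.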

> One other thing, what's the Roland Mueller post above about (I'm
> viewing this in google groups)? What would the test.xsl file look
> like?

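This is the XSLT script he posted (the copy here had lost the enclosing
<xsl:stylesheet> element, the field separators and the end of the template;
they are filled in below as an assumption about the original):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- text output because we want to have a CSV file -->
<xsl:output method="text"/>

<!-- remove all whitespace coming with input XML -->
<xsl:strip-space elements="*"/>

<!-- matches any <order> element and extracts the customer, product and
quantity attributes -->
<xsl:template match="order">
  <xsl:value-of select="@customer"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@product"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="@quantity"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>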
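
For comparison, roughly the same extraction in plain Python. This is only a
sketch using the standard library's csv and xml.etree.ElementTree modules,
not the code posted earlier in the thread. The names FIELDS and
orders_to_csv are invented here, and the input is assumed to be well-formed
XML with a single root element:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column order of the CSV output (illustrative, taken from the sample data).
FIELDS = ("customer", "product", "quantity")


def orders_to_csv(xml_text, out):
    """Write one CSV row per <order> element found in xml_text."""
    writer = csv.writer(out, lineterminator="\n")
    root = ET.fromstring(xml_text)
    for order in root.iter("order"):
        # Attribute order in the input does not matter; a missing
        # attribute becomes an empty field.
        writer.writerow([order.get(name, "") for name in FIELDS])
```

Unlike plain string concatenation, csv.writer quotes a value that happens
to contain a comma, so the output stays parseable.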


