stripping fields from xml file into a csv
Stefan Behnel
stefan_ml at behnel.de
Mon Mar 1 02:46:04 EST 2010
Hal Styli, 01.03.2010 00:15:
> Stefan, I was happy to see such concise code.
> Your python worked with only very minor modifications.
>
> Hai's test xml data *without* the first and last line is close enough
> to the data I am using:
>
> <order customer="john" product="eggs" quantity="12" />
> <order customer="cindy" product="bread" quantity="1" />
> <order customer="larry" product="tea bags" quantity="100" />
> <order customer="john" product="butter" quantity="1" />
> <order product="chicken" quantity="2" customer="derek" />
>
> ... quirky.
>
> I get a large file given to me in this format. I believe it is
> created by something like:
> grep 'customer=' *.xml, where there are a large number of xml files.
Try to get this fixed at the source. Exporting non-XML that looks like XML
is not a good idea in general, and it means that everyone who wants to read
the data has to adapt, instead of fixing the source once and for all.
> I had to edit the data to include the first and last lines, <orders>
> and </orders>,
> to get the python code to work. It's not an arduous task(!), but can
> you recommend a way to get it to work without
> manually editing the data?
If (and only if) this cannot be fixed at the source, you can write a file-like
wrapper around the file that returns the opening boundary tag first, then the
file's own content, and finally the closing boundary tag. All the parser needs
from it is a .read(n) method; see the documentation of the file type.
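A minimal sketch of such a wrapper follows. It assumes Python 3, a data file opened in binary mode, and no <?xml ...?> declaration at the top of the file (a declaration must come first, so putting a tag in front of it would make the document invalid). RootTagWrapper and its start_tag/end_tag parameters are hypothetical names chosen for this example; the only thing the parser relies on is the .read(n) method.

```python
import io
import xml.etree.ElementTree as ET


class RootTagWrapper:
    """Hypothetical file-like wrapper: yields an opening root tag, then the
    wrapped file's bytes, then the closing root tag, so that a file holding
    only a sequence of <order/> elements parses as one XML document."""

    def __init__(self, fileobj, start_tag=b"<orders>", end_tag=b"</orders>"):
        # Three byte sources, consumed strictly in this order.
        self._parts = [io.BytesIO(start_tag), fileobj, io.BytesIO(end_tag)]

    def read(self, n=-1):
        if n == 0:
            return b""
        if n is None or n < 0:
            # Read everything that is left from all remaining parts.
            data = b"".join(part.read() for part in self._parts)
            self._parts = []
            return data
        # Return up to n bytes from the current part; drop it once exhausted.
        while self._parts:
            data = self._parts[0].read(n)
            if data:
                return data
            self._parts.pop(0)
        return b""  # all parts exhausted: signals EOF to the parser


if __name__ == "__main__":
    # Data in the format from the thread: <order/> elements with no root.
    raw = io.BytesIO(
        b'<order customer="john" product="eggs" quantity="12" />\n'
        b'<order customer="cindy" product="bread" quantity="1" />\n'
    )
    root = ET.parse(RootTagWrapper(raw)).getroot()
    for order in root.iter("order"):
        print(order.get("customer"), order.get("product"), order.get("quantity"))
```

A read(n) call may return fewer than n bytes; parsers that read in chunks simply call it again until they get an empty result. Passing the wrapper to ElementTree.parse() gives the same tree as if <orders> and </orders> had been added by hand, and since lxml.etree.parse() also accepts file-like objects, the wrapper should work there too.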
> One other thing, what's the Roland Mueller post above about (I'm
> viewing this in google groups)? What would the test.xsl file look
> like?
This is the XSLT script he posted:
============================
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    version="1.0">
  <!-- text output because we want to have a CSV file -->
  <xsl:output method="text"/>
  <!-- remove all whitespace coming with input XML -->
  <xsl:strip-space elements="*"/>
  <!-- matches any <order> element and extracts the customer, product and
       quantity attributes -->
  <xsl:template match="order">
    <xsl:value-of select="@customer"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@product"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@quantity"/>
    <!-- end each CSV row with a newline -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
============================
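For comparison, the same extraction can be sketched in Python with only the standard library (xml.etree.ElementTree plus the csv module). This is an illustrative sketch, not code posted in this thread; orders_to_csv is a hypothetical helper name, and, like the stylesheet, it expects the <order> elements to be wrapped in a root element such as <orders>.

```python
import csv
import io
import sys
import xml.etree.ElementTree as ET


def orders_to_csv(xml_source, out):
    """Hypothetical helper: write the customer, product and quantity
    attributes of every <order> element in xml_source as CSV rows to out."""
    # lineterminator="\n" mirrors the single newline the stylesheet emits.
    writer = csv.writer(out, lineterminator="\n")
    for order in ET.parse(xml_source).iter("order"):
        # Look attributes up by name, so their order in the input does not
        # matter; a missing attribute becomes an empty field.
        writer.writerow([
            order.get("customer", ""),
            order.get("product", ""),
            order.get("quantity", ""),
        ])


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<orders>\n"
        b'<order customer="john" product="eggs" quantity="12" />\n'
        b'<order product="chicken" quantity="2" customer="derek" />\n'
        b"</orders>\n"
    )
    orders_to_csv(sample, sys.stdout)
    # prints:
    # john,eggs,12
    # derek,chicken,2
```

A missing attribute yields an empty field, which matches what xsl:value-of produces for an absent attribute. Unlike the stylesheet, csv.writer quotes values that contain a comma or a double quote, so a product such as "eggs, large" still ends up as a valid CSV row.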
Stefan
More information about the Python-list
mailing list