Hello,<br><br><div class="gmail_quote">2010/2/28 Stefan Behnel <span dir="ltr"><<a href="mailto:stefan_ml@behnel.de">stefan_ml@behnel.de</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hal Styli, 27.02.2010 21:50:<br>
<div class="im">> I have a sed solution to the problems below but would like to rewrite<br>
> in python...<br>
<br>
</div>Note that sed (or any other line based or text based tool) is not a<br>
sensible way to handle XML. If you want to read XML, use an XML parser.<br>
They are designed to do exactly what you want in a standard compliant way,<br>
and they can deal with all sorts of XML formatting and encoding, for example.<br>
<div class="im"><br>
<br>
> I need to strip out some data from a quirky xml file into a csv:<br>
><br>
> from something like this<br>
><br>
> < ..... cust="dick" .... product="eggs" ... quantity="12" .... ><br>
> < .... cust="tom" .... product="milk" ... quantity="2" ...><br>
> < .... cust="harry" .... product="bread" ... quantity="1" ...><br>
> < .... cust="tom" .... product="eggs" ... quantity="6" ...><br>
> < ..... cust="dick" .... product="eggs" ... quantity="6" .... ><br>
<br>
</div>As others have noted, this doesn't tell much about your XML. A more<br>
complete example would be helpful.<br>
<div class="im"><br>
<br>
> to this<br>
><br>
> dick,eggs,12<br>
> tom,milk,2<br>
> harry,bread,1<br>
> tom,eggs,6<br>
> dick,eggs,6<br>
><br>
> I am new to python and xml and it would be great to see some slick<br>
> ways of achieving the above by using python's XML capabilities to<br>
> parse the original file or python's regex to achieve what I did using<br>
> sed.<br>
<br></div></blockquote><div><br>Another solution in this case could be to use an XSLT stylesheet, so that the extraction rules are defined in the stylesheet rather than in the Python code. <br><br>The stylesheet is test.xsl and the input data is test.xml. The following Python code applies the stylesheet to the input data and writes the output to a file named foo.<br>
<br><div style="margin-left: 40px;">Python code:<br><span style="font-family: courier new,monospace;">#!/usr/bin/python</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">import sys</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">import libxml2</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">import libxslt</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">styledoc = libxml2.parseFile("test.xsl")</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">style = libxslt.parseStylesheetDoc(styledoc)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">doc = libxml2.parseFile("test.xml")</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">result = style.applyStylesheet(doc, None)</span><br style="font-family: courier new,monospace;">
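<span style="font-family: courier new,monospace;">style.saveResultToFilename("foo", result, 0)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"># explicit cleanup, as done in the libxslt Python examples</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">style.freeStylesheet()</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">doc.freeDoc()</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">result.freeDoc()</span><br style="font-family: courier new,monospace;"></div><br>BR,<br>Roland<br><br><b>Example run on Linux:</b><br>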
roland@komputer:~/Desktop/XML/XSLT$ ./xslt_test.py <br>
roland@komputer:~/Desktop/XML/XSLT$ cat foo <br>
john,eggs,12<br>
cindy,bread,1<br>
larry,tea bags,100<br>
john,butter,1<br>
derek,chicken,2<br>
derek,milk,2<br>
<b><br>The test.xsl stylesheet:</b><br><span style="font-family: courier new,monospace;"><?xml version="1.0" encoding="UTF-8"?></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><xsl:stylesheet </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> xmlns:xsl="<a href="http://www.w3.org/1999/XSL/Transform">http://www.w3.org/1999/XSL/Transform</a>" </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> xmlns:fo="<a href="http://www.w3.org/1999/XSL/Format">http://www.w3.org/1999/XSL/Format</a>" </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> version="1.0"></span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><!-- text output because we want to have a CSV file --></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"><xsl:output method="text"/></span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><!-- strip whitespace-only text nodes from the input XML --></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"><xsl:strip-space elements="*"/></span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"><!-- matches any <order> element and extracts the customer, product and quantity attributes --></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"><xsl:template match="order"></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> <xsl:value-of select="@customer"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> <xsl:text>,</xsl:text></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> <xsl:value-of select="@product"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> <xsl:text>,</xsl:text></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> <xsl:value-of select="@quantity"/></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> <xsl:text></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"></xsl:text></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"></xsl:template></span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"></xsl:stylesheet></span><br style="font-family: courier new,monospace;">
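<br>For comparison, roughly the same extraction can be sketched with the XML parser from Python's standard library, in line with the advice quoted above to use an XML parser. This is only a sketch under assumptions: test.xml is not shown in this message, so the sample input, its orders wrapper element and the helper name orders_to_csv are hypothetical, inferred from the stylesheet and from the first lines of the example output.<br>

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input: the real test.xml is not shown, so this is only
# inferred from the stylesheet (<order> elements with customer, product and
# quantity attributes). The <orders> wrapper element is an assumption.
SAMPLE_XML = """<orders>
  <order customer="john" product="eggs" quantity="12"/>
  <order customer="cindy" product="bread" quantity="1"/>
  <order customer="larry" product="tea bags" quantity="100"/>
</orders>
"""


def orders_to_csv(xml_text):
    """Return one CSV line per <order> element, in document order."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for order in root.iter("order"):
        # A missing attribute becomes an empty field, similar to what
        # xsl:value-of produces for an attribute that does not exist.
        writer.writerow([
            order.get("customer", ""),
            order.get("product", ""),
            order.get("quantity", ""),
        ])
    return out.getvalue()


if __name__ == "__main__":
    print(orders_to_csv(SAMPLE_XML), end="")
```

One difference from the stylesheet: csv.writer quotes a value that itself contains a comma, while the plain text output of the stylesheet writes it unchanged.<br>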
<br style="font-family: courier new,monospace;"><br></div></div>