
In our EarlyPrint project (https://texts.earlyprint.org) my colleague has written a simple script that does two things:

1. It indents elements one space per hierarchical level
2. It arranges attributes in alphabetical order

The second may be an example of what Wallace Stevens calls 'the blessed rage for order', but it has its uses: the texts go through iterative and sometimes manual editing in oXygen, and if you always encounter @lemma, @pos, and @reg in a fixed order, it's a little easier not to make mistakes.

I attach the script below. It's a shell script, uses XSLT, and might as well be Chinese to me. If it were possible to integrate the script into Python or replace it with a Python function, it would make the ping-pong of exchanging iterative versions of files more orderly.

From the command line you run the script as ~/format.sh A12345.xml, where A12345.xml is a filename. I understand that Python has a procedure for integrating other scripts by wrapping them in subprocess.run(). But in an lxml script the object represented by A12345 is either a tree or a serialized version in memory, and I don't know how to turn that into the argument of format.sh.

Below is the script. I'll be very grateful for advice.

#!/bin/sh

# Sort attributes.
tmp="$(mktemp)"
xsltproc - "$1" > "$tmp" << STYLESHEET || { echo 'xsltproc: attribute sort failed for:' "$1" >&2; exit 1; }
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy />
  </xsl:template>
</xsl:stylesheet>
STYLESHEET
mv "$tmp" "$1"

# Format with single space as the indent.
tmp="$(mktemp)"
XMLLINT_INDENT=$' ' xmllint --encode UTF-8 --format "$1" --output "$tmp" 2>/dev/null || {
  echo 'xmllint: format failed for:' "$1" >&2
  exit 1
}
mv "$tmp" "$1"

Martin Mueller
Professor emeritus of English and Classics
Northwestern University

Hello Martin,

a) You can apply XSLT directly using lxml, without the external xsltproc program. To do so, load the stylesheet as XML into an XSLT object:

from lxml import etree

stylesheet_document = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
…
</xsl:stylesheet>
"""
stylesheet = etree.XSLT(etree.XML(stylesheet_document))

The stylesheet can then be used to apply the XSL transformation to some given XML data:

result = stylesheet(data)

b) Indenting: a more direct approach to indenting your XML output would be to write your own function that turns the XML back into a string. You could also sort the attributes in this step.

c) Calling external programs, passing data via stdin and receiving data from stdout: the called program must be configured to expect data from stdin. For example, xmllint --format - will receive XML from stdin and write the formatted result to stdout.

from subprocess import Popen, PIPE, STDOUT

# xdata: the serialized XML as bytes, e.g. etree.tostring(tree)
p = Popen(['xmllint', '--format', '-'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
result, _ = p.communicate(input=xdata)  # communicate() returns (stdout, stderr)
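
The same stdin/stdout idea can also be written with subprocess.run(), which Martin mentioned. What follows is only a sketch under some assumptions: the helper names pipe_through and format_with_xmllint are made up for illustration, xmllint is assumed to be on the PATH, and the XMLLINT_INDENT environment variable is set to a single space to mirror the second half of format.sh.

```python
import os
import subprocess


def pipe_through(command, data, extra_env=None):
    """Send bytes to a command's stdin and return the bytes it writes to stdout.

    Hypothetical helper: `command` is an argument list, `data` is bytes,
    and `extra_env` holds environment variables to add for this call only.
    """
    env = {**os.environ, **(extra_env or {})}
    completed = subprocess.run(
        command,
        input=data,              # fed to the child's stdin
        stdout=subprocess.PIPE,  # captured and returned below
        stderr=subprocess.PIPE,  # kept separate so errors do not mix into the XML
        env=env,
        check=True,              # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


def format_with_xmllint(xml_bytes):
    """Indent XML with one space per level (assumes xmllint is installed)."""
    return pipe_through(
        ["xmllint", "--encode", "UTF-8", "--format", "-"],
        xml_bytes,
        extra_env={"XMLLINT_INDENT": " "},
    )
```

With an lxml tree in memory, the call would look like format_with_xmllint(etree.tostring(tree)): the tree is serialized to bytes and handed over on stdin, so no temporary file and no filename argument are needed.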
jens
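
Points a) and b) can be combined so that no external program is involved at all. The sketch below reuses the stylesheet from format.sh for the attribute sort and then uses lxml's etree.indent() (available from lxml 4.5) for the one-space indent. The function name format_tree is made up for illustration, and the whitespace that etree.indent() produces may differ from xmllint's in details, for instance in mixed content, so it is worth comparing the output on a few real files first.

```python
from lxml import etree

# The attribute-sorting stylesheet, taken from format.sh.
SORT_ATTRIBUTES_XSLT = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
"""

# Compile the stylesheet once; the resulting object can be reused for every file.
sort_attributes = etree.XSLT(etree.XML(SORT_ATTRIBUTES_XSLT))


def format_tree(tree):
    """Return serialized XML with attributes sorted and one-space indentation.

    Hypothetical helper: `tree` may be an lxml element or element tree.
    """
    result = sort_attributes(tree)                # attribute sort (step 1 of format.sh)
    etree.indent(result.getroot(), space=" ")     # one space per level (step 2)
    return etree.tostring(result, encoding="UTF-8", xml_declaration=True)
```

A possible use would be to parse a file with etree.parse("A12345.xml"), edit the tree, and write the bytes returned by format_tree(tree) back to disk in binary mode.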

Participants (2): Jens Quade, Martin Mueller