[Tutor] Schema change in ElementTree
stefan_ml at behnel.de
Mon Aug 16 08:11:54 CEST 2010
Benjamin Serrato, 16.08.2010 05:51:
> Hi guys, thanks for your help in the past. I had my first occasion to write
> a useful script recently, editing the contents of an xml file. It was pretty
> simple, but I have two problems which I now have to take care of by opening
> the file in Notepad++ and changing it manually.
> 1. The file output has LF, not CRLF, as the newline character. I'm not sure
> if this matters, but this is on Windows and I'm having trouble reimporting
> this file back into the program it belongs to.
Sounds like the program is not really expecting XML. XML is agnostic about
newline characters, and parsers will normalise them to a plain LF.
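A small illustration of that normalisation, assuming a current Python 3 with the stdlib xml.etree.ElementTree (the document content is made up): the same text parsed with CRLF and with LF line endings comes back identical, with plain "\n" only.

```python
import xml.etree.ElementTree as ET

# The same tiny document with Windows (CRLF) and Unix (LF) line endings.
crlf_doc = b"<note>\r\n  first line\r\n  second line\r\n</note>"
lf_doc = crlf_doc.replace(b"\r\n", b"\n")

crlf_text = ET.fromstring(crlf_doc).text
lf_text = ET.fromstring(lf_doc).text

# The parser returns plain "\n" in both cases, so a conforming XML
# consumer cannot tell the two files apart.
print(crlf_text == lf_text)  # True
```

If the importing program does care, that is a sign it reads the file as plain text rather than through an XML parser.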
> I'd like to change how it
> outputs newlines, or use python to automatically fix the .xml file.
You didn't say where the newline characters occur. Do you pretty print the
file on output?
ElementTree doesn't have a way of formatting (pretty printing) XML files,
so there can't be that many newline characters in the structure (they may
occur in the data, though!). There's a pretty printing recipe on the effbot
site that you can easily adapt to inject the newline characters you need.
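A minimal sketch of both fixes, assuming Python 3.9 or later, where ET gained an indent() function that does roughly what the effbot recipe does (on the 2.x versions current in this thread, the recipe itself would still be needed). The helper name write_with_crlf is hypothetical. Opening the output in text mode with newline="\r\n" makes Python translate every "\n" that ET writes into CRLF, so the .xml file needs no post-processing.

```python
import xml.etree.ElementTree as ET


def write_with_crlf(tree, path):
    """Pretty print `tree` and write it to `path` with CRLF line endings.

    Hypothetical helper; assumes Python 3.9+ for ET.indent().
    """
    # Re-indent in place: whitespace-only .text/.tail values are replaced
    # by "\n" plus indentation.
    ET.indent(tree, space="  ")
    # newline="\r\n" translates every "\n" written in text mode into "\r\n".
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        # encoding="unicode" makes ET write str, as a text-mode file expects.
        tree.write(f, encoding="unicode")
```

Usage would look like `tree = ET.parse("input.xml")`, edit the tree, then `write_with_crlf(tree, "output.xml")` (file names are placeholders). Whether the importing program then accepts the file is an assumption worth testing.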
> 2. Before parsing the file, 'xs' is the schema prefix, but after writing
> 'ns0' is the schema prefix (e.g. <xs:schema ...> becomes <ns0:schema ...>).
> This doesn't carry over to the attributes, though, so I'm left with (...
> types="xs:int"). That doesn't make any sense. I've read the documentation at
> effbot, but didn't see anything obvious for redefining the schema prefix
> before writing to disk.
Yes, that's a known problem with ET: it doesn't keep namespace prefixes
end-to-end. If you are using Python 2.7 (or ET 1.3), you can globally
assign a specific prefix to a namespace URI with register_namespace(),
though. That way, ET can output your 'xs' prefix for the namespace at hand.
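A sketch of that call in current Python 3, assuming the 'xs' prefix belongs to the W3C XML Schema namespace (the URI below is that assumption; use whatever URI the xmlns:xs declaration in your file carries). Recent ET versions may already map this particular URI to 'xs' by default, so the explicit call matters most for other namespaces or older versions, and it is harmless either way. Note that ET never rewrites attribute values such as type="xs:int"; they only stay meaningful because the element prefix now matches them again.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI behind the "xs" prefix (W3C XML Schema).
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Register the prefix globally *before* serialising; without a mapping,
# ET invents prefixes like "ns0".
ET.register_namespace("xs", XSD_NS)

source = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="count" type="xs:int"/>'
    "</xs:schema>"
)
root = ET.fromstring(source)
output = ET.tostring(root, encoding="unicode")
print(output)
```

Because register_namespace() changes module-wide state, calling it once at start-up, before any write() or tostring(), is the simplest arrangement.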
> Also, is it possible to define the order of the
> schema attributes?
Not sure what you mean here. If you are talking about the namespace
declarations, then no, these are XML attributes which are not ordered. The
only way to get a deterministic output order is canonical (C14N)
serialisation. That's also supported by ET since 1.3.
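In the standard library, C14N 2.0 is available as xml.etree.ElementTree.canonicalize() from Python 3.8 on. The sketch below (element and attribute names are made up) shows two documents that differ only in attribute order and empty-element syntax producing identical canonical output. C14N also sorts namespace declarations by prefix. Keep in mind that this yields a deterministic order, not an order you get to choose.

```python
import xml.etree.ElementTree as ET

# Same content, different attribute order and empty-element syntax.
doc_a = '<schema version="1.0" id="main"><item/></schema>'
doc_b = '<schema id="main" version="1.0"><item></item></schema>'

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

# Attributes come out sorted; empty elements become start/end tag pairs.
print(canon_a)  # <schema id="main" version="1.0"><item></item></schema>
print(canon_a == canon_b)  # True
```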