[Tutor] Schema change in ElementTree

Stefan Behnel stefan_ml at behnel.de
Mon Aug 16 08:11:54 CEST 2010


Benjamin Serrato, 16.08.2010 05:51:
> Hi guys, thanks for your help in the past.  I had my first occasion to write
> a useful script recently editing the contents of an xml file.  It was pretty
> simple, but I have two problems which I now have to take care of by opening
> in Notepad++ and changing manually.
>
> 1. The file output has LF not CRLF as the newline character.  I'm not sure
> if this matters, but this is on windows and I'm having trouble reimporting
> this file back into the program it belongs to.

Sounds like the program is not really expecting XML. XML is newline
character agnostic and parsers will normalise line endings to plain LF.
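
The normalisation can be checked with a minimal sketch like the following, assuming the standard library ElementTree on Python 3. The element name and the text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input whose text content contains a Windows line ending.
source = "<note>line one\r\nline two</note>"

root = ET.fromstring(source)

# The parser reports the CRLF pair as a single line feed.
text = root.text
print(repr(text))
```

A consumer that insists on CRLF in the file is therefore asking for more than XML itself requires.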


> I'd like to change how it
> outputs newlines, or use python to automatically fix the .xml file.

You didn't say where the newline characters occur. Do you pretty print the 
file on output?

ElementTree doesn't have a way of formatting (pretty printing) XML files, 
so there can't be that many newline characters in the structure (they may
be in the text content, though!). There's a pretty printing recipe on the effbot
site that you can easily adapt to inject the newline characters you need.
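
As a rough sketch of such an adaptation, here is a version of that indent recipe with the newline string made configurable. The `newline` and `pad` parameters are additions for illustration and are not part of the original recipe. Python 3 is assumed.

```python
import xml.etree.ElementTree as ET

def indent(elem, level=0, newline="\r\n", pad="  "):
    """Insert whitespace-only text/tail values so the tree prints indented."""
    i = newline + level * pad
    if len(elem):
        # Element with children: break the line before the first child.
        if not elem.text or not elem.text.strip():
            elem.text = i + pad
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1, newline, pad)
        # The last child closes back to the parent's indentation level.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

# Hypothetical document, used only to show the effect.
root = ET.fromstring("<root><a/><b><c/></b></root>")
indent(root)
out = ET.tostring(root, encoding="unicode")
print(repr(out))
```

When writing the result to disk on Windows, open the file in binary mode (or in text mode with newline=""), otherwise each "\n" is translated again and the file ends up with "\r\r\n".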


> 2.  Before parsing the file 'xs' is the schema prefix, but after writing
> 'ns0' is the schema prefix (e.g. <xs:schema ...> to <ns0:schema ...>), but
> this doesn't carry over to the attributes so I'm left with (...
> types="xs:int").  That doesn't make any sense.  I've read the documentation at
> effbot, but didn't see anything outright I needed to redefine the schema
> before writing to disk.

Yes, that's a known problem with ET - it doesn't keep namespace prefixes 
end-to-end. In case you are using Python 2.7 (or ET 1.3), you can globally 
assign a specific prefix to a namespace URI, though. That way, ET can 
output your 'xs' prefix for the namespace at hand.
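
A minimal sketch of that global prefix assignment, using ET.register_namespace() and assuming Python 3 for the string output. The schema fragment is invented for illustration, and the namespace URI is assumed to be the usual XML Schema one.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI: the standard XML Schema namespace.
XS_URI = "http://www.w3.org/2001/XMLSchema"

# Global setting: every later serialisation in this process uses 'xs'
# for this URI instead of a generated prefix such as 'ns0'.
ET.register_namespace("xs", XS_URI)

# Hypothetical schema fragment.
source = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="count" type="xs:int"/>'
    '</xs:schema>'
)

root = ET.fromstring(source)
output = ET.tostring(root, encoding="unicode")
print(output)
```

ET treats the attribute value "xs:int" as plain text and never rewrites it, so keeping the element prefix at 'xs' is what makes the two agree again.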


> Also, is it possible to define the order of the
> schema attributes?

Not sure what you mean here. If you are talking about the namespace 
declarations, then no, these are XML attributes which are not ordered. The 
only way to get a deterministic output order is canonical (C14N) 
serialisation. That's also supported by ET since 1.3.
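
To illustrate the deterministic ordering, here is a small sketch using ET.canonicalize(). It assumes Python 3.8 or later, where this function is available and implements C14N 2.0, which is a different entry point from the one in ET 1.3. The element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in attribute order.
first = '<item z="1" b="2"/>'
second = '<item b="2" z="1"/>'

# With no output file given, canonicalize() returns the result as a string.
canon_first = ET.canonicalize(first)
canon_second = ET.canonicalize(second)

print(canon_first)
print(canon_first == canon_second)
```

The canonical form sorts the attributes, so both inputs serialise identically. The order is fixed by the C14N rules and cannot be chosen by the caller.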

Stefan


