SOT : & in XML-documents

Henrik Motakef henrik.motakef at web.de
Tue Oct 8 21:55:33 CEST 2002


"Thomas Weholt" <2002 at weholt.org> writes:

> I'm trying to parse an old fileformat into xml. The problem is that the
> character & appears from time to time in the original file.
[...]
> Anybody got any clues on how to avoid problems with characters like this?

Don't use them ;-) Or, better, properly escape them as &amp;. This is
not a character-set issue, so no XML declaration will save you: a bare &
always starts an entity reference, whatever the encoding.
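
If you are generating the XML from Python, a minimal sketch of the
escaping could look like the following. It assumes you build the markup
by hand as strings; the helper name to_element and the sample text are
made up for illustration. escape() from xml.sax.saxutils replaces &, <
and > in character data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_element(tag, text):
    """Wrap text in <tag>...</tag>, escaping &, < and > in the text.

    Hypothetical helper for illustration; it assumes tag is already a
    valid XML name and that the element carries no attributes.
    """
    return "<{0}>{1}</{0}>".format(tag, escape(text))


# Sample line as it might appear in the old file format (made up).
raw = "Fish & Chips"
xml_text = to_element("item", raw)
print(xml_text)

# Round trip: a parser turns &amp; back into a plain & again.
assert ET.fromstring(xml_text).text == raw
```

For attribute values, quoteattr() from the same module also takes care
of the quote characters.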

If you are dealing with HTML, you could use tidy (google will find it
for you) to create well-formed XML. IIRC there is also a shareware
program that tries to clean up broken XML regardless of its document
type, probably called "XML tidy" or some such.

Good luck
Henrik


