[XML-SIG] Processing xml files with ISO 8859-1 chars

Lars Marius Garshol larsga@garshol.priv.no
07 Nov 2001 20:25:06 +0100


* Dan Gunter
| 
| Of course, checking an _arbitrary_ encoding for correctness seems
| like a real burden on the parser, but maybe UTF-8 is so common it
| should be checked.

All encodings should be checked for correctness, although not all of
them can be. Most single-byte encodings (like the ISO 8859-x series)
have no illegal bit sequences, and so cannot be checked with anything
short of full-scale AI. Most multi-byte encodings, however, have
illegal bit sequences, and converters can and should check for them.
This is really no different from, and no less important than,
verifying syntactic correctness.
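The asymmetry between single-byte and multi-byte encodings can be seen
with a small sketch in modern Python 3 (not the Python 2 of the time).
The helper name decodes_cleanly is made up for illustration. It relies
only on bytes.decode raising UnicodeDecodeError in its default "strict"
mode:

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is a legal byte sequence in the given encoding."""
    try:
        data.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

# "café" in ISO 8859-1 is b"caf\xe9" (0xE9 is 'é').
latin1_bytes = "café".encode("iso-8859-1")

# ISO 8859-1 assigns a character to every byte 0x00-0xFF, so any input
# decodes; there is nothing for a converter to reject.
print(decodes_cleanly(latin1_bytes, "iso-8859-1"))
print(decodes_cleanly(bytes(range(256)), "iso-8859-1"))

# In UTF-8, 0xE9 announces a 3-byte sequence that never arrives, so the
# decoder can detect the error.
print(decodes_cleanly(latin1_bytes, "utf-8"))
```

The first two calls succeed and the third fails. A checker for ISO 8859-1
always says yes, while a UTF-8 checker has real work to do.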

What sets UTF-8 apart in this context is that it is the default
encoding for XML documents, so that if you find illegal UTF-8 bit
sequences you can be pretty sure that the user is not using UTF-8, but
has just omitted to declare that fact.
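The heuristic can be sketched with the standard library's expat-based
minidom. The function diagnose and its messages are hypothetical. The
guess only makes sense for documents that carry no encoding declaration,
because it tests the raw bytes against UTF-8 regardless of what the
document declares:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def diagnose(data: bytes) -> str:
    """Parse data; on failure, guess whether a declaration is missing."""
    try:
        minidom.parseString(data)
        return "ok"
    except ExpatError:
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            # Not legal UTF-8: most likely another encoding (such as
            # ISO 8859-1) was used without declaring it.
            return "undeclared non-UTF-8 encoding?"
        return "malformed XML"

body = "<doc>café</doc>".encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body

print(diagnose(body))      # parser assumes UTF-8 and rejects 0xE9
print(diagnose(declared))  # the declaration tells expat the real encoding
```

Adding the encoding declaration, rather than having the parser guess, is
what makes the second document parse.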

--Lars M.