error when parsing xml

Richard Brodie R.Brodie at rl.ac.uk
Mon Sep 5 15:56:50 CEST 2005


> > I have found that some people refuse to stick to standards, so whenever I
> > parse XML files I remove any characters that fall in the range
> > <= 0x1f
> >
> >>= 0xf0
>
> Now of what help shall that be? Get rid of all accented characters?
> Sorry, but that surely is the dumbest thing to do here - and has
> _nothing_ to do with standards!

Earlier versions of the Microsoft XML parser accept invalid characters
(e.g. most of those < 0x1f). Sadly, you do find files in the wild that need
to have these stripped before feeding them to a conforming parser.
One can be too enthusiastic about the process, though.





More information about the Python-list mailing list