xml parsing escape characters

Jeremy Bowers jerf at jerf.org
Thu Jan 20 19:01:59 EST 2005


On Thu, 20 Jan 2005 21:54:30 +0100, Martin v. Löwis wrote:

> Luis P. Mendes wrote:
>> When I access the url via the Firefox browser and look into the source
>> code, I also get:
>> 
>> <?xml version="1.0" encoding="utf-8"?> <string
>> xmlns="http................"><DataSet> ~  <Order>
>> ~    <Customer>439</Customer> ~  </Order>
>> </DataSet></string>
> 
> Please do try to understand what you are seeing. This is crucial for
> understanding what happens.

From extremely painful and lengthy personal experience, Luis, I
***extremely*** strongly recommend taking the time to nail this down until
you really, really, really understand what is going on. Until you can
explain it to somebody else coherently, ideally.

Mixing escaping levels like this absolutely, positively *must* be done
correctly, or extremely-painful-to-debug problems will result.

(My painful experience was layering an RPC implementation in plain text on
top of IM messages, where I was dealing with everything from the socket
level up except the XML parser. Ultimately it turned out there was a
problem in the XML parser: it rendered "&amp;amp;" as "&", which is wrong
wrong wrong. But that took a *long* time to find, especially as I had
other bugs in the way.)
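
For what it's worth, a parser should undo exactly one level of escaping per
parse, so "&amp;amp;" in the source text should come out as the five
characters "&amp;", not a bare "&". A quick sanity check along those lines,
using the standard library's xml.etree.ElementTree as a stand-in for
whatever parser you actually have (the element name "msg" is made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; "msg" is just an illustrative name.
doc = "<msg>&amp;amp; and &amp;</msg>"

text = ET.fromstring(doc).text

# One parse removes exactly one level of escaping, no more and no less.
assert text == "&amp; and &"
```

If your parser hands back "& and &" here, it is decoding twice, and
everything layered on top of it will misbehave in confusing ways.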

Since you're layering XML in XML, test &amp; and &amp;amp; to make
sure they work correctly; those usually show encoding errors. And, given
your current understanding of the issue, do not write your own decoding
function unless you absolutely can't avoid it.
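
Here is a rough sketch of that kind of round-trip test, leaning on the
standard library (xml.sax.saxutils.escape and xml.etree.ElementTree) rather
than hand-written decoding. The wrap() and unwrap() helpers are hypothetical,
and the element names are only borrowed from your sample; I'm assuming the
inner document travels as escaped text inside the outer <string> element,
which is what your output looks like, and I've left the namespace out:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap(payload):
    # Hypothetical helper: build the inner document, then embed it as
    # escaped text in the outer one. Each layer escapes exactly once.
    inner = "<Customer>%s</Customer>" % escape(payload)
    return "<string>%s</string>" % escape(inner)

def unwrap(outer):
    # Hypothetical helper: one parse per layer, and no manual replacing
    # of "&amp;" or "&lt;" anywhere.
    inner = ET.fromstring(outer).text
    return ET.fromstring(inner).text

# The payloads most likely to expose an escaping-level mistake.
for payload in ["&", "&amp;", "&amp;amp;", "a < b"]:
    assert unwrap(wrap(payload)) == payload
```

If any payload comes back with one "amp;" too many or too few, some layer is
escaping or unescaping the wrong number of times.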


