XML_RPC and unicode problems
"Martin v. Löwis"
martin at v.loewis.de
Sun Sep 19 11:45:51 CEST 2004
Ivan Voras wrote:
> Martin v. Löwis wrote:
>> Binary data and XML-RPC
>> has a long and confusing history.
> Why is that? There's <base64> for data that's expected to be binary[*],
> and <string> for everything else that's valid under chosen encoding.
base64 originally wasn't part of the XML-RPC spec; it was added on
1/21/99. Before that, the spec simultaneously claimed that the string
element contains ASCII, that "full XML" is allowed, and that the
string element can carry arbitrary binary data.
These claims contradicted each other: If you were to put arbitrary
bytes into a string element, it would neither be well-formed XML
(at least not if you choose us-ascii or utf-8 as the encoding), nor
would the strings be pure ASCII.
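
A minimal sketch in Python of why arbitrary bytes do not fit into a string element. The helper name is_well_formed and the sample payload are made up for illustration; only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical payload: bytes that are neither pure ASCII nor valid UTF-8.
raw = bytes([0xFF, 0x00, 0x80])

# The same payload dropped verbatim into a string element, once per encoding.
utf8_doc = b'<?xml version="1.0" encoding="utf-8"?><string>' + raw + b"</string>"
ascii_doc = b'<?xml version="1.0" encoding="us-ascii"?><string>' + raw + b"</string>"

print(is_well_formed(utf8_doc))      # is it well-formed as utf-8?
print(is_well_formed(ascii_doc))     # is it well-formed as us-ascii?
print(all(b < 128 for b in raw))     # is the payload pure ASCII?
```

Under these assumptions the parser should reject both documents, and the payload is not ASCII either, so none of the spec's three claims holds for such a string.
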
Also, if the string can only carry ASCII, how can it
simultaneously allow for arbitrary XML?
People have asked all these questions, and Dave Winer always
said "read the spec, it says it all", when it really didn't.
I believe that Dave's understanding was the following: With
"ASCII", he didn't really mean "American Standard Code for
Information Interchange". He meant that all bytes in the
document must have ordinals < 128. He was fine with people
putting character references (such as &#220;) into string
elements. He clarified that aspect on 6/30/03, by removing
"ASCII" from the description of string.
Wrt. binary data, I think he meant that you could use
base64, uuencode, hex, whatever, in a string element, and
thus represent arbitrary bytes. Of course, this would not
be very interoperable, so he added base64.
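
Both mechanisms can be sketched with today's Python standard library (xmlrpc.client, the successor of xmlrpclib). This is only an illustration: the method name "store" and the sample values are made up, and encoding the request with "xmlcharrefreplace" is one possible way to keep every byte below 128, not something the spec mandates.

```python
import xmlrpc.client

text = "\u00dcbung"          # non-ASCII text destined for a <string> element
blob = bytes(range(256))     # arbitrary bytes destined for a <base64> element

# "store" is a hypothetical method name used only for this sketch.
request = xmlrpc.client.dumps(
    (text, xmlrpc.client.Binary(blob)), methodname="store"
)

# Replace every non-ASCII character with a character reference, so the
# document on the wire contains only bytes with ordinals below 128.
wire = request.encode("ascii", "xmlcharrefreplace")

# Parse the request back: the text and the binary payload both survive.
params, method = xmlrpc.client.loads(wire)
decoded_text, decoded_blob = params[0], params[1].data
```

The string travels as a character reference inside the string element, while the raw bytes travel base64-encoded, so the receiver does not have to guess which ad-hoc encoding the sender picked.
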