[Python-Dev] Re: [Patches] Patch for xmlrpc encoding

M.-A. Lemburg mal@lemburg.com
Tue, 10 Dec 2002 12:04:53 +0100


Ragnar Kjørstad wrote:
> On Mon, Dec 09, 2002 at 11:15:38AM +0100, M.-A. Lemburg wrote:
>
>>>The dumps-method in xmlrpclib has the following comment:
>>>    All 8-bit strings in the data structure are assumed to use the
>>>    packet encoding.  Unicode strings are automatically converted,
>>>    where necessary.
>>>
>>>This doesn't work very well. In our particular case we're using latin_1
>>>as our default encoding, and we're using UTF-8 for the packet encoding.
>>>We can't really change the default encoding, because the sql-modules
>>>transfer latin_1 encoded data and we can't change the packet encoding to
>>>latin_1 because the xmlrpc-client (php) doesn't work with that.
>>>
>>>The attached patch changes xmlrpclib to convert strings to unicode using
>>>the default encoding, and then convert them back to strings with the
>>>packet encoding. If unicode is not available it falls back to the old
>>>behaviour.
>>
>>I believe this is overkill. If you need this behaviour, subclass
>>the Marshaller in xmlrpclib and add your feature to that subclass.
>>Then replace the Marshaller class in xmlrpclib with your subclass.
>
> Well, we replaced the xmlrpclib.Marshaller.dump_string method from our
> application. That works as a workaround for us, but my point was not
> to merely make our application work but to fix this problem for other
> python users as well.

The first point I want to make is that non-ASCII text doesn't belong in
strings, it belongs in Unicode objects. I don't think there's anything
to argue here.
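
A minimal sketch of this first point, assuming Python 3, where bytes
plays the role of the 8-bit string and str that of the Unicode object.
The sample value and variable names are illustrative only: Latin-1 data
is decoded once at the boundary, and the packet encoding is applied only
on output.

```python
# Bytes as an SQL module might return them, encoded as Latin-1 (assumed sample).
raw = b"Kj\xf8rstad"

# Decode once, at the boundary, into a Unicode object.
text = raw.decode("latin-1")

# Encode with the packet encoding only when writing to the wire.
packet = text.encode("utf-8")

print(repr(text))
print(repr(packet))
```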

My second point is that I would like xmlrpclib to be more flexible
with respect to user-supplied marshallers and unmarshallers. The
reasoning here is that applications will want to send custom
types/classes over the wire and need to provide custom mappings
for these via those adapted marshallers, e.g. a product I'm working
on uses mxDateTime objects to store date/time values and passes
in buffer() objects to signal: this is binary data.
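
As a rough illustration of such an adapted marshaller, here is a sketch
that assumes Python 3, where the module is named xmlrpc.client and
memoryview stands in for buffer(). The class name BufferMarshaller and
the method dump_memoryview are hypothetical; only Marshaller, its
dispatch table, and dump_bytes come from the library.

```python
import xmlrpc.client


class BufferMarshaller(xmlrpc.client.Marshaller):
    """Hypothetical subclass that also marshals memoryview objects."""

    # Copy the dispatch table so the base class stays untouched.
    dispatch = dict(xmlrpc.client.Marshaller.dispatch)

    def dump_memoryview(self, value, write):
        # Reuse the stock base64 writer for the underlying bytes.
        self.dump_bytes(value.tobytes(), write)

    dispatch[memoryview] = dump_memoryview


marshaller = BufferMarshaller(encoding="utf-8")
xml = marshaller.dumps((memoryview(b"abc"),))
print(xml)
```

The stock Marshaller rejects memoryview objects with a TypeError, so the
extra dispatch entry is what signals "this is binary data" here.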

> The library makes an assumption that is (IMHO) just not valid. There is
> simply no reason to assume strings use the packet encoding.

It is valid with respect to what the whole standard lib assumes:
strings contain ASCII text data or binary data.

> Why would you not like to fix this? Because of the performance? It would
> be possible to have both functions available in the class, and only use
> the encoding-conversion when the encodings are actually different. This
> could be done with no other performance penalty than a simple check when
> the encoding is set. (The constructor?)
>
> I simply didn't include this code in my patch because it would make the
> code harder to read and I think most people use ascii or latin_1 for their
> string-encoding and UTF-8 as their packet-encoding.

A correct fix would be to check whether the string is indeed
using the xmlrpc encoding. That is IMHO overkill, since this is
something to think about when designing the application and
not a runtime check that's always needed.
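
For illustration, such a runtime check could look roughly like the
sketch below, assuming Python 3 bytes objects. The helper name
uses_encoding is hypothetical and not part of xmlrpclib.

```python
def uses_encoding(data, encoding):
    """Return True if the byte string decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(uses_encoding(b"Kj\xc3\xb8rstad", "utf-8"))  # UTF-8 encoded input
print(uses_encoding(b"Kj\xf8rstad", "utf-8"))      # Latin-1 encoded input
```

A clean decode only shows that the bytes are valid in that encoding, not
that they were written in it: every byte string decodes as Latin-1, for
example. That limitation is one more reason to settle the encoding at
design time.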

>>Aside: xmlrpclib should support subclassing the Marshaller and
>>Unmarshaller more transparently. Currently, the two are hard-coded
>>into the rest of xmlrpclib without the possibility to provide your
>>own subclasses without tweaking xmlrpclib from the outside.
>
> In principle I agree, but it should not be necessary to subclass the
> Marshaller for most applications, and tweaking it from the outside can
> be done pretty easily in python :)

Sure, but it's not a clean solution :-( I've done that too in the past,
but reverted to writing my own code for the dumps() and loads()
APIs to be able to use subclassing to implement the above mentioned
mappings.
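
A sketch of what such a hand-written dumps() might look like, assuming
Python 3's xmlrpc.client. The names custom_dumps and ComplexMarshaller
and the marshaller_class parameter are hypothetical, and the complex to
struct mapping is only a stand-in for an application-specific type.

```python
import xmlrpc.client


class ComplexMarshaller(xmlrpc.client.Marshaller):
    """Hypothetical subclass mapping complex numbers to XML-RPC structs."""

    # Copy the dispatch table so the base class stays untouched.
    dispatch = dict(xmlrpc.client.Marshaller.dispatch)

    def dump_complex(self, value, write):
        self.dump_struct({"real": value.real, "imag": value.imag}, write)

    dispatch[complex] = dump_complex


def custom_dumps(params, methodname, marshaller_class=xmlrpc.client.Marshaller):
    """Build a methodCall packet with a caller-supplied marshaller class."""
    body = marshaller_class(encoding="utf-8").dumps(params)
    return (
        "<?xml version='1.0'?>\n"
        "<methodCall>\n"
        "<methodName>%s</methodName>\n"
        "%s"
        "</methodCall>\n" % (xmlrpc.client.escape(methodname), body)
    )


request = custom_dumps((complex(1, 2),), "echo", ComplexMarshaller)
params, name = xmlrpc.client.loads(request)
print(name, params)
```

Passing the marshaller class as an argument avoids patching the module
from the outside, at the cost of duplicating the packet envelope.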

>>Please post patches using the SourceForge patch manager.
>
> Didn't you just write that the patch was overkill and you didn't want
> it? Do you want me to post it anyway? Or did you just mean for any
> potential future patches?

No, I meant that patches should always be posted there. SF is what
we use to track patches and bugs.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/