On Mon, 22 Oct 2007 13:31:31 +0200, Daniel de la Cuesta <daniel.cuesta@iavante.es> wrote:
Hi,

We are working on a middleware with Twisted that connects to a POP3 server and provides an XMLRPC interface (using twisted.web) to interact with third-party clients.

The Twisted middleware has to deal with non-ASCII (UTF-8 or Latin-1) encodings in the subject and the body of the mail messages. I have seen that the Twisted XMLRPC server doesn't specify the encoding of the XMLRPC response, for example:

    <?xml version='1.0'?>
    <methodCall>
    <methodName>connect</methodName>
    <params>
    ...
    </params>
    </methodCall>

This issue produces an error when expat parses the response in the XMLRPC client:

    xml.parsers.expat.ExpatError: not well-formed (invalid token)

There is an open ticket with a patch to deal with Latin-1 encodings in the XMLRPC server:

http://twistedmatrix.com/trac/ticket/1909

But it is still open, although it was opened a year ago.

How can I solve the encoding problem in the XMLRPC server response?
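A minimal sketch that reproduces the client-side error, assuming Python 3 (where xmlrpclib is named xmlrpc.client). The response text is hand-written for illustration and try_unmarshal is a hypothetical helper. Because the XML declaration names no encoding, the parser reads the bytes as UTF-8, so the Latin-1 copy is rejected while the UTF-8 copy is not.

```python
import xml.parsers.expat
import xmlrpc.client  # Python 3 name of the xmlrpclib module

# Hand-written methodResponse; the declaration names no encoding,
# so an XML parser must assume UTF-8.
RESPONSE = (
    "<?xml version='1.0'?>\n"
    "<methodResponse><params><param>"
    "<value><string>fòò</string></value>"
    "</param></params></methodResponse>\n"
)

def try_unmarshal(wire):
    """Hypothetical helper: unmarshal raw response bytes as a client would.

    Returns (True, params) on success, or (False, error message) if
    expat rejects the bytes.
    """
    try:
        params, _methodname = xmlrpc.client.loads(wire)
        return True, params
    except xml.parsers.expat.ExpatError as exc:
        return False, str(exc)

# Latin-1 bytes: each 'ò' becomes 0xF2, which does not form a valid
# UTF-8 sequence here, so expat reports a not-well-formed error.
print(try_unmarshal(RESPONSE.encode("latin-1")))
# The same text encoded as UTF-8 unmarshals cleanly.
print(try_unmarshal(RESPONSE.encode("utf-8")))
```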
Help resolve #1909. The patch attached to the ticket is at least missing unit tests for the new functionality it provides. If you can provide this, then we might be able to add the feature and resolve the ticket.

Jean-Paul
On Mon, 22 Oct 2007 07:55:49 -0400, Jean-Paul Calderone <exarkun@divmod.com> wrote:
[snip]
How can I solve the encoding problem in the XMLRPC server response?
Help resolve #1909. The patch attached to the ticket is at least missing unit tests for the new functionality it provides. If you can provide this, then we might be able to add the feature and resolve the ticket.
Actually, this is wrong; please disregard it. :)

#1909 is misguided and the attached patch is incorrect. Specifying the encoding parameter to xmlrpclib.dumps() doesn't do anything to solve the problem here. You either need to specify the encoding in the Content-Type header of the response, or you need to use UTF-8 (the default encoding for XML). Of these, you can already do the latter without changing anything in Twisted, since xmlrpclib will emit UTF-8 if you pass it unicode instead of already-encoded strings.

So the solution to your problem is to return unicode in your result instead of already-encoded strings. These will be encoded to UTF-8, which the client will decode properly. Compare:
>>> from xml.dom.minidom import parseString
>>> from xmlrpclib import dumps
>>> parseString(dumps((u'fòò',)))
<xml.dom.minidom.Document instance at 0xb7c0080c>
and
>>> parseString(dumps((u'fòò'.encode('latin-1'),), encoding='latin-1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/exarkun/.local/lib/python2.4/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/home/exarkun/.local/lib/python2.4/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
    return builder.parseString(string)
  File "/home/exarkun/.local/lib/python2.4/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 16
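A minimal sketch of the suggested fix, assuming Python 3 (where xmlrpclib is named xmlrpc.client) and using only the standard library rather than Twisted. fetch_subject_bytes() is a hypothetical stand-in for the POP3 side, and Latin-1 is an assumed charset; real code would take the charset from the message's MIME headers. The point is only that the method decodes to text before returning, so the response goes on the wire as UTF-8.

```python
import xmlrpc.client  # Python 3 name of the xmlrpclib module

def fetch_subject_bytes():
    """Hypothetical stand-in for the POP3 layer: a raw Latin-1 subject."""
    return "Señal: fòò".encode("latin-1")

def subject_as_text(charset="latin-1"):
    """Return text (str), not encoded bytes, as the XML-RPC result.

    The default charset is an assumption for this example.
    """
    return fetch_subject_bytes().decode(charset)

# Build a methodResponse from the decoded text; with the default encoding,
# dumps() writes a declaration that names no encoding (UTF-8 is implied).
body = xmlrpc.client.dumps((subject_as_text(),), methodresponse=True)
wire = body.encode("utf-8")  # the bytes a server would send

# A client can now unmarshal the response without an ExpatError.
params, _methodname = xmlrpc.client.loads(wire)
print(params[0])  # Señal: fòò
```

In a twisted.web.xmlrpc.XMLRPC subclass, the same decoding would go inside the corresponding xmlrpc_-prefixed method. Under Python 3, bytes passed to dumps() are marshalled as base64 rather than as a string, so decoding first is still what keeps the result a readable string.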
Jean-Paul