[ python-Bugs-946130 ] xmlrpclib omits charset in Content-Type HTTP header

SourceForge.net noreply at sourceforge.net
Sun May 2 12:28:57 EDT 2004


Bugs item #946130, was opened at 2004-05-02 00:30
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=946130&group_id=5470

Category: None
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Christian Schmidt (c960657)
Assigned to: Nobody/Anonymous (nobody)
Summary: xmlrpclib omits charset in Content-Type HTTP header

Initial Comment:
When xmlrpclib makes an HTTP request, it always sends
the HTTP header line "Content-Type: text/xml". The
encoding of the XML document is specified in the <?xml
...?> tag, e.g. <?xml version='1.0' encoding='utf-8'?>.

However, when XML is transferred over HTTP, the charset
specified in the HTTP Content-Type header takes
precedence over that in the document itself, i.e. the
encoding specified in the <?xml?> tag should be ignored
(RFC 3023 section 3.1). If the charset is not specified
in the Content-Type header, it defaults to us-ascii.

xmlrpclib currently specifies the charset in the
encoding attribute of the <?xml?> tag and not in the
HTTP header. The XML-RPC server thus treats the XML
document as us-ascii instead of the specified encoding.

xmlrpclib should specify the encoding in the
Content-Type header.

Disclaimer: I am no expert in XML and MIME-types, so I
might be wrong about this.
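
The rule described above can be sketched in a few lines. This
is only an illustration of how a strict RFC 3023 receiver might
pick the charset for a text/xml body, not code from xmlrpclib
or from any server. It assumes Python 3 and the standard
library only; the function name effective_charset and the
sample body are made up for the example.

```python
from email.message import Message


def effective_charset(content_type):
    """Charset a strict receiver would use for a text/xml body.

    Hypothetical helper: the charset parameter of Content-Type wins,
    and us-ascii is assumed when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns the
    # fallback when no charset parameter is present.
    return msg.get_content_charset("us-ascii")


# A body that declares utf-8 only in the XML declaration.
body = "<?xml version='1.0' encoding='utf-8'?><v>L\u00f6wis</v>".encode("utf-8")

print(effective_charset("text/xml"))                 # us-ascii
print(effective_charset("text/xml; charset=UTF-8"))  # utf-8

try:
    body.decode(effective_charset("text/xml"))
except UnicodeDecodeError:
    print("non-ASCII byte rejected under the us-ascii default")
```

Under these assumptions the same bytes decode fine once the
header carries charset=utf-8, which is the behaviour the
report asks for.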

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2004-05-02 18:28

Message:
Logged In: YES 
user_id=21627

When it comes to XML-RPC, the only person to argue with is
Dave Winer (arguing with us is futile). If you can make him
say, in public, that adding charset= is ok for XML-RPC
implementations, we can change Python.

As for your current problem: It would be best to use
US-ASCII for encoding your XML document, representing
non-ASCII characters as character references. Of course,
that is currently not supported in xmlrpclib; patches welcome.
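
A minimal sketch of the suggestion above, assuming Python 3:
the built-in "xmlcharrefreplace" error handler turns every
character outside US-ASCII into a numeric character reference
while encoding. The helper name ascii_xml_text is hypothetical
and this is not a patch for xmlrpclib; it only shows the
technique on a piece of element text.

```python
from xml.sax.saxutils import escape


def ascii_xml_text(text):
    """Return XML element text as pure US-ASCII bytes.

    Hypothetical helper: markup characters are escaped first, then
    anything outside ASCII becomes a &#NNN; character reference.
    """
    return escape(text).encode("ascii", "xmlcharrefreplace")


print(ascii_xml_text("L\u00f6wis"))  # b'L&#246;wis'
print(ascii_xml_text("a < b"))       # b'a &lt; b'
```

A document built this way reads the same whether the receiver
assumes us-ascii, Latin-1 or UTF-8, so the charset question in
the HTTP header no longer matters for it.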

----------------------------------------------------------------------

Comment By: Christian Schmidt (c960657)
Date: 2004-05-02 16:39

Message:
Logged In: YES 
user_id=32013

Hmm, interesting.

I agree that according to the letter of the spec, the
encoding cannot be specified in the Content-Type header.

But -- XML-RPC uses HTTP so I would argue that RFC 3023
still applies. If it does, a server should ignore any
encoding specified in the <?xml>, and the default encoding
for text/xml is us-ascii. So in order to represent
non-us-ascii characters in an XML-RPC message, they should
be encoded using the &#xx; notation.

xmlrpclib doesn't do this, so I suggest reopening this bug.

FYI: The actual problem I am having is making xmlrpclib work
with the XML_RPC_Server that is part of PEAR (PEAR is the
"official" PHP Extension and Application Repository). This
server does not inspect the <?xml> tag for an encoding but
always assumes that the input is UTF-8. According to RFC
3023 it should assume it to be us-ascii, but since us-ascii
is a subset of UTF-8, the current behaviour of the server
should be safe, as long as the client either sends us-ascii
or UTF-8 (I have submitted a patch to the XML_RPC_Server
maintainer that extends the encoding detection to look in
both the Content-Type header and the <?xml> tag).
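
The detection order mentioned in the parenthesis could look
roughly like this. The actual patch is for a PHP server and is
not shown in this thread, so the following is only a Python 3
sketch of the idea: the function name detect_encoding, the
regular expression and the UTF-8 fallback are assumptions made
for the example.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of a leading XML declaration.
# Only meaningful for ASCII-compatible encodings (not UTF-16).
_XML_DECL_ENCODING = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*['\"]([A-Za-z0-9._-]+)['\"]"
)


def detect_encoding(content_type, body, default="utf-8"):
    """Hypothetical helper: header charset, then <?xml?>, then default."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset
    match = _XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    # Assumed fallback, mirroring the server's current UTF-8 behaviour.
    return default


doc = b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"
print(detect_encoding("text/xml", doc))                 # iso-8859-1
print(detect_encoding("text/xml; charset=utf-8", doc))  # utf-8
print(detect_encoding("text/xml", b"<a/>"))             # utf-8
```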

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-05-02 14:25

Message:
Logged In: YES 
user_id=21627

The XML-RPC spec is very clear that the value of the
Content-Type header is "text/xml". Following the traditional
interpretation of the XML-RPC spec (where examples are
considered normative), it would be a protocol violation to
add a charset= parameter to Content-Type.

Until the XML-RPC spec is changed, or the status of using
charset= in XML-RPC is officially clarified, we can't change
our implementation.

Closing this as not-a-bug.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-05-02 12:59

Message:
Logged In: YES 
user_id=38388

I don't see anything wrong with the way xmlrpclib deals
with the encoding.

You are right on one point: HTTP defaults to Latin-1 as charset,
but since the content may well be non-Latin-1, xmlrpclib
should probably also place the encoding information into the
HTTP header (for requests it sends out).

However, this is rarely a problem, since clients usually don't
follow the HTTP way of interpreting the charset when seeing
text/xml as content type... xmlrpclib itself certainly
doesn't :-)


----------------------------------------------------------------------
