[Web-SIG] Python 3.0 and WSGI 1.0.

Alan Kennedy alan at xhaus.com
Thu Apr 2 01:15:34 CEST 2009


Hi Bill,

[Bill]
> I think the controlling reference here is RFC 3875.

I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

RFC 2616, the HTTP 1.1 spec, punts on the question of character
encoding for the request URI.

RFC 2396, the URI spec, says

"""
   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.
"""

RFC 3987 is that spec, for Internationalized Resource Identifiers. It says

"""
An IRI is a sequence of characters from the Universal Character Set
(Unicode/ISO 10646).
"""

and

"""
1.2.  Applicability

   IRIs are designed to be compatible with recommendations for new URI
   schemes [RFC2718].  The compatibility is provided by specifying a
   well-defined and deterministic mapping from the IRI character
   sequence to the functionally equivalent URI character sequence.
   Practical use of IRIs (or IRI references) in place of URIs (or URI
   references) depends on the following conditions being met:
"""

followed by

"""
   c.  The URI corresponding to the IRI in question has to encode
       original characters into octets using UTF-8.  For new URI
       schemes, this is recommended in [RFC2718].  It can apply to a
       whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
       or the URN syntax [RFC2141]).  It can apply to a specific part of
       a URI, such as the fragment identifier (e.g., [XPointer]).  It
       can apply to a specific URI or part(s) thereof.  For details,
       please see section 6.4.
"""
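
To make condition (c) concrete, here is a minimal Python 3 sketch of the
character-to-octet step for a path component only. The function name
iri_path_to_uri_path is hypothetical, and the full RFC 3987 mapping covers
more than this (host names, for example, are handled differently), so treat
it as an illustration rather than a complete implementation.

```python
from urllib.parse import quote


def iri_path_to_uri_path(iri_path):
    """Map an IRI path to a URI path (hypothetical helper, path only).

    Each character outside the safe set is encoded to UTF-8 octets and
    every octet is then percent-escaped, as condition (c) requires.
    "/" and "%" are kept so that separators and any existing escapes
    pass through unchanged.
    """
    return quote(iri_path, safe="/%", encoding="utf-8")


if __name__ == "__main__":
    print(iri_path_to_uri_path("/caf\u00e9"))
```

Plain ASCII paths pass through untouched, so existing URIs are unaffected by
the mapping.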

I think the question is: are people using IRIs in the wild? If so,
then we must decide how best to deal with the problems of
recognising iso-8859-1+rfc2047 versus utf-8, or whatever
server-configured encoding the user has chosen.
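
One possible way to approach the recognition problem, sketched in Python 3
under my own assumptions: unescape the request path to raw octets, try
utf-8 first, and fall back to a configured encoding. The function name
decode_request_path and its fallback parameter are hypothetical, not part of
WSGI or any server. The heuristic is not foolproof, since some iso-8859-1
octet sequences also happen to be valid utf-8.

```python
from urllib.parse import unquote_to_bytes


def decode_request_path(raw_path, fallback="iso-8859-1"):
    """Guess the character encoding of a percent-escaped request path.

    Returns a (decoded_text, encoding_used) pair. Hypothetical helper:
    utf-8 is tried first because it rejects most octet sequences that
    were produced by another encoding.
    """
    octets = unquote_to_bytes(raw_path)
    try:
        return octets.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid utf-8, so assume the server-configured fallback.
        return octets.decode(fallback), fallback


if __name__ == "__main__":
    print(decode_request_path("/caf%C3%A9"))
    print(decode_request_path("/caf%E9"))
```

Both calls decode to the same text, but report different encodings, which is
exactly the ambiguity a server or framework would have to resolve.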

Alan.

