[Python-Dev] New PyPI broken package editing

"Martin v. Löwis" martin at v.loewis.de
Wed Mar 30 22:37:09 CEST 2005


Walter Dörwald wrote:
> The register command in 2.4 (and current CVS) simply does a
>    value = str(value)
> in post_to_server() so the encoded bytes sent depend on the
> default encoding. Would it be sufficient to change this to
>    value = unicode(value).encode("utf-8")

Indeed. I think this can go into 2.4.2.
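
As a rough illustration of why the explicit encode matters, here is a
minimal sketch in Python 3 terms (where unicode() no longer exists and
str() plays its role). The helper name encode_field is hypothetical and
is not part of distutils; the point is only that the bytes sent no
longer depend on a default encoding.

```python
def encode_field(value):
    """Return the UTF-8 bytes for a form value (hypothetical helper).

    Python 3 analogue of the proposed
        value = unicode(value).encode("utf-8")
    """
    # str() turns non-string metadata (e.g. numbers) into text first;
    # the explicit "utf-8" fixes the wire encoding.
    return str(value).encode("utf-8")


print(encode_field("Walter Dörwald"))
```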

> Another solution might be to include the encoding in the Content-type 
> header of the request. IMHO the best solution would be to do both:
> Always use UTF-8 as the encoding and include this in the Content-type
> header in the request. PyPI should honor this encoding when it finds
> it and should fall back to whatever it used before if it doesn't.

Yeah, well :-) Content-type in form upload is a mess, as you certainly
know. It should be honored, but commonly isn't. This, in turn, causes
browsers to ignore it.

PyPI uses the CGI module. It currently decodes every field that doesn't
have a filename attribute as UTF-8, so any request that doesn't send
UTF-8 is rejected. This could be fixed/extended, but I think that
would be best done in the CGI module, for consumption by any application
that uses form upload. For example, doing

cgi.FieldStorage(..., encoding="UTF-8")

should cause

a) decoding of every field that has an encoding= in its content
    type
b) decoding of every field that is not a file as UTF-8. It is a
    file if it
    I) has a filename, or
    II) cannot be decoded using the target encoding

For backwards compatibility, a) can only be enabled if the CGI
application explicitly states which encoding it expects.
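
The rules above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the cgi module's actual
behaviour: decode_field and its parameters are hypothetical names, the
field is assumed to arrive as raw bytes, and "charset" stands for
whatever encoding the field's own content type declares.

```python
def decode_field(raw, filename=None, charset=None, encoding=None):
    """Apply the proposed decoding rules to one form field (sketch).

    raw      -- the field's bytes as received
    filename -- filename attribute of the field, if any
    charset  -- encoding declared in the field's content type, if any
    encoding -- encoding the application says it expects, or None
    """
    # Backwards compatibility: nothing is decoded unless the
    # application explicitly asks for an encoding.
    if encoding is None:
        return raw
    # a) a field that declares its own encoding is decoded with it.
    if charset is not None:
        return raw.decode(charset)
    # b) I) a field with a filename is a file: leave it as bytes.
    if filename is not None:
        return raw
    # b) II) a field that cannot be decoded is treated as a file too.
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw
```

With encoding="utf-8", a plain text field comes back as text, while an
upload or a field in some other undeclared encoding stays as bytes
instead of causing the whole request to be rejected.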

I'd like to state "contributions are welcome", although others
may think differently.

Regards,
Martin

