
I've uploaded a new package to the new PyPI. Editing this new package gives me a Unicode error. The URL is

http://www.python.org/pypi?:action=submit_form&name=ll-ansistyle&version=0.6.1

The error I get is the following:

---
Error...
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
---

I've used the distutils from current CVS and have author=u"Walter Dörwald" in my setup.py.

Bye,
Walter Dörwald

Quoting Walter Dörwald <walter@livinglogic.de>:
I see that the package is online now, so I assume that it now worked?
This isn't supposed to work yet. Are you using the register command on this? Can you tell where it decides to encode as Latin-1? PyPI will reject anything that is not UTF-8. As for the uploads: you'll have noticed that it put the sdist files into packages/2.5; this is not supposed to happen. If you delete the files, and reupload them with the current CVS, the files should go into /packages/source. Regards, Martin

martin@v.loewis.de wrote:
OK, I've deleted the files and the packages. Running "setup.py register" with author=u"Walter Dörwald" in setup.py gives me:

---
running register
Using PyPI login from /home/walter/.pypirc
Server response (500): Internal Server Error
---

Using author=u"Walter Dörwald".encode("utf-8") in setup.py works.

I'm not sure if this is the right approach. The encoding I specify in setup.py should be independent of the encoding used on the wire between distutils and PyPI. I.e. the author (and maintainer) argument should always be unicode. When a str is passed, it is treated like any other str in a unicode context: it is decoded using the default encoding. This would fix another problem: it would make it nearly impossible to send a request to PyPI with the wrong encoding, because any encoding problems are sorted out completely on the client side.
OK, I've re-uploaded the packages. BTW, uploading the packages a second time leads to the following problem:

---
running upload
Submitting dist/ll-ansistyle-0.6.1.tar.bz2 to http://www.python.org/pypi
Upload failed (500): There's been a problem with your request
Submitting dist/ll-ansistyle-0.6.1.tar.gz to http://www.python.org/pypi
Upload failed (500): There's been a problem with your request
---

Is there a way to display the HTTP response by PyPI?

Editing the package is still broken. The link "edit" on the page http://www.python.org/pypi/ll-ansistyle/0.6.1 gives:

---
Error...
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
---

Bye,
Walter Dörwald
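
[Editor's note] The UnicodeDecodeError quoted above can be reproduced in a few lines. This is only a minimal sketch: it is written for Python 3 rather than the Python 2.4/2.5 used in the thread, and the explicit decode("ascii") stands in for the implicit str-to-unicode conversion that Python 2 performed with the default encoding. The helper name decode_as_ascii is made up for illustration, and the reported position differs from the 92 in the thread because only the author name is decoded here.

```python
# The author name from setup.py, encoded the way the workaround does it.
author = "Walter Dörwald"
wire_bytes = author.encode("utf-8")

# "ö" becomes the two bytes 0xc3 0xb6, so the data is no longer pure ASCII.
assert wire_bytes[8:10] == b"\xc3\xb6"


def decode_as_ascii(data):
    """Decode bytes as ASCII; return the error message if that fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


message = decode_as_ascii(wire_bytes)
print(message)
```

The message names byte 0xc3, the first byte of the UTF-8 encoding of "ö", which suggests that some code path treats UTF-8 bytes as if they were ASCII.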

Walter Dörwald wrote:
I'm not sure if this is the right approach.
I think the approach is right, but the implementation is wrong.
"should" is a correct description. It should allow Unicode strings, which it then should encode to UTF-8 during transmission. The fact of the matter is that the register command as released in 2.4 (and 2.4.1) doesn't.
distutils should *not* assume that byte strings are in the default encoding. It is fair to assume they are in ASCII; if the administrator has changed the default encoding, then this cannot possibly affect all the setup.py files out there. Also, it is a fact that the deployed versions of the register command just send byte strings in setup.py as-is, without trying to do any kind of recoding. In any case, PyPI now requires that the form submission uses UTF-8, and refuses anything else. So it *is* impossible to send, say, Latin-1; whether the client makes that happen by properly encoding Unicode strings or whether they are in setup.py in the first place does not matter.
Is there a way to display the HTTP response by PyPI?
Yes, please invoke upload with --show-response.
I see. I'll investigate. Martin

Martin v. Löwis said:
OK, that's the problem.
They should be the same. If not, the installation is broken (or at least scripts that rely on this break anywhere else).
So can I have one setup.py for both Python 2.4 and Python 2.5 that does the correct thing when creating a Windows installer for Python 2.4 (I've used Unicode strings for that until now) and using the upload command with Python CVS (which seems to require a byte string now)? I'd like to avoid having to use version checks in setup.py.
[...]
Bye, Walter Dörwald

Martin v. Löwis wrote:
OK.
and it indeed requires utf-8 at the moment. This can be fixed, of course,
The register command in 2.4 (and current CVS) simply does a

    value = str(value)

in post_to_server(), so the encoded bytes sent depend on the default encoding. Would it be sufficient to change this to

    value = unicode(value).encode("utf-8")

Another solution might be to include the encoding in the Content-type header of the request. IMHO the best solution would be to do both: always use UTF-8 as the encoding and include this in the Content-type header of the request. PyPI should honor this encoding when it finds it and should fall back to whatever it used before if it doesn't.
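
[Editor's note] The two-part proposal above (always encode as UTF-8, and say so in the Content-type header) could look roughly like the following. This is a sketch under stated assumptions, not the distutils code: it is written for Python 3, the names encode_field and content_type_header are hypothetical, and accepting byte strings only when they are plain ASCII follows the rule suggested earlier in the thread.

```python
def encode_field(value, wire_encoding="utf-8"):
    """Return the bytes to send for one metadata field (hypothetical helper)."""
    if isinstance(value, bytes):
        # Byte strings are accepted only if they are plain ASCII;
        # anything else raises UnicodeDecodeError on the client side.
        value = value.decode("ascii")
    # Non-string values (for example numbers) are converted to text first.
    return str(value).encode(wire_encoding)


def content_type_header(boundary, wire_encoding="utf-8"):
    """Build a Content-type header value that names the encoding used."""
    return "multipart/form-data; boundary=%s; charset=%s" % (
        boundary,
        wire_encoding,
    )


print(encode_field("Walter Dörwald"))
print(content_type_header("example-boundary"))
```

With this shape, a wrongly encoded byte string fails before anything is sent, which matches the goal of sorting out encoding problems on the client.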
but not for already-released versions.
True, but I can live with that as long as I can use the same setup.py for bdist_wininst and register under Python 2.4 and upload under Python CVS. Bye, Walter Dörwald

Walter Dörwald wrote:
Indeed. I think this can go into 2.4.2.
Yeah, well :-) Content-type in form upload is a mess, as you certainly know. It should be honored, but commonly isn't. This, in turn, causes browsers to ignore it.

PyPI uses the CGI module. It currently decodes anything that doesn't have a filename attribute as UTF-8, causing rejection of anything that doesn't send UTF-8.

This could be fixed/extended, but I think that would be best done in the CGI module, for consumption by any application that uses form upload. For example, doing

    cgi.FieldStorage(..., encoding="UTF-8")

should cause

a) decoding of every field that has an encoding= in its content type
b) decoding of every field that is not a file as UTF-8. It is a file if it
   I) has a filename, or
   II) cannot be decoded with the target encoding

For backwards compatibility, a) can only be enabled if the CGI application explicitly tells what encoding it expects.

I'd like to state "contributions are welcome", although others may think differently.

Regards,
Martin
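
[Editor's note] Rule b) above, together with the two conditions for treating a field as a file, can be sketched as a small standalone function. This is an illustration only: the encoding= argument to cgi.FieldStorage is a proposal in the message, not something this sketch relies on, the name decode_field is hypothetical, and rule a) (honoring a per-field encoding) is left out.

```python
def decode_field(raw, filename=None, encoding="utf-8"):
    """Return (kind, value) for one uploaded form field (hypothetical helper).

    A field counts as a file if it has a filename, or if its bytes
    cannot be decoded with the target encoding. Files stay as bytes.
    """
    if filename is not None:
        return ("file", raw)
    try:
        return ("text", raw.decode(encoding))
    except UnicodeDecodeError:
        return ("file", raw)


print(decode_field(b"Walter D\xc3\xb6rwald"))       # UTF-8 text field
print(decode_field(b"Walter D\xf6rwald"))           # Latin-1 bytes, kept as a file
print(decode_field(b"...", filename="x.tar.gz"))    # explicit file upload
```

Under this rule a Latin-1 submission is no longer rejected outright; it is passed to the application as undecoded bytes, and the application decides what to do with it.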
