I've uploaded a new package to the new PyPI. Editing this new package gives me a Unicode error. The URL is
http://www.python.org/pypi?:action=submit_form&name=ll-ansistyle&version=0.6.1
The error I get is the following:
---
Error...
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
---
I've used the distutils from current CVS and have author=u"Walter Dörwald" in my setup.py
Bye, Walter Dörwald
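The byte 0xc3 in the error message is the first byte of the UTF-8 encoding of "ö" (0xc3 0xb6), so the traceback is consistent with UTF-8 data being decoded with the ASCII codec somewhere. The thread does not say where in PyPI this happened. The following is only a minimal sketch of the same class of error, written for Python 3 (the thread itself concerns Python 2.4/2.5); the helper name try_ascii_decode is made up for illustration.

```python
# "ö" is encoded in UTF-8 as the two bytes 0xc3 0xb6.
author_utf8 = "Walter Dörwald".encode("utf-8")


def try_ascii_decode(data):
    """Return the decoded text, or the UnicodeDecodeError raised by the ASCII codec."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc


result = try_ascii_decode(author_utf8)
# The exception records which byte could not be decoded and where it was found.
print(type(result).__name__, hex(author_utf8[result.start]), result.start)
```

The position reported here differs from the 92 in the traceback because the sketch decodes only the author name, not the whole record.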
Quoting Walter Dörwald <walter@livinglogic.de>:
I've uploaded a new package to the new PyPI. Editing this new package gives me a Unicode error. The URL is
http://www.python.org/pypi?:action=submit_form&name=ll-ansistyle&version=0.6.1
I see that the package is online now, so I assume that it now worked?
I've used the distutils from current CVS and have author=u"Walter Dörwald" in my setup.py
This isn't supposed to work yet. Are you using the register command on this? Can you tell where it decides to encode as Latin-1? PyPI will reject anything that is not UTF-8.
As for the uploads: you'll have noticed that it put the sdist files into packages/2.5; this is not supposed to happen. If you delete the files, and reupload them with the current CVS, the files should go into /packages/source.
Regards, Martin
Quoting Walter Dörwald <walter@livinglogic.de>:
I've uploaded a new package to the new PyPI. Editing this new package gives me a Unicode error. The URL is
http://www.python.org/pypi?:action=submit_form&name=ll-ansistyle&version=0.6.1
I see that the package is online now, so I assume that it now worked?
Uploading worked, but editing the package afterwards fails.
I've used the distutils from current CVS and have author=u"Walter Dörwald" in my setup.py
This isn't supposed to work yet. Are you using the register command on this? Can you tell where it decides to encode as Latin-1? PyPI will reject anything that is not UTF-8.
You might be right. I've tried with author=u"..." and author=u"...".encode("utf-8"). The second version might have been the one that worked.
As for the uploads: you'll have noticed that it put the sdist files into packages/2.5; this is not supposed to happen. If you delete the files, and reupload them with the current CVS, the files should go into /packages/source.
OK, I'll try again tomorrow morning. Bye, Walter Dörwald
martin@v.loewis.de wrote:
Quoting Walter Dörwald <walter@livinglogic.de>:
I've uploaded a new package to the new PyPI. Editing this new package gives me a Unicode error. The URL is
http://www.python.org/pypi?:action=submit_form&name=ll-ansistyle&version=0.6.1
I see that the package is online now, so I assume that it now worked?
OK, I've deleted the files and the packages. Running "setup.py register" with author=u"Walter Dörwald" in setup.py gives me:
---
running register
Using PyPI login from /home/walter/.pypirc
Server response (500): Internal Server Error
---
Using author=u"Walter Dörwald".encode("utf-8") in setup.py works.
I'm not sure if this is the right approach.
The encoding I specify in setup.py should be independent of the encoding used between distutils and PyPI to communicate on the wire. I.e. the author (and maintainer) argument should always be unicode.
When str is passed, this is treated as any other str in a unicode context, it is decoded using the default encoding. This would fix another problem: It would make it nearly impossible to send a request to PyPI with the wrong encoding, because any encoding problems are sorted out completely on the client side.
[...] As for the uploads: you'll have noticed that it put the sdist files into packages/2.5; this is not supposed to happen. If you delete the files, and reupload them with the current CVS, the files should go into /packages/source.
OK, I've re-uploaded the packages. BTW, uploading the packages a second time leads to the following problem:
---
running upload
Submitting dist/ll-ansistyle-0.6.1.tar.bz2 to http://www.python.org/pypi
Upload failed (500): There's been a problem with your request
Submitting dist/ll-ansistyle-0.6.1.tar.gz to http://www.python.org/pypi
Upload failed (500): There's been a problem with your request
---
Is there a way to display the HTTP response by PyPI?
Editing the package is still broken. The link "edit" on the page http://www.python.org/pypi/ll-ansistyle/0.6.1 gives:
---
Error...
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
---
Bye, Walter Dörwald
Walter Dörwald wrote:
I'm not sure if this is the right approach.
I think the approach is right, but the implementation is wrong.
The encoding I specify in setup.py should be independent of the encoding used between distutils and PyPI to communicate on the wire. I.e. the author (and maintainer) argument should always be unicode.
"should" is a correct description. It should allow Unicode strings, which it then should encode to UTF-8 during transmission. The fact of the matter is that the register command as released in 2.4 (and 2.4.1) doesn't.
When str is passed, this is treated as any other str in a unicode context, it is decoded using the default encoding. This would fix another problem: It would make it nearly impossible to send a request to PyPI with the wrong encoding, because any encoding problems are sorted out completely on the client side.
distutils should *not* assume that byte strings are in the default encoding. It is fair to assume they are in ASCII; if the administrator has changed the default encoding, then this cannot possibly affect all the setup.py files out there. Also, it is a fact that the deployed versions of the register command just send byte strings in setup.py as-is, without trying to do any kind of recoding.
In any case, PyPI now requires that the form submission uses UTF-8, and refuses anything else. So it *is* impossible to send, say, Latin-1; whether the client makes that happen by properly encoding Unicode strings or whether they are in setup.py in the first place does not matter.
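The UTF-8-only rule described above can be sketched as a strict decode on the receiving side. This is an illustration in Python 3 under stated assumptions, not PyPI's actual code; the function name accept_form_value is hypothetical.

```python
def accept_form_value(raw):
    """Decode a submitted form value, raising UnicodeDecodeError if it is not valid UTF-8."""
    # Strict error handling is the default for bytes.decode().
    return raw.decode("utf-8")


utf8_value = "Walter Dörwald".encode("utf-8")
latin1_value = "Walter Dörwald".encode("latin-1")

accepted = accept_form_value(utf8_value)

try:
    accept_form_value(latin1_value)
    rejected = False
except UnicodeDecodeError:
    # The Latin-1 byte 0xf6 for "ö" is not a valid UTF-8 sequence here.
    rejected = True
```

Note that a strict decode only catches byte sequences that are invalid as UTF-8; pure ASCII data passes regardless of which encoding the sender had in mind.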
Is there a way to display the HTTP response by PyPI?
Yes, please invoke upload with --show-response.
Editing the package is still broken. The link "edit" on the page http://www.python.org/pypi/ll-ansistyle/0.6.1 gives: --- Error...
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
I see. I'll investigate. Martin
Martin v. Löwis said:
Walter Dörwald wrote:
I'm not sure if this is the right approach.
I think the approach is right, but the implementation is wrong.
The encoding I specify in setup.py should be independent of the encoding used between distutils and PyPI to communicate on the wire. I.e. the author (and maintainer) argument should always be unicode.
"should" is a correct description. It should allow Unicode strings, which it then should encode to UTF-8 during transmission. The fact of the matter is that the register command as released in 2.4 (and 2.4.1) doesn't.
OK, that's the problem.
When str is passed, this is treated as any other str in a unicode context, it is decoded using the default encoding. This would fix another problem: It would make it nearly impossible to send a request to PyPI with the wrong encoding, because any encoding problems are sorted out completely on the client side.
distutils should *not* assume that byte strings are in the default encoding. It is fair to assume they are in ASCII;
They should be the same. If not, the installation is broken (or at least scripts that rely on this break anywhere else).
if the administrator has changed the default encoding, then this cannot possibly affect all the setup.py files out there. Also, it is a fact that the deployed versions of the register command just send byte strings in setup.py as-is, without trying to do any kind of recoding.
In any case, PyPI now requires that the form submission uses UTF-8, and refuses anything else. So it *is* impossible to send, say, Latin-1; whether the client makes that happen by properly encoding Unicode strings or whether they are in setup.py in the first place does not matter.
So can I have one setup.py for both Python 2.4 and Python 2.5 that does the correct thing when creating a Windows installer for Python 2.4 (I've used Unicode strings for that until now) and using the upload command with Python CVS (which seems to require a byte string now)? I'd like to avoid having to use version checks in setup.py.
[...]
Bye, Walter Dörwald
Walter Dörwald wrote:
So can I have one setup.py for both Python 2.4 and Python 2.5 that does the correct thing when creating a Windows installer for Python 2.4 (I've used Unicode strings for that until now) and using the upload command with Python CVS (which seems to require a byte string now)? I'd like to avoid having to use version checks in setup.py.
Well, the upload command doesn't look at the metadata. It is the register command which does, and it indeed requires utf-8 at the moment. This can be fixed, of course, but not for already-released versions. Regards, Martin
Martin v. Löwis wrote:
Walter Dörwald wrote:
So can I have one setup.py for both Python 2.4 and Python 2.5 that does the correct thing when creating a Windows installer for Python 2.4 (I've used Unicode strings for that until now) and using the upload command with Python CVS (which seems to require a byte string now)? I'd like to avoid having to use version checks in setup.py.
Well, the upload command doesn't look at the metadata. It is the register command which does,
OK.
and it indeed requires utf-8 at the moment. This can be fixed, of course,
The register command in 2.4 (and current CVS) simply does a value = str(value) in post_to_server(), so the encoded bytes sent depend on the default encoding. Would it be sufficient to change this to value = unicode(value).encode("utf-8")?
Another solution might be to include the encoding in the Content-type header of the request.
IMHO the best solution would be to do both: Always use UTF-8 as the encoding and include this in the Content-type header in the request. PyPI should honor this encoding when it finds it and should fall back to whatever it used before if it doesn't.
but not for already-released versions.
True, but I can live with that as long as I can use the same setup.py for bdist_windist and register under Python 2.4 and upload under Python CVS. Bye, Walter Dörwald
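The proposed change, value = unicode(value).encode("utf-8"), is Python 2 code. A rough Python 3 analogue is sketched below, assuming that byte strings are treated as ASCII only, as suggested earlier in the thread. The helper name to_wire_bytes is hypothetical and is not part of distutils.

```python
def to_wire_bytes(value):
    """Hypothetical helper: turn a metadata value into UTF-8 bytes for the request body."""
    if isinstance(value, bytes):
        # Byte strings are assumed to be plain ASCII; anything else raises
        # UnicodeDecodeError here, on the client side, before a request is sent.
        value = value.decode("ascii")
    # Non-string values such as version numbers are converted to text first.
    return str(value).encode("utf-8")


author_bytes = to_wire_bytes("Walter Dörwald")
name_bytes = to_wire_bytes(b"ll-ansistyle")
```

Under this sketch a byte string that already contains UTF-8 for "ö" is refused rather than passed through, which matches the idea that encoding problems are sorted out on the client.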
Walter Dörwald wrote:
The register command in 2.4 (and current CVS) simply does a value = str(value) in post_to_server(), so the encoded bytes sent depend on the default encoding. Would it be sufficient to change this to value = unicode(value).encode("utf-8")?
Indeed. I think this can go into 2.4.2.
Another solution might be to include the encoding in the Content-type header of the request. IMHO the best solution would be to do both: Always use UTF-8 as the encoding and include this in the Content-type header in the request. PyPI should honor this encoding when it finds it and should fall back to whatever it used before if it doesn't.
Yeah, well :-) Content-type in form upload is a mess, as you certainly know. It should be honored, but commonly isn't. This, in turn, causes browsers to ignore it.
PyPI uses the CGI module. It currently decodes anything that doesn't have a filename attribute as UTF-8, causing rejection of anything that doesn't send UTF-8. This could be fixed/extended, but I think that would be best done in the CGI module, for consumption by any application that uses form upload. For example, doing
cgi.FieldStorage(..., encoding="UTF-8")
should cause
a) decoding of every field that has an encoding= in its content type
b) decoding of every field that is not a file as UTF-8. It is a file if it
   I) has a filename, or
   II) cannot be decoded using the target encoding
For backwards compatibility, a) can only be enabled if the CGI application explicitly tells what encoding it expects.
I'd like to state "contributions are welcome", although others may think differently.
Regards, Martin
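The decoding rules a) and b) proposed above could look roughly like the following Python 3 sketch. The encoding= argument to cgi.FieldStorage is only a proposal in this thread, so the sketch deliberately avoids the cgi module; decode_field and its parameters are hypothetical names, and applying rule a) before the file check is an assumption.

```python
def decode_field(raw, filename=None, declared=None, expected="utf-8"):
    """Return text for ordinary form fields and the raw bytes for file fields."""
    if declared is not None:
        # a) the field's content type declares an encoding: honor it.
        return raw.decode(declared)
    if filename is not None:
        # b) I) a field with a filename is a file and stays as bytes.
        return raw
    try:
        # b) every other field is decoded using the expected encoding.
        return raw.decode(expected)
    except UnicodeDecodeError:
        # b) II) data that cannot be decoded is treated as a file.
        return raw


text_field = decode_field("Walter Dörwald".encode("utf-8"))
file_field = decode_field(b"\x1f\x8b\x08", filename="ll-ansistyle-0.6.1.tar.gz")
```

One consequence of rule II is that a mis-encoded text field comes back as bytes instead of causing the whole request to be rejected.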
Martin v. Löwis wrote:
Walter Dörwald wrote:
The register command in 2.4 (and current CVS) simply does a value = str(value) in post_to_server(), so the encoded bytes sent depend on the default encoding. Would it be sufficient to change this to value = unicode(value).encode("utf-8")?
Indeed. I think this can go into 2.4.2.
OK, I've checked this into HEAD and release24-maint (including the change to the Content-Type header).
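The combination discussed here, always encoding as UTF-8 and also naming the charset in the Content-Type header, might be sketched as follows in Python 3. The thread does not show the committed code, so the exact header layout, the boundary string, and the function name build_request are all assumptions.

```python
def build_request(fields, boundary="----hypothetical-boundary"):
    """Encode text fields as multipart/form-data in UTF-8 and declare the charset."""
    parts = []
    for name, value in fields.items():
        parts.append("--" + boundary)
        parts.append('Content-Disposition: form-data; name="%s"' % name)
        parts.append("")  # blank line separates part headers from the value
        parts.append(str(value))
    parts.append("--" + boundary + "--")  # closing boundary
    parts.append("")
    # The whole body is encoded once, so every field goes out as UTF-8.
    body = "\r\n".join(parts).encode("utf-8")
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s; charset=utf-8" % boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_request({"name": "ll-ansistyle", "author": "Walter Dörwald"})
```

Content-Length is computed from the encoded body, since "ö" takes two bytes in UTF-8.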
Another solution might be to include the encoding in the Content-type header of the request. IMHO the best solution would be to do both: Always use UTF-8 as the encoding and include this in the Content-type header in the request. PyPI should honor this encoding when it finds it and should fall back to whatever it used before if it doesn't.
Yeah, well :-) Content-type in form upload is a mess, as you certainly know. It should be honored, but commonly isn't. This, in turn, causes browsers to ignore it.
Fortunately we have both ends under control (except for old Python versions).
PyPI uses the CGI module. It currently decodes anything that doesn't have a filename attribute as UTF-8, causing rejection of anything that doesn't send UTF-8. This could be fixed/extended, but I think that would be best done in the CGI module, for consumption by any application that uses form upload. For example, doing
cgi.FieldStorage(..., encoding="UTF-8")
should cause
a) decoding of every field that has an encoding= in its content type
b) decoding of every field that is not a file as UTF-8. It is a file if it
   I) has a filename, or
   II) cannot be decoded using the target encoding
For backwards compatibility, a) can only be enabled if the CGI application explicitly tells what encoding it expects.
I'd like to state "contributions are welcome", although others may think differently.
OK, I'll see, if I can give this a try. Bye, Walter Dörwald
Walter Dörwald wrote:
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
That should be fixed now, please try again. Please report further errors you find to sf.net/projects/pypi. Suggestions/RFEs could go to the PyPI tracker, or here to python-dev. Regards, Martin
Martin v. Löwis wrote:
Walter Dörwald wrote:
There's been a problem with your request
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)
That should be fixed now, please try again.
Works perfectly, thanks!
[...]
Bye, Walter Dörwald
participants (3): "Martin v. Löwis", martin@v.loewis.de, Walter Dörwald