Re: [Python-Dev] [Distutils] unicode bug in distutils
At 02:47 PM 2/24/2007 -0600, Tarek Ziadé wrote:
I have created a setup.py file for distribution and I bumped into a small bug when I tried to set my name in the contact field (Tarek Ziadé).
Using string (utf8 file):
setup( maintainer="Tarek Ziadé" )
leads to:
  File ".../lib/python2.5/distutils/command/register.py", line 162, in send_metadata
    auth)
  File ".../lib/python2.5/distutils/command/register.py", line 257, in post_to_server
    value = unicode(value).encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
Using unicode:
setup( maintainer=u"Tarek Ziadé" )
leads to:
  File ".../lib/python2.5/distutils/dist.py", line 1094, in write_pkg_file
    file.write('Author: %s\n' % self.get_contact() )
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)
I would propose a patch for this problem, but I don't know what the best input type would be (I guess Unicode for names).
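Both failures come from Python 2's implicit ASCII codec. The sketch below reproduces them by analogy in Python 3, where `bytes` stands in for Python 2's `str` (what a plain literal in a UTF-8 source file produces) and `str` stands in for `unicode`. It is an illustration of the mechanism only, not distutils code.

```python
# Python 3 analogue of the two Python 2.5 failures above.
# bytes plays the role of a Python 2 str literal read from a UTF-8 file;
# str plays the role of a Python 2 u"..." literal.
name_bytes = "Tarek Ziadé".encode("utf-8")
name_text = "Tarek Ziadé"

# 1. register: unicode(value) on a byte string decoded it with the
#    default (ASCII) codec, which rejects the first UTF-8 byte of "é".
try:
    name_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    register_error = exc  # keep a reference; 'exc' is cleared after the block
print(register_error)
# prints: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

# 2. write_pkg_file: the formatted Unicode line went to a file opened
#    without an encoding, so it was implicitly encoded as ASCII.
try:
    ("Author: %s\n" % name_text).encode("ascii")
except UnicodeEncodeError as exc:
    pkg_info_error = exc
print(pkg_info_error)
# prints: 'ascii' codec can't encode character '\xe9' in position 18: ordinal not in range(128)
```

The positions match the tracebacks: "Tarek Ziad" is 10 characters long, and "Author: " adds 8 more before the "é".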
At 05:45 PM 2/24/2007 -0500, Tres Seaver wrote:
Don't you still need to tell Python about the encoding of your string literals [1] [2] ? E.g.::
That's not the problem; it's that the code that writes the PKG-INFO file doesn't handle Unicode. See distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a file with encoding support if it's doing Unicode. However, there's currently no standard, as far as I know, for what encoding the PKG-INFO file should use.

Meanwhile, the 'register' command accepts Unicode, but is broken in handling it. Essentially, the problem is that Python 2.5 broke this by adding a Unicode *requirement* to the "register" command. Previously, register simply sent whatever you gave it, and the PKG-INFO writing code still does. Unfortunately, this means that there is no longer any one value that you can use for your name that will be accepted by both "register" and anything that writes a PKG-INFO file.

Both register and write_pkg_info() are arguably broken here, and should be able to work with either strings or Unicode, and degrade gracefully in the event of non-ASCII characters in a string. (This matters because even though "register" is only run by the package's author, users may run other commands that require a PKG-INFO, so a package prepared using Python <2.5 must still be usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer names.)

Unfortunately, this isn't fixable until there's a new 2.5.x release. For previous Python versions, both register and write_pkg_info() accepted 8-bit strings and passed them on as-is, so the only workaround for this issue at the moment is to revert to Python 2.4 or less.
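One way to make both code paths accept either input type, as suggested above, is sketched below in Python 3 (`bytes` for 8-bit strings, `str` for Unicode). The helper names `as_text`, `write_pkg_file_sketch`, `write_pkg_info_sketch` and `encode_form_value` are hypothetical; UTF-8 as the PKG-INFO encoding is an assumption (no standard existed at the time); and the Latin-1 fallback is just one possible way to "degrade gracefully". This is not the actual distutils patch.

```python
import io


def as_text(value, encoding="utf-8"):
    """Return value as text, accepting bytes (Py2 str) or str (Py2 unicode).

    Assumption: byte strings are UTF-8. If they are not, fall back to
    Latin-1, which maps every byte to some character, instead of raising.
    """
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            return value.decode("latin-1")
    return str(value)


def write_pkg_file_sketch(fileobj, fields):
    """Write PKG-INFO style 'Key: value' lines to a text-mode file object."""
    for key, value in fields:
        fileobj.write("%s: %s\n" % (key, as_text(value)))


def write_pkg_info_sketch(path, fields):
    """Open the target with an explicit encoding (assumed UTF-8) and write it."""
    with io.open(path, "w", encoding="utf-8") as fileobj:
        write_pkg_file_sketch(fileobj, fields)


def encode_form_value(value):
    """Encode a metadata field as UTF-8 bytes for the 'register' POST body."""
    return as_text(value).encode("utf-8")


# Usage: the same name works whether it arrives as bytes or as text.
buf = io.StringIO()
write_pkg_file_sketch(buf, [
    ("Author", "Tarek Ziadé".encode("utf-8")),  # 8-bit string, as Python <2.5 allowed
    ("Maintainer", "Tarek Ziadé"),               # Unicode string
])
print(buf.getvalue())
```

Normalizing everything to text at the boundary means the writer and the register encoder share one conversion rule, so a value accepted by one is accepted by the other.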
"Phillip J. Eby" <pje@telecommunity.com> writes:
However, there's currently no standard, as far as I know, for what encoding the PKG-INFO file should use.
Who would define such a standard? My vote goes for “default is UTF-8”.
Meanwhile, the 'register' command accepts Unicode, but is broken in handling it. […]
Unfortunately, this isn't fixable until there's a new 2.5.x release. For previous Python versions, both register and write_pkg_info() accepted 8-bit strings and passed them on as-is, so the only workaround for this issue at the moment is to revert to Python 2.4 or less.
What is the prognosis on this issue? It's still hitting me in Python 2.5.4.

Ben Finney
On Fri, Apr 3, 2009 at 2:25 AM, Ben Finney <ben+python@benfinney.id.au> wrote:
"Phillip J. Eby" <pje@telecommunity.com> writes:
However, there's currently no standard, as far as I know, for what encoding the PKG-INFO file should use.
Who would define such a standard?
PEP 376, where we can state that all files in egg-info should use a specific encoding.
My vote goes for “default is UTF-8”.
+1
Meanwhile, the 'register' command accepts Unicode, but is broken in handling it. […]
How so?

Tarek
Ben Finney <ben+python@benfinney.id.au> writes:
"Phillip J. Eby" <pje@telecommunity.com> writes:
Meanwhile, the 'register' command accepts Unicode, but is broken in handling it. […]
Unfortunately, this isn't fixable until there's a new 2.5.x release. For previous Python versions, both register and write_pkg_info() accepted 8-bit strings and passed them on as-is, so the only workaround for this issue at the moment is to revert to Python 2.4 or less.
What is the prognosis on this issue? It's still hitting me in Python 2.5.4.
Any word on this? Is there an open bug tracker issue with more information? Who's working on this?

Ben Finney
Meanwhile, the 'register' command accepts Unicode, but is broken in handling it. […]
Unfortunately, this isn't fixable until there's a new 2.5.x release. For previous Python versions, both register and write_pkg_info() accepted 8-bit strings and passed them on as-is, so the only workaround for this issue at the moment is to revert to Python 2.4 or less.
What is the prognosis on this issue? It's still hitting me in Python 2.5.4.
Any word on this? Is there an open bug tracker issue with more information? Who's working on this?
For Python 2.5.4, no further changes will be made. If you can reproduce the problem with 2.6 and can't find a tracker issue, make a new report.

Regards,
Martin
Ben Finney <ben+python@benfinney.id.au> writes:
Is there an open bug tracker issue with more information?
Answer: <URL:http://bugs.python.org/issue2562>. Apparently the issue is resolved <URL:http://bugs.python.org/msg72385> for Python 2.6. I will need to wait for my distribution to catch up before I can know whether it's resolved.

Ben Finney
participants (4)
- Martin v. Löwis
- Ben Finney
- Phillip J. Eby
- Tarek Ziadé