unicode_string future, str -> basestring, fix or feature
Suppose a 2.7 standard library function is documented as taking a 'string' argument, such as these examples from the turtle module.
pencolor(colorstring) Set pencolor to colorstring, which is a Tk color specification string, such as "red", "yellow", or "#33cc8c".
turtle.shape(name=None) Parameters: name – a string which is a valid shapename
class turtle.Shape(type_, data) Parameters: type_ – one of the strings “polygon”, “image”, “compound”
Suppose adding from __future__ import unicode_literals to a working program causes an exception, such as with turtle http://bugs.python.org/issue15618 (Note: unicode_literals is not indexed.)
Is this a programmer error for passing unicode instead of string, or a library error for not accepting unicode? Is changing 'isinstance(x, str)' in the library (with whatever other changes are needed) a bugfix to be pushed or a prohibited API expansion?
-- Terry Jan Reedy
On Sun, 02 Mar 2014 15:01:01 -0500
Terry Reedy
Is this a programmer error for passing unicode instead of string, or a library error for not accepting unicode?
In most cases I would say it's a library error. The only exception is when the argument is clearly meant as a byte string rather than a text string, such as when writing to a binary file or a socket. Regards Antoine.
It looks to me like a defect in the library (*), and you are making a
reasonable argument that we should fix it in 2.7 to help people be more
prepared for the transition to Python 3.
(*) As Antoine points out, pretty much the only time where it's not a good
idea to switch from str to basestring is when the data is meant to be
binary -- but in this case it's clearly text (we can also tell from what
the same code looks like in Python 3 :-).
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
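As a rough illustration of the kind of change being discussed (not the actual turtle patch), the sketch below contrasts a strict `isinstance(x, str)` check with one that accepts any text string. The names `set_color_strict`, `set_color`, and `string_types` are hypothetical; the alias falls back to `str` on Python 3, where `basestring` does not exist.

```python
# Hypothetical sketch; this is not code from the turtle module.
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)         # Python 3: str is already the text type


def set_color_strict(color):
    """Mimic a check that accepts only the native str type."""
    if not isinstance(color, str):
        raise TypeError("color must be a str, got %s" % type(color).__name__)
    return color


def set_color(color):
    """Accept any text string, as a switch to basestring would."""
    if not isinstance(color, string_types):
        raise TypeError("color must be a string, got %s" % type(color).__name__)
    return color
```

On Python 2.7 with `from __future__ import unicode_literals`, the literal `"red"` is a `unicode` object, so `set_color_strict("red")` raises TypeError while `set_color("red")` accepts it. Both functions still reject a non-string such as `42`.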
On 3/2/2014 3:12 PM, Guido van Rossum wrote:
It looks to me like a defect in the library (*), and you are making a reasonable argument that we should fix it in 2.7 to help people be more prepared for the transition to Python 3.
(*) As Antoine points out, pretty much the only time where it's not a good idea to switch from str to basestring is when the data is meant to be binary -- but in this case it's clearly text (we can also tell from what the same code looks like in Python 3 :-).
Thanks to both of you. 'bugfix' noted on the issue.
02.03.14 22:01, Terry Reedy wrote:
Is this a programmer error for passing unicode instead of string, or a library error for not accepting unicode? Is changing 'isinstance(x, str)' in the library (with whatever other changes are needed) a bugfix to be pushed or a prohibited API expansion?
Patches which add support for unicode strings were accepted for some issues (e.g. http://bugs.python.org/issue19099) and rejected for others (e.g. http://bugs.python.org/issue20014 and http://bugs.python.org/issue20015). Some issues (e.g. http://bugs.python.org/issue18695) remain in an undefined state.
On Sun, Mar 2, 2014 at 11:23 PM, Serhiy Storchaka
Patches which add support for unicode strings were accepted for some issues (e.g. http://bugs.python.org/issue19099) and rejected for others (e.g. http://bugs.python.org/issue20014 and http://bugs.python.org/issue20015).
See also http://bugs.python.org/issue15843. --Berker
On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:
Patches which add support for unicode strings were accepted for some issues (e.g. http://bugs.python.org/issue19099) and rejected for others (e.g. http://bugs.python.org/issue20014 and http://bugs.python.org/issue20015). Some issues (e.g. http://bugs.python.org/issue18695) remain in an undefined state.
If Antoine and Guido don't reverse themselves, those could perhaps be re-opened. It strikes me as borderline, depending on the interpretation of 'string'. I am not surprised there have been different resolutions.
-- Terry Jan Reedy
On 3 March 2014 10:02, Terry Reedy
If Antoine and Guido don't reverse themselves, those could perhaps be re-opened. It strikes me as borderline, depending on the interpretation of 'string'. I am not surprised there have been different resolutions.
It occurs to me that it would be good to have a "bug fix or feature?" section in the developer guide to provide a more permanent record of discussions like this. That would also be the place to document tricks like defining a private API to fix a bug in a maintenance release, and then making that new API public for the next feature release if it's potentially useful to end users.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Terry Reedy writes:
On 3/2/2014 4:23 PM, Serhiy Storchaka wrote:
Patches which add support for unicode strings were accepted for some issues (e.g. http://bugs.python.org/issue19099) and rejected for others (e.g. http://bugs.python.org/issue20014 and http://bugs.python.org/issue20015). Some issues (e.g. http://bugs.python.org/issue18695) remain in an undefined state.
If Antoine and Guido don't reverse themselves, those could perhaps be re-opened. It strikes me as borderline, depending on the interpretation of 'string'. I am not surprised there have been different resolutions.
I agree with Victor in http://bugs.python.org/issue18695#msg208857: there's no "bug". It is just that in the design of 2.x 'str' is not Unicode, and the "fix" is Python 3. This may be an area where 2to3 could give more help.
As Victor points out in that message, the issue-by-issue approach to this inconsistency is just whack-a-mole.
I would worry not only about the whack-a-mole aspect where 'unicode' objects leak into contexts where they're not supported, but also that this could confuse tools like 2to3.
I agree that usage of the word "string" is all too often ambiguous in the documentation, but that doesn't justify a wholesale overhaul of the Python 2.7 API to make everything polymorphic.
AFAICT, in that message Victor was only talking about allowing Unicode
filenames.
Making everything polymorphic is clearly pulling on the thread that will
unravel the entire sweater.
But... The start of this thread was about changing a few occurrences of
isinstance(..., str) to use basestring, and that's a different matter. The
Python 2 Unicode design calls for mixing of Unicode and 8-bit strings as
long as the latter contain 7-bit ASCII -- the code in turtle violates that
design by insisting on an 8-bit string. The underlying Tkinter module
handles Unicode strings just fine (and not just 7-bit ASCII).
As far as lib2to3 goes, using basestring instead of str actually
disambiguates things -- with str it can't tell for sure whether text or
binary was meant, but with basestring it's a safe bet that the intention
was text.
Finally, in most places Python 2.7 *does* handle Unicode filenames just
fine.
On Sun, Mar 2, 2014 at 6:26 PM, Stephen J. Turnbull
I agree with Victor in http://bugs.python.org/issue18695#msg208857: there's no "bug". It is just that in the design of 2.x 'str' is not Unicode, and the "fix" is Python 3. This may be an area where 2to3 could give more help.
As Victor points out in that message, the issue-by-issue approach to this inconsistency is just whack-a-mole.
I would worry not only about the whack-a-mole aspect where 'unicode' objects leak into contexts where they're not supported, but also that this could confuse tools like 2to3.
I agree that usage of the word "string" is all too often ambiguous in the documentation, but that doesn't justify a wholesale overhaul of the Python 2.7 API to make everything polymorphic.
-- --Guido van Rossum (python.org/~guido)
On Sun, Mar 2, 2014 at 6:44 PM, Guido van Rossum
AFAICT, in that message Victor was only talking about allowing Unicode filenames.
...
Finally, in most places Python 2.7 *does* handle Unicode filenames just fine.
I'm a bit confused. In this example:
http://bugs.python.org/issue18695
You are proposing that the issue should be considered a bug and a well-written patch accepted?
Or is it just too late for 2.7?
Personally I think that having some, but not all, file functions accept unicode paths is pretty broken... and fixing these kinds of things will ease the 2 to 3 transition, so a good thing overall.
- Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Mon, Mar 3, 2014 at 8:37 AM, Chris Barker
I'm a bit confused. In this example:
http://bugs.python.org/issue18695
You are proposing that the issue should be considered a bug and a well-written patch accepted?
Or is it just too late for 2.7?
Personally I think that having some, but not all, file functions accept unicode paths is pretty broken... and fixing these kinds of things will ease the 2 to 3 transition, so a good thing overall.
Agreed. Given that the claim "Python 2 doesn't support Unicode filenames" is factually incorrect (in Python 2.7, most filesystem calls in fact do support Unicode, at least on some platforms), I think individual functions in the os module that are found lacking should be considered bugs, and if someone goes through the effort to supply an otherwise acceptable fix, we shouldn't reject it on the basis that we don't want to consider supporting Unicode filenames. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum writes:
Given that the claim "Python 2 doesn't support Unicode filenames" is factually incorrect (in Python 2.7, most filesystem calls in fact do support Unicode, at least on some platforms),
I don't understand what "support Unicode" means. Just that
    with open(u"\u4e00", "w") as f: f.write("works!\n")
does what is expected[1] if the user knows what he is doing (ie, has set PYTHONIOENCODING to a Unicode UTF or one of the Asian encodings)?
I think individual functions in the os module that are found lacking should be considered bugs, and if someone goes through the effort to supply an otherwise acceptable fix, we shouldn't reject it on the basis that we don't want to consider supporting Unicode filenames.
As above, "acceptable fix" means take whatever the current value is for file system name encoding, and use that to encode and decode unicode objects to/from str, or raise a UnicodeError if it doesn't work?
I think it's important to define this somewhat carefully, because this is an area that has a strong tendency to "mission creep". Given that builtin open "works" by the above definition, I guess it's reasonable to accept such patches.
Footnotes:
[1] It writes the line "works!\n" to a file whose name consists of the single Chinese character for "one".
On Tue, Mar 4, 2014 at 5:23 AM, Stephen J. Turnbull
Guido van Rossum writes:
Given that the claim "Python 2 doesn't support Unicode filenames" is factually incorrect (in Python 2.7, most filesystem calls in fact do support Unicode, at least on some platforms),
I don't understand what "support Unicode" means. Just that
with open(u"\u4e00", "w") as f: f.write("works!\n")
does what is expected[1] if the user knows what he is doing (ie, has set PYTHONIOENCODING to a Unicode UTF or one of the Asian encodings)?
That's all I'm asking for, since that's what most functions in 2.7 already do.
I think individual functions in the os module that are found lacking should be considered bugs, and if someone goes through the effort to supply an otherwise acceptable fix, we shouldn't reject it on the basis that we don't want to consider supporting Unicode filenames.
As above, "acceptable fix" means take whatever the current value is for file system name encoding, and use that to encode and decode unicode objects to/from str, or raise a UnicodeError if it doesn't work?
The same thing that is done for other functions that take filenames.
I think it's important to define this somewhat carefully, because this is an area that has a strong tendency to "mission creep". Given that builtin open "works" by the above definition, I guess it's reasonable to accept such patches.
Right, the interpretation given to Unicode filenames by builtin open() should be propagated to other functions (I actually suspect that os.statvfs(), which apparently doesn't, is in the minority here). AFAIK that's also roughly what happens in Python 3.
Footnotes: [1] It writes the line "works!\n" to a file whose name consists of the single Chinese character for "one".
-- --Guido van Rossum (python.org/~guido)
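One way to read the rule agreed on in this exchange is sketched below: byte-string names pass through untouched, while text names are encoded with the current file system encoding, and a UnicodeError is raised if that fails. The helper name `fs_encode` and the ASCII fallback are assumptions made for illustration; this is not how builtin open() or the os module is actually implemented.

```python
import sys


def fs_encode(name):
    """Hypothetical helper: return `name` as bytes in the file system encoding.

    Byte strings are returned unchanged. Text is encoded, and a
    UnicodeEncodeError propagates if the encoding cannot represent it.
    """
    if isinstance(name, bytes):
        return name
    # The reported encoding may be None on some Python 2 platforms;
    # falling back to ASCII is an assumption of this sketch.
    encoding = sys.getfilesystemencoding() or "ascii"
    return name.encode(encoding)
```

A function that currently accepts only 8-bit names could, under this reading, call such a helper on its argument first and leave the rest of its code unchanged.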
03.03.14 02:02, Terry Reedy wrote:
If Antoine and Guido don't reverse themselves, those could perhaps be re-opened. It strikes me as borderline, depending on the interpretation of 'string'. I am not surprised there have been different resolutions.
I believe that in all cases where the valid values are ASCII-only strings (format specifiers for array, struct, memoryview, etc.), we can accept both str and unicode, especially when they are likely to be literals. But when a valid value can be non-ASCII (e.g. file names), it is a different case, because it requires additional, and possibly totally different, code.
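For the ASCII-only case, a minimal sketch of accepting both string types might look like the following. The helper `ascii_native` is hypothetical and is not taken from any of the patches linked in this thread; it assumes its argument is a byte or text string, and it rejects non-ASCII input rather than guessing an encoding.

```python
import struct


def ascii_native(value):
    """Hypothetical helper: coerce an ASCII-only string to the native str type.

    Assumes `value` is a byte string or a text string. Non-ASCII input
    raises UnicodeDecodeError (bytes) or UnicodeEncodeError (text).
    """
    if isinstance(value, bytes):
        text = value.decode("ascii")
    else:
        text = value
        text.encode("ascii")  # validation only; the result is discarded
    return str(text)


# A unicode format specifier is normalized before being handed to struct.
packed = struct.pack(ascii_native(u"<I"), 1)
```

File names do not fit this pattern, since a valid name may be non-ASCII and the right encoding depends on the platform.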
participants (8)
- Antoine Pitrou
- Berker Peksağ
- Chris Barker
- Guido van Rossum
- Nick Coghlan
- Serhiy Storchaka
- Stephen J. Turnbull
- Terry Reedy