New submission from Zachary Westrick zackzackzackw@gmail.com:
The docstring for the str() builtin reads
str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str
Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'.
The statement "encoding defaults to sys.getdefaultencoding()." implies that the encoding argument defaults to sys.getdefaultencoding(), which would typically mean that
str(X, encoding=sys.getdefaultencoding()) == str(X)
However, this is not the case:
str(b'mystring', encoding=sys.getdefaultencoding()) -> 'mystring'
str(b'mystring') -> "b'mystring'"
It seems that the phrase "encoding defaults" is not referring to the argument named encoding.
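A minimal sketch of the discrepancy reported above, for anyone who wants to reproduce it. The variable names (data, decoded, one_arg) are illustrative and not part of the original report.

```python
import sys

data = b'mystring'

# Two-argument form: the bytes are decoded with the given encoding.
decoded = str(data, encoding=sys.getdefaultencoding())

# One-argument form: no decoding happens; the result is the repr of the bytes.
one_arg = str(data)

print(repr(decoded))
print(repr(one_arg))
print(decoded == one_arg)
```

The two results differ, which is why "encoding defaults to ..." cannot be read as an ordinary default value for the encoding argument.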
----------
_______________________________________ Python tracker report@bugs.python.org https://bugs.python.org/issue39574 _______________________________________
Steven D'Aprano steve+python@pearwood.info added the comment:
The docs are correct, you are just misinterpreting them. Which could, I guess, suggest the docs could do with improvement.
With *one* argument, `str(obj)` returns a string via `object.__str__(obj)` or `repr(obj)`, whichever is defined. That includes the case where obj is a bytes object.
*Only* in the two or three argument case where you explicitly provide either the encoding or errors parameter will bytes be decoded. But you must provide at least one of encoding or errors. If you provide neither, you have the one-argument form above.
The default value for encoding is only relevant in cases like this:
# encoding defaults to sys.getdefaultencoding()
py> str(b'a', errors='ignore')
'a'
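To make the role of the default encoding more visible, here is a small sketch that passes only errors, using bytes that are not pure ASCII. The sample byte strings are made up for illustration, and the sketch assumes the default encoding is UTF-8 (as it is in Python 3).

```python
# Valid UTF-8: decoded with the default encoding because only errors is given.
raw = 'café'.encode('utf-8')
text = str(raw, errors='strict')

# Invalid UTF-8 (0xff can never appear in UTF-8): the error handler decides.
broken = b'caf\xff'
ignored = str(broken, errors='ignore')     # the bad byte is dropped
replaced = str(broken, errors='replace')   # the bad byte becomes U+FFFD

print(text, ignored, replaced)
```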
Here's my suggested rewording:
***
str(object='') -> str
str(bytes_or_buffer [, encoding] [, errors]) -> str
Create a new string object from the given object.
If a single argument is given, returns the result of object.__str__() (if defined) or repr(object).
If encoding or errors or both are specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. If errors is specified, the default encoding is sys.getdefaultencoding(). If encoding is specified, errors defaults to 'strict'.
---------- nosy: +steven.daprano versions: +Python 3.6, Python 3.7, Python 3.8, Python 3.9
Eric V. Smith eric@trueblade.com added the comment:
That's a good improvement, Steven. I like your wording about errors better than the wording about encoding, so how about changing the next to last sentence to:
"If errors is specified, encoding defaults to sys.getdefaultencoding()."
---------- nosy: +eric.smith
Steven D'Aprano steve+python@pearwood.info added the comment:
Eric: sure, I'm happy with your modification.
Alas, I'm currently having technology issues which prevent me from doing a PR. Would you care to do the honours?
----------
Change by Eric V. Smith eric@trueblade.com:
---------- keywords: +patch pull_requests: +17777 stage: -> patch review pull_request: https://github.com/python/cpython/pull/18401
Steven D'Aprano steve+python@pearwood.info added the comment:
Sorry everyone, due to technology problems I am unable to comment on the github page, and due to ISP problems I've been off the internet for a few days.
pull_request: https://github.com/python/cpython/pull/18401
[Serhiy]
Is not "or both" redundant?
I don't think so. In regular English, "or" can imply exclusive-or:
"Shall we eat at the Thai or the Italian restaurant?"
There are four relevant cases:
- supply neither encoding nor errors;
- supply only encoding;
- supply only errors;
- supply both encoding and errors.
Using "or" may be, for some readers, ambiguous: is the last option included or not? For the sake of two extra words, let's make it clear and unambiguous.
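The four cases listed above can be sketched as follows. The dictionary and its labels are illustrative only; the calls themselves use nothing beyond the documented str() signature.

```python
data = b'abc'

cases = {
    'neither': str(data),                                  # one-argument form
    'encoding only': str(data, encoding='utf-8'),
    'errors only': str(data, errors='strict'),
    'both': str(data, encoding='utf-8', errors='strict'),
}

for name, result in cases.items():
    print(f'{name:14} -> {result!r}')
```

Only the first case skips decoding; the other three all decode the bytes, which is the behaviour "or both" is meant to make explicit.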
[Serhiy]
Use just 'utf-8' instead of sys.getdefaultencoding(). It is a constant in Python 3.
I didn't know that. I'm okay with that change, thank you.
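A quick check of the point about the constant, assuming a Python 3 interpreter. The sample string and variable names are made up for illustration.

```python
import sys

default = sys.getdefaultencoding()

raw = 'naïve'.encode('utf-8')

# With only errors given, the bytes are decoded as if encoding='utf-8'.
via_default = str(raw, errors='strict')
via_literal = str(raw, encoding='utf-8', errors='strict')
via_decode = raw.decode('utf-8', 'strict')

print(default, via_default == via_literal == via_decode)
```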
[Serhiy]
- str(bytes_or_buffer[, encoding[, errors]]) -> str
+ str(bytes_or_buffer, encoding='utf-8', errors='strict') -> str
I'm happy with that.
Thank you everyone, and sorry again that I have trouble with the Github process. (I need a new computer with a newer OS.)
----------
Eric V. Smith eric@trueblade.com added the comment:
I've created a PR and requested review from stevendaprano. I think the backports are correct.
----------
Serhiy Storchaka storchaka+cpython@gmail.com added the comment:
See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUE...
---------- nosy: +serhiy.storchaka
Steven D'Aprano steve+python@pearwood.info added the comment:
On Fri, Feb 07, 2020 at 12:33:45PM +0000, Serhiy Storchaka wrote:
Serhiy Storchaka storchaka+cpython@gmail.com added the comment:
See a discussion on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUE...
I don't know whether the very odd calls
str(encoding='spam')
str(errors='eggs')
str(encoding='spam', errors='eggs')
are intentional or not. I suspect not: to me, it looks like an accident of implementation, not a deliberate feature. Under what circumstances would somebody intentionally provide an encoding and error handler when they aren't actually going to use them? There may be really unusual cases:
args = () if condition else (mybytes,)
str = str(*args, encoding='spam')
but I doubt they are going to be either common or something we ought to encourage. Regardless of whether we deprecate and remove those three odd cases or not, I don't think we should bother documenting them.
If anyone disagrees, and wants to document them, that's okay, but you can document them as a separate PR with a separate discussion. Let's just fix the confusion over the default encoding here and worry about other issues later. Don't let the perfect get in the way of the good enough for now :-)
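For reference, a sketch of what those odd calls appear to do in current CPython: with no object argument the result is the empty string, and the encoding and errors values are accepted but never looked at. This is observed behaviour, not a documented guarantee. The helper to_text is hypothetical, and it uses 'utf-8' rather than 'spam' so that the branch that does pass bytes can actually decode them.

```python
# No object argument: the keyword values are never validated or used.
no_object = [
    str(encoding='spam'),
    str(errors='eggs'),
    str(encoding='spam', errors='eggs'),
]

def to_text(condition, mybytes):
    """Hypothetical helper mirroring the conditional-args pattern."""
    args = () if condition else (mybytes,)
    return str(*args, encoding='utf-8')

empty = to_text(True, b'abc')     # no object passed
decoded = to_text(False, b'abc')  # bytes passed and decoded

print(no_object, repr(empty), repr(decoded))
```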
----------
Eric V. Smith eric@trueblade.com added the comment:
I agree that the current changes are an improvement, and should be committed.
----------
Change by Terry J. Reedy tjreedy@udel.edu:
---------- versions: -Python 3.5, Python 3.6