Serhiy Storchaka wrote:
> Forbids calling str() without object if encoding or errors are
> specified. It is very unlikely that this can break a real code, so I
> propose to make it an error without a deprecation period.

+1. I suspect that nobody would intentionally pass *encoding* and/or *errors* without also passing an object. Returning an empty string in that case seems more likely to cover up bugs than to be useful.
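
To make the proposed check concrete, here is a minimal sketch of what I understand proposal 1 to mean. The wrapper name str_checked, the _MISSING sentinel and the error message are made up for illustration; this is not how the builtin is implemented.

```python
# Hypothetical sentinel: lets us tell "no object passed" apart from None.
_MISSING = object()


def str_checked(obj=_MISSING, encoding=None, errors=None):
    """Illustrative wrapper around str(); a sketch of proposal 1 only."""
    if obj is _MISSING:
        if encoding is not None or errors is not None:
            # Proposed behavior: fail loudly instead of returning ''.
            raise TypeError("encoding or errors given without an object")
        return str()
    # Forward only the keywords the caller actually supplied.
    kwargs = {}
    if encoding is not None:
        kwargs["encoding"] = encoding
    if errors is not None:
        kwargs["errors"] = errors
    return str(obj, **kwargs)
```

With this sketch, str_checked() still returns an empty string, while str_checked(errors='strict') raises TypeError.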

Serhiy Storchaka wrote:
> 2. Make the first parameter of str(), bytes() and bytearray()
> positional-only.

+1. I don't think I've ever seen a single instance of code that passes the first parameter, *object*, as a kwarg: str(object=obj). As long as the other two parameters, *encoding* and *errors*, can still be passed by keyword, I think this would make sense.
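
For reference, this is roughly what a positional-only first parameter looks like at the Python level, using the "/" marker available since 3.8. The function to_text is a hypothetical stand-in, not the actual signature of str().

```python
def to_text(obj="", /, encoding=None, errors=None):
    """Hypothetical stand-in for str() with a positional-only first parameter."""
    # Forward only the keywords the caller actually supplied.
    kwargs = {}
    if encoding is not None:
        kwargs["encoding"] = encoding
    if errors is not None:
        kwargs["errors"] = errors
    return str(obj, **kwargs)


# Positional use of the first parameter works; keywords for the rest still work.
print(to_text(42))
print(to_text(b"\xc3\xa1", encoding="utf-8"))

# Passing the first parameter by keyword is rejected with a TypeError.
try:
    to_text(obj=42)
except TypeError as exc:
    print("rejected:", exc)
```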

Serhiy Storchaka wrote:
> 3. Make encoding required if errors is specified in str(). This will
> reduce the number of possible combinations, makes str() more similar to
> bytes() and bytearray() and simplify the mental model: if encoding is
> specified, then we decode, and the first argument must be a bytes-like
> object, otherwise we convert an object to a string using __str__.

Hmm, I think this one might require some further consideration. But I will say that the implicit behavior is not very obvious.

Not very clear; the conversion implicitly uses 'utf-8':
>>> str(b'\xc3\xa1', errors='strict')
'á'

Makes sense, and is highly explicit:
>>> str(b'\xc3\xa1', encoding='utf-8', errors='strict')
'á'

This is also fine ('strict' is a very reasonable default for *errors*):
>>> str(b'\xc3\xa1', encoding='utf-8')
'á'
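
The mental model described in proposal 3 could be sketched as follows. The helper name decode_or_str and the error message are assumptions for illustration only.

```python
def decode_or_str(obj, /, encoding=None, errors=None):
    """Illustrative sketch of proposal 3; not the real str() implementation."""
    if errors is not None and encoding is None:
        # Proposed rule: errors is only meaningful when we decode.
        raise TypeError("errors specified without an encoding")
    if encoding is None:
        # Conversion path: any object, converted via __str__.
        return str(obj)
    # Decoding path: str() itself raises TypeError unless obj is bytes-like.
    if errors is None:
        return str(obj, encoding)
    return str(obj, encoding, errors)
```

Under this sketch only two paths remain: decode_or_str(obj) converts, and decode_or_str(buffer, encoding=...) decodes, optionally with *errors*.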


On a related note though, I'm not a fan of this behavior:
>>> str(b'\xc3\xa1')
"b'\\xc3\\xa1'"

Passing a bytes object to str() without specifying an encoding seems like a mistake. I honestly don't see how the result ("b'\\xc3\\xa1'") would be useful in any capacity. I would expect this to raise a TypeError instead, similar to passing a string to bytes() without specifying an encoding:
>>> bytes('á')
...
TypeError: string argument without an encoding

I'd much prefer to see something like this:
>>> str(b'\xc3\xa1')
...
TypeError: bytes argument without an encoding

Is there some use case for returning "b'\\xc3\\xa1'" from this operation that I'm not seeing? To me, it seems at least as confusing and pointless as returning an empty string from str(errors='strict') or some other combination of *errors* and *encoding* kwargs without passing an object.
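
As a rough sketch of the behavior I have in mind (the helper name text_from is hypothetical, and limiting the check to bytes and bytearray is my own simplification):

```python
def text_from(obj, /, encoding=None, errors="strict"):
    """Illustrative sketch: refuse bytes input unless an encoding is given."""
    if isinstance(obj, (bytes, bytearray)):
        if encoding is None:
            # Mirrors the message bytes('á') produces for str input.
            raise TypeError("bytes argument without an encoding")
        return str(obj, encoding, errors)
    # Non-bytes objects are converted as usual via __str__.
    return str(obj)


# The repr form stays available explicitly for anyone who wants it.
print(repr(b"\xc3\xa1"))
```

With this sketch, text_from(b'\xc3\xa1') raises TypeError, while text_from(b'\xc3\xa1', encoding='utf-8') decodes as before.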

On Sun, Dec 15, 2019 at 9:10 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
Currently str() takes up to 3 arguments. All are optional and
positional-or-keyword. All combinations are valid:

str()
str(object=object)
str(object=buffer, encoding=encoding)
str(object=buffer, errors=errors)
str(object=buffer, encoding=encoding, errors=errors)
str(encoding=encoding)
str(errors=errors)
str(encoding=encoding, errors=errors)

The last three are especially surprising. If you do not specify an
object, str() ignores values of encoding and errors and returns an empty
string.

bytes() and bytearray() are more limited. Valid combinations are:

bytes()
bytes(source=object)
bytes(source=string, encoding=encoding)
bytes(source=string, encoding=encoding, errors=errors)

I propose several changes:

1. Forbids calling str() without object if encoding or errors are
specified. It is very unlikely that this can break a real code, so I
propose to make it an error without a deprecation period.

2. Make the first parameter of str(), bytes() and bytearray()
positional-only. Originally this feature was an implementation artifact:
before 3.6 parameters of a C implemented function should be either all
positional-only (if used PyArg_ParseTuple), or all keyword (if used
PyArg_ParseTupleAndKeywords). So str(), bytes() and bytearray() accepted
the first parameter by keyword. We already made similar changes for
int(), float(), etc: int(x=42) no longer works.

Unlikely str(object=object) is used in a real code, so we can skip a
deprecation period for this change too.

3. Make encoding required if errors is specified in str(). This will
reduce the number of possible combinations, makes str() more similar to
bytes() and bytearray() and simplify the mental model: if encoding is
specified, then we decode, and the first argument must be a bytes-like
object, otherwise we convert an object to a string using __str__.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YMIGWRUERUG66CKRJXDXNPCIDHRQJY6V/