[Python-ideas] Strings can sometimes convert to bytes without an encoding

Guido van Rossum guido at python.org
Tue Jun 14 19:26:19 EDT 2016


-1. Such a check for the contents of the string sounds exactly like the
Python 2 behavior we are trying to get away from.

On Tue, Jun 14, 2016 at 3:58 PM, Franklin? Lee <
leewangzhong+python at gmail.com> wrote:

> Current behavior (3.5.1):
>     >>> bytes('')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: string argument without an encoding
>
> Suggestion:
> If the character size is 1, the `bytes`/`bytearray` constructor
> doesn't need a specified encoding.
>
> High-level idea:
> If the string only has code points in range(128), encoding is optional
> (and useless anyway). The new error message could be
>     TypeError: non-ASCII string argument without an encoding
>
> How:
> CPython strings store characters in an array, such that each character
> takes a single entry. With an entry per character, indexing is just a
> regular C array index operation. Since PEP 393, the size of the
> elements of the array is just the size needed for the largest
> character. Thus, CPython strings "know" whether or not they're ASCII.
>
> Other implementations without PEP 393 can do a scan of the code points
> to check the 0-127 condition during building. That means O(n) more
> checks, but in those implementations, the per-character checks are
> already necessary with an explicit encoding, since you'd need to see
> if that character needs encoding.
>
> (The `bytes` and ASCII-`str` could in fact share memory, given a few
> tweaks. But that's an implementation detail.)
>
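A minimal sketch of the behavior suggested in the quoted message, for
illustration only. The helper name `bytes_if_ascii` is hypothetical and is
not an existing or proposed API; it uses the O(n) code-point scan described
for implementations without PEP 393, and reuses the error message from the
suggestion.

```python
def bytes_if_ascii(text, encoding=None):
    """Hypothetical stand-in for the suggested bytes() constructor behavior."""
    if encoding is not None:
        # An explicit encoding always works the way it does today.
        return text.encode(encoding)
    # Scan the code points: the encoding is optional only if all are < 128.
    if all(ord(ch) < 128 for ch in text):
        return text.encode("ascii")
    raise TypeError("non-ASCII string argument without an encoding")


print(bytes_if_ascii("spam"))                # b'spam'
print(bytes_if_ascii("caf\u00e9", "utf-8"))  # b'caf\xc3\xa9'
try:
    bytes_if_ascii("caf\u00e9")
except TypeError as exc:
    print(exc)  # non-ASCII string argument without an encoding
```

In this sketch, two calls with the same argument type succeed or fail
depending only on the contents of the string, which is the kind of check
the -1 above objects to.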



-- 
--Guido van Rossum (python.org/~guido)