Strings can sometimes convert to bytes without an encoding - Python-ideas

June 14, 2016

Current behavior (3.5.1):
    >>> bytes('')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string argument without an encoding

Suggestion:
If the string's internal character size is 1 byte, the `bytes`/`bytearray`
constructors shouldn't need an encoding to be specified.

High-level idea:
If the string only has code points in range(128), encoding is optional
(and useless anyway). The new error message could be
    TypeError: non-ASCII string argument without an encoding
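
A minimal pure-Python sketch of the suggested behavior, for illustration only. The helper name `to_bytes` is hypothetical; the actual change would live inside the `bytes`/`bytearray` constructors, and this sketch only assumes the error message given above.

```python
def to_bytes(s, encoding=None, errors="strict"):
    """Hypothetical stand-in for the suggested bytes(str) behavior."""
    if encoding is not None:
        # Explicit encoding: equivalent to bytes(s, encoding, errors) today.
        return s.encode(encoding, errors)
    try:
        # ASCII-only text yields the same bytes in ASCII, Latin-1, and UTF-8,
        # so no encoding choice is needed.
        return s.encode("ascii")
    except UnicodeEncodeError:
        # Assumed wording, taken from the suggested message.
        raise TypeError("non-ASCII string argument without an encoding") from None


print(to_bytes(""))
print(to_bytes("spam"))
print(to_bytes("caf\xe9", "utf-8"))
```

With this sketch, `to_bytes("caf\xe9")` without an encoding raises the suggested `TypeError`, while the explicit-encoding path is unchanged.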

How:
CPython strings store characters in an array, such that each character
takes a single entry. With an entry per character, indexing is just a
regular C array index operation. Since PEP 393, the size of the
elements of the array is just the size needed for the largest
character, and the string object also records whether it is pure ASCII.
Thus, CPython strings "know" whether or not they're ASCII.
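
The per-character element size can be observed from Python, as a rough sketch. This relies on `sys.getsizeof`, whose results are CPython-specific; the helper name `char_width` is made up for this example.

```python
import sys


def char_width(ch):
    """Storage bytes per character for a string made only of `ch`."""
    # Subtracting two sizes cancels the fixed object header, leaving
    # 1000 extra characters' worth of storage.
    return (sys.getsizeof(ch * 1001) - sys.getsizeof(ch)) // 1000


# On CPython 3.3+ the expected widths are 1, 1, 2 and 4 bytes.
for ch in ("a", "\xe9", "\u20ac", "\U0001f600"):
    print("U+%04X" % ord(ch), char_width(ch))
```

Note that U+00E9 also fits in a 1-byte element, so the element size alone separates Latin-1 from wider text; PEP 393 keeps a separate ASCII flag on the string object for the 0-127 case.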

Other implementations without PEP 393 can do a scan of the code points
to check the 0-127 condition during building. That means O(n) more
checks, but in those implementations, the per-character checks are
already necessary with an explicit encoding, since you'd need to see
if that character needs encoding.
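
A rough sketch of such a scan, folded into the copy so the string is only walked once. The function name `ascii_bytes_or_none` is hypothetical, and a real implementation would do this in the interpreter's own language rather than in Python.

```python
def ascii_bytes_or_none(s):
    """Copy code points into bytes, giving up at the first one above 127."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp > 127:
            # Non-ASCII: the caller would raise the TypeError here.
            return None
        out.append(cp)
    return bytes(out)


print(ascii_bytes_or_none("spam"))
print(ascii_bytes_or_none("caf\xe9"))
```

The range check sits in the same loop that an explicit-encoding path already needs, which is the point made above about the extra O(n) work.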

(The `bytes` and ASCII-`str` could in fact share memory, given a few
tweaks. But that's an implementation detail.)

Franklin? Lee

Guido van Rossum

Franklin? Lee

Matt Ruffalo

Steven D'Aprano

Random832

Ethan Furman

MRAB

Franklin? Lee

Terry Reedy

Serhiy Storchaka

Terry Reedy

Stephen J. Turnbull

Greg Ewing

Franklin? Lee

participants (11)