[Python-ideas] Adding str.isascii() ?

Victor Stinner victor.stinner at gmail.com
Fri Jan 26 08:31:33 EST 2018


2018-01-26 12:17 GMT+01:00 INADA Naoki <songofacandy at gmail.com>:
>> No, because you can pass in maxchar to PyUnicode_New() and
>> the implementation will take this as hint to the max code point
>> used in the string. There is no check done whether maxchar
>> is indeed the minimum upper bound to the code point ordinals.
>
> API doc says:
>
> """
> maxchar should be the true maximum code point to be placed in the string.
> As an approximation, it can be rounded up to the nearest value in the
> sequence 127, 255, 65535, 1114111.
> """
> https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
>
> Since the doc says *should*, strings created with a wrong maxchar
> are considered invalid objects.

PyUnicode objects must always use the most compact storage that can
hold their code points. That is a very strong requirement of PEP 393.
As Naoki wrote, many functions rely on this assumption to implement
fast paths.
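
To make the requirement concrete, here is a rough sketch in pure
Python. rounded_maxchar() and bytes_per_char() are hypothetical helper
names, not part of any API. The first maps a string to the smallest
bound from the sequence in the doc quoted above; the second estimates
the per-character storage from sys.getsizeof(), which is a CPython
implementation detail and may differ elsewhere.

```python
import sys

# Upper bounds listed in the PyUnicode_New() documentation quoted above.
_BOUNDS = (127, 255, 65535, 1114111)


def rounded_maxchar(s: str) -> int:
    """Return the smallest documented bound covering every code point in s."""
    top = max(map(ord, s), default=0)
    return next(b for b in _BOUNDS if top <= b)


def bytes_per_char(ch: str) -> int:
    """Estimate per-character storage by comparing two repeated strings.

    Assumption: sys.getsizeof() reflects the PEP 393 buffer size, which
    holds on CPython but is not guaranteed by the language.
    """
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


if __name__ == "__main__":
    for ch in ("a", "\xe9", "\u20ac", "\U0001f600"):
        print(hex(ord(ch)), rounded_maxchar(ch), bytes_per_char(ch))
```

If a string could be stored wider than rounded_maxchar() requires, a
check like str.isascii() could not simply read a flag and would have to
scan the data instead.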

The assumption is even enforced by the debug check
_PyUnicode_CheckConsistency():

https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Objects/unicodeobject.c#L453-L485
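
As a loose paraphrase of the idea behind that check (not a copy of the
C code, and storage_is_canonical() is a hypothetical name), the rule
is that the largest code point must fall in the range that belongs to
the chosen storage kind, not merely fit into it:

```python
from typing import Iterable


def storage_is_canonical(kind: int, ascii_flag: bool,
                         code_points: Iterable[int]) -> bool:
    """Check that `kind` (bytes per character) is the narrowest possible.

    Assumption: this mirrors the intent of the debug check, with the
    kind given as 1, 2 or 4 bytes per character.
    """
    maxchar = max(code_points, default=0)
    if kind == 1:
        if ascii_flag:
            # ASCII strings may only contain code points below 128.
            return maxchar < 0x80
        # Latin-1 storage without the ASCII flag needs a non-ASCII char.
        return 0x80 <= maxchar <= 0xFF
    if kind == 2:
        return 0x100 <= maxchar <= 0xFFFF
    if kind == 4:
        return 0x10000 <= maxchar <= 0x10FFFF
    return False


if __name__ == "__main__":
    # "a" stored with 2 bytes per character would be rejected.
    print(storage_is_canonical(2, False, [ord("a")]))
```

A string built with a too-large maxchar fails the lower bound of its
range, which is why such objects are treated as invalid.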

Victor

