[Python-ideas] Adding str.isascii() ?

M.-A. Lemburg mal at egenix.com
Fri Jan 26 08:02:46 EST 2018

On 26.01.2018 12:17, INADA Naoki wrote:
>> No, because you can pass in maxchar to PyUnicode_New() and
>> the implementation will take this as a hint to the max code point
>> used in the string. No check is done on whether maxchar
>> is indeed the minimum upper bound to the code point ordinals.
> API doc says:
> """
> maxchar should be the true maximum code point to be placed in the string.
> As an approximation, it can be rounded up to the nearest value in the
> sequence 127, 255, 65535, 1114111.
> """
> https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
> Since the doc says *should*, strings created with a wrong maxchar
> are considered invalid objects.

Not really: "should" means should, not must :-) Objects created
with PyUnicode_New() are valid and ready (being "ready" only has
a meaning for legacy strings).

You can set maxchar to 64k and still just use ASCII as content.
In some cases, you may want the internal string representation
to be wchar_t compatible or work with Py_UCS2/4, so both 64k
and sys.maxunicode are reasonable and valid values.
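
To make the link between the max code point and the storage width
concrete, here is a small sketch using sys.getsizeof(). It assumes
CPython 3.3 or later (PEP 393 strings); the exact byte counts are
implementation details, so only the relative growth matters. As far as
I know, strings built from Python code get the narrowest kind that
fits, so the over-wide maxchar case described above is only reachable
through the C API.

```python
import sys

N = 1000  # number of code points per sample string

# One sample per internal representation, chosen by its max code point.
samples = {
    "ASCII (1 byte/char)": "a" * N,
    "Latin-1 (1 byte/char)": "\xe9" * N,
    "UCS2 (2 bytes/char)": "\u20ac" * N,
    "UCS4 (4 bytes/char)": "\U0001F600" * N,
}

# Total object size in bytes, including the header (CPython-specific).
sizes = {label: sys.getsizeof(s) for label, s in samples.items()}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

The same ASCII content stored in a UCS2 or UCS4 string would take
roughly two or four times the space, which is the trade-off you accept
when passing a larger maxchar.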

Overall, I'm starting to believe that a str.maxchar() method
would be a better choice than a check that only covers ASCII.

This could have an optional parameter "exact": when set to True
(the default), it forces a scan of the string and returns the actual
max code point ordinal; when set to False, it returns the
approximation based on the kind used (which, in many cases, will give
you a good hint).

For checking ASCII, you'd then write:

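def isascii(s):
    # Cheap check first: the approximation comes from the string's kind.
    if s.maxchar(exact=False) < 128:
        return True
    # The hint may overestimate, so fall back to an exact scan.
    return s.maxchar() < 128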
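
Since str.maxchar() is only a proposal at this point, a rough
pure-Python emulation may help make the intended semantics concrete.
The free function maxchar() and the _BOUNDS table below are
hypothetical names for illustration. The approximation is obtained
here by rounding the true maximum up to the sequence 127, 255, 65535,
1114111 from the PyUnicode_New() docs; a real implementation would
read the string's kind instead of scanning, and could report a larger
bound than this sketch does.

```python
import bisect

# Upper bounds listed in the PyUnicode_New() documentation.
_BOUNDS = (127, 255, 65535, 1114111)


def maxchar(s, exact=True):
    """Hypothetical stand-in for the proposed str.maxchar()."""
    # Scan the string for its largest code point (0 for an empty string).
    true_max = max(map(ord, s), default=0)
    if exact:
        return true_max
    # Round up to the nearest bound to mimic the kind-based approximation.
    return _BOUNDS[bisect.bisect_left(_BOUNDS, true_max)]


def isascii(s):
    # Same two-step logic as above, written against the free function.
    if maxchar(s, exact=False) < 128:
        return True
    return maxchar(s) < 128


print(maxchar("abc"), maxchar("abc", exact=False))
print(isascii("abc"), isascii("h\xe9llo"))
```

In this emulation both calls scan the string, so the fast path saves
nothing; the point is only to show what the two modes would return.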

Marc-Andre Lemburg

