[Python-ideas] Adding str.isascii() ?

Victor Stinner victor.stinner at gmail.com
Fri Jan 26 08:55:40 EST 2018


2018-01-26 14:43 GMT+01:00 M.-A. Lemburg <mal at egenix.com>:
> If that's indeed being used as assumption, the docs must be
> fixed and PyUnicode_New() should verify this assumption as
> well - not only in debug builds using C asserts() :-)

Like PyUnicode_FromStringAndSize(NULL, size), PyUnicode_New(size,
maxchar) only allocates memory; the characters are left uninitialized.

I don't see how PyUnicode_New() could check the string content, since
the content is not known yet...
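
To illustrate why the maximum character has to be supplied up front, here
is a minimal sketch that measures per-character storage from Python. It
assumes CPython 3.3 or later (PEP 393 storage); bytes_per_char is a
hypothetical helper name, and other implementations may report different
numbers.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage cost of a string made only of `ch`."""
    # The fixed object header cancels out in the difference of the two sizes.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One sample character from each range: ASCII, Latin-1, BMP, non-BMP.
for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

On CPython this is expected to report 1, 1, 2 and 4 bytes. The allocation
size depends on the widest character, which is why the caller must pass
maxchar before any content exists.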

The new public C functions added by PEP 393 are hard to use correctly,
but they are the most efficient. Functions like PyUnicode_FromString()
are simple to use and very hard to misuse :-) PyPy developers asked me
to simply drop all of these new public functions and make them private,
or at least deprecate them. But I never looked in depth at the new API.
I don't know if Cython uses it, for example.

Some APIs are still private, like _PyUnicodeWriter, which allows
creating a string in multiple steps with a smart strategy to reduce or
even avoid realloc() and conversions between the different storage
types (UCS1, UCS2, UCS4). This API is very efficient, but also hard to
use.
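
The widening idea can be modelled in a few lines of Python. This is only
a toy sketch: ToyWriter and its attributes are hypothetical names, it does
not reproduce the real _PyUnicodeWriter interface, and it ignores the
overallocation that avoids realloc(). It only shows that the buffer is
converted to a wider storage type when a wider character shows up, and
not before.

```python
class ToyWriter:
    """Hypothetical model of a widening string builder (not the real C API)."""

    def __init__(self):
        self.kind = 1         # bytes per character of the buffer: 1, 2 or 4
        self.chars = []       # code points written so far
        self.conversions = 0  # how many times the buffer had to be widened

    @staticmethod
    def _kind_for(code_point):
        # UCS1 covers up to U+00FF, UCS2 up to U+FFFF, UCS4 everything else.
        if code_point < 0x100:
            return 1
        if code_point < 0x10000:
            return 2
        return 4

    def write(self, text):
        if not text:
            return
        needed = self._kind_for(max(map(ord, text)))
        if needed > self.kind:
            # A real implementation would copy the buffer into wider storage.
            self.kind = needed
            self.conversions += 1
        self.chars.extend(map(ord, text))

    def finish(self):
        return "".join(map(chr, self.chars))

w = ToyWriter()
for piece in ("abc", "def", "\u20ac", "ghi"):
    w.write(piece)
result = w.finish()
print(ascii(result), w.kind, w.conversions)
```

Writing three narrow pieces and one piece containing U+20AC triggers a
single conversion in this model, from 1-byte to 2-byte storage.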

> C extensions can easily create strings using PyUnicode_New()
> which do not adhere to such a requirement and then write
> arbitrary content using PyUnicode_WRITE(). In some cases,
> this may even be necessary, say in case the extension doesn't
> know what data is being written, reading it from some external
> source.

It would be a bug in the C extension.
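
An extension that reads data from an external source can avoid this by
scanning the data before allocating. A minimal sketch of that first pass,
written as a Python model and not as C API code: rounded_maxchar is a
hypothetical helper, and the limits 127, 255, 65535 and 1114111 are the
values the PyUnicode_New() documentation lists as acceptable roundings of
maxchar.

```python
def rounded_maxchar(text):
    """Return the smallest limit that covers every character in `text`."""
    highest = max(map(ord, text), default=0)
    for limit in (127, 255, 65535):
        if highest <= limit:
            return limit
    # Anything above U+FFFF needs the full code point range.
    return 1114111

for sample in ("abc", "caf\xe9", "\u20ac", "\U0001F600"):
    print(ascii(sample), rounded_maxchar(sample))
```

In C, the equivalent scan would run over the decoded input first, and
only then would the string be allocated with the resulting maxchar and
filled in.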

> I'm not too familiar with the new Unicode code, but it seems
> that this requirement is not checked everywhere, e.g. the
> resize code doesn't seem to have such checks either (only in
> debug versions).

It must be checked everywhere. If it's not the case, it's an obvious
bug in CPython.

If you spotted a bug, please report a bug ;-)

Victor

