Re: [Python-ideas] Adding str.isascii() ?

26 Jan 2018

2018-01-26 14:43 GMT+01:00 M.-A. Lemburg:
...
> If that's indeed being used as an assumption, the docs must be
> fixed and PyUnicode_New() should verify this assumption as
> well - not only in debug builds using C asserts() :-)
Like PyUnicode_FromStringAndSize(NULL, size), PyUnicode_New(size,
maxchar) only allocates memory; the characters are left uninitialized.
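To make the allocation-time contract concrete, here is a minimal model, not CPython source, of how PEP 393 derives the storage layout from maxchar alone. This includes the ASCII flag that an O(1) str.isascii() could simply read. The names storage_layout and layout_for_maxchar() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not CPython source) of how PyUnicode_New()
   picks the storage layout from maxchar under PEP 393. */
typedef struct {
    int kind;    /* bytes per character: 1 (UCS1), 2 (UCS2) or 4 (UCS4) */
    bool ascii;  /* true when every character is promised to be < 128 */
} storage_layout;

/* Returns kind 0 for maxchar beyond U+10FFFF, which the real
   PyUnicode_New() refuses. */
static storage_layout layout_for_maxchar(uint32_t maxchar)
{
    storage_layout s = {0, false};
    if (maxchar < 128) {
        s.kind = 1;
        s.ascii = true;      /* decided before any character exists */
    } else if (maxchar < 256) {
        s.kind = 1;
    } else if (maxchar < 65536) {
        s.kind = 2;
    } else if (maxchar <= 0x10FFFF) {
        s.kind = 4;
    }
    return s;
}
```

The point of the model: the flag is a promise made by the caller. Nothing at allocation time can verify that the characters written later keep it.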

I don't see how PyUnicode_New() could check the string content, since
the content is not known yet...
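Because the content does not exist when PyUnicode_New() runs, the invariant can only be verified after the characters have been written. This is in the spirit of the consistency checks CPython runs in debug builds. The minimal sketch below assumes the characters have already been read back as code points (for example with PyUnicode_READ()). The helper name content_fits_layout() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical post-write check: does every character fit what the
   layout chosen at allocation time promised?  `kind` is 1, 2 or 4 and
   `ascii` mirrors the string's ASCII flag. */
static bool content_fits_layout(const uint32_t *chars, size_t len,
                                int kind, bool ascii)
{
    uint32_t limit;
    if (ascii)
        limit = 0x7F;
    else if (kind == 1)
        limit = 0xFF;
    else if (kind == 2)
        limit = 0xFFFF;
    else
        limit = 0x10FFFF;

    for (size_t i = 0; i < len; i++) {
        if (chars[i] > limit)
            return false;  /* e.g. U+00E9 written into an "ASCII" string */
    }
    return true;
}
```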

The new public C API functions added by PEP 393 are hard to use
correctly, but they are the most efficient. Functions like
PyUnicode_FromString() are simple to use and very hard to misuse :-)
PyPy developers asked me to simply drop all these new public C API
functions, or make them private, or at least deprecate them. But I
never looked at the new API in depth. I don't know, for example,
whether Cython uses it.

Some APIs are still private, like _PyUnicodeWriter, which allows
creating a string in multiple steps with a smart strategy that reduces
or even avoids realloc() calls and conversions between the different
storage types (UCS1, UCS2, UCS4). This API is very efficient, but also
hard to use.
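The widening idea can be illustrated with a heavily simplified, hypothetical writer. It starts with UCS1 storage and converts to a wider kind only when a wider character actually arrives, so pure Latin-1 text never pays for UCS2/UCS4 storage. This is not CPython's _PyUnicodeWriter, which does more (for example, smarter over-allocation). All names here are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified writer, NOT CPython's _PyUnicodeWriter. */
typedef struct {
    void *data;   /* characters stored with `kind` bytes each */
    size_t len;   /* characters written so far */
    size_t cap;   /* capacity in characters */
    int kind;     /* 1, 2 or 4; start at 1 */
} sketch_writer;

static uint32_t sw_read(const void *data, int kind, size_t i)
{
    if (kind == 1) return ((const uint8_t *)data)[i];
    if (kind == 2) return ((const uint16_t *)data)[i];
    return ((const uint32_t *)data)[i];
}

static void sw_store(void *data, int kind, size_t i, uint32_t ch)
{
    if (kind == 1) ((uint8_t *)data)[i] = (uint8_t)ch;
    else if (kind == 2) ((uint16_t *)data)[i] = (uint16_t)ch;
    else ((uint32_t *)data)[i] = ch;
}

/* Move to a new buffer of new_cap characters of new_kind bytes,
   converting existing characters if the kind grows. 0 on success. */
static int sw_resize(sketch_writer *w, size_t new_cap, int new_kind)
{
    void *buf = malloc(new_cap * (size_t)new_kind);
    if (buf == NULL) return -1;
    for (size_t i = 0; i < w->len; i++)
        sw_store(buf, new_kind, i, sw_read(w->data, w->kind, i));
    free(w->data);
    w->data = buf;
    w->cap = new_cap;
    w->kind = new_kind;
    return 0;
}

static int sw_write_char(sketch_writer *w, uint32_t ch)
{
    int needed = ch < 256 ? 1 : ch < 65536 ? 2 : 4;
    int kind = needed > w->kind ? needed : w->kind;
    if (kind != w->kind || w->len == w->cap) {
        /* Double the capacity only when full; widen only when needed. */
        size_t cap = w->len == w->cap ? (w->cap ? w->cap * 2 : 8) : w->cap;
        if (sw_resize(w, cap, kind) != 0) return -1;
    }
    sw_store(w->data, w->kind, w->len++, ch);
    return 0;
}

static void sw_free(sketch_writer *w)
{
    free(w->data);
    w->data = NULL;
    w->len = w->cap = 0;
    w->kind = 1;
}
```

Usage: initialize with `sketch_writer w = {NULL, 0, 0, 1};`, call sw_write_char() per character, and release with sw_free().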
...
> C extensions can easily create strings using PyUnicode_New()
> which do not adhere to such a requirement and then write
> arbitrary content using PyUnicode_WRITE(). In some cases,
> this may even be necessary, say in case the extension doesn't
> know what data is being written, reading it from some external
> source.
It would be a bug in the C extension.
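When the data comes from an external source, the extension can avoid the bug by scanning the data before allocating, so the maxchar it passes is truthful. The following is a minimal sketch in plain C. The names find_maxchar() and kind_for_maxchar() are hypothetical. CPython has similar internal logic, and public constructors such as PyUnicode_FromKindAndData() perform this scan themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the largest code point in external data
   before allocating the string. */
static uint32_t find_maxchar(const uint32_t *chars, size_t len)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < len; i++) {
        if (chars[i] > maxchar)
            maxchar = chars[i];
    }
    return maxchar;
}

/* Narrowest PEP 393 storage kind (bytes per character) for maxchar. */
static int kind_for_maxchar(uint32_t maxchar)
{
    return maxchar < 256 ? 1 : maxchar < 65536 ? 2 : 4;
}
```

In an extension, the scanned maximum would be passed as the maxchar argument of PyUnicode_New(len, maxchar) before filling the string with PyUnicode_WRITE(). That way, the storage kind and the ASCII flag match what is actually written.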
...
> I'm not too familiar with the new Unicode code, but it seems
> that this requirement is not checked everywhere, e.g. the
> resize code doesn't seem to have such checks either (only in
> debug versions).
It must be checked everywhere. If it's not, that's an obvious bug in
CPython.

If you spot a bug, please report it ;-)

Victor

Victor Stinner