[Python-Dev] PEP 393 review

Thu Aug 25 20:47:25 CEST 2011

"Martin v. Löwis", 24.08.2011 20:15:
> - issues to be considered (unclarities, bugs, limitations, ...)

One problem with the current implementation is the need to call 
PyUnicode_(FAST_)READY(), and the fact that this call can fail (e.g. due to 
insufficient memory). Basically, this means that even something as trivial 
as getting the length of a Unicode string can now result in an error.

I just noticed this when rewriting Cython's helper function that searches a 
Unicode string for a (Py_UCS4) character. Previously, the entire function 
was safe, could never produce an error and therefore always returned a 
boolean result. In the new world, the caller of this function must check 
and propagate errors. This may not be a major issue in most cases, but it 
can have a non-trivial impact on user code, depending on how deep in a call 
chain this happens and on how much control the user has over the call chain 
(think of a C callback, for example).
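
To illustrate the shape of the problem, here is a minimal sketch in plain C. 
It deliberately does not use the CPython API: lazy_str, lazy_str_ready() and 
lazy_str_contains() are hypothetical stand-ins for a string whose canonical 
buffer is built on demand, and the conversion ignores surrogate pairs for 
brevity. It only shows how a fallible "ready" step turns a boolean search 
into a tri-state result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a lazily materialized string (not a CPython type). */
typedef struct {
    const uint16_t *legacy; /* old-style buffer, always present */
    uint32_t *data;         /* canonical buffer, NULL until readied */
    size_t length;          /* number of code units in "legacy" */
} lazy_str;

/* Build the canonical buffer on first use.
   Returns 0 on success, -1 if the allocation fails. */
int lazy_str_ready(lazy_str *s)
{
    assert(s != NULL);
    if (s->data != NULL)
        return 0; /* already materialized, nothing to do */

    uint32_t *buf = malloc((s->length ? s->length : 1) * sizeof *buf);
    if (buf == NULL)
        return -1; /* this is the failure the caller must now handle */

    for (size_t i = 0; i < s->length; i++)
        buf[i] = s->legacy[i]; /* simplification: no surrogate handling */
    s->data = buf;
    return 0;
}

/* Search for a character.
   Returns 1 if found, 0 if not found, -1 if the string could not be readied.
   Without the lazy step, a plain 0/1 result would have been enough. */
int lazy_str_contains(lazy_str *s, uint32_t ch)
{
    if (lazy_str_ready(s) < 0)
        return -1;
    for (size_t i = 0; i < s->length; i++) {
        if (s->data[i] == ch)
            return 1;
    }
    return 0;
}
```

Note that a caller written as "if (lazy_str_contains(&s, ch))" would 
silently treat the error value -1 as "found", so every call site has to be 
revisited, not just the helper itself.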

Also, even when there is no error, the potential need to build up the 
string on request means that the run time and memory requirements of an 
algorithm are now less predictable, since they depend on where the input 
came from and not just on its Python-level string content.

I would be happier with an implementation that avoided this by always 
instantiating the data buffer right from the start, instead of carrying 
only a Py_UNICODE buffer for old-style instances.

Stefan