[Python-Dev] PEP 393 review

Fri Aug 26 20:28:43 CEST 2011

"Martin v. Löwis", 26.08.2011 18:56:
> I agree with your observation that something should be done about error
> handling, and will update the PEP shortly. I propose that
> PyUnicode_Ready should be explicitly called on input where raising an
> exception is feasible. In contexts where it is not feasible (such
> as reading a character, or reading the length or the kind), failing to
> ready the string should cause a fatal error.

I consider this an increase in complexity. It will then no longer be enough
to access the data; the user will first have to figure out a suitable place
in the code to make sure it's actually there. They may forget about it
because it works in all test cases, or they may trigger a huge amount of
overhead by executing one of the macros that automatically copy and
'recode' the string data.

For the specific case of Cython, I would guess that I could just add
another special case that reads the data from the Py_UNICODE buffer and
combines surrogates as needed, but that will only work in some cases
(specifically not for indexing). And outside of Cython, most normal user
code won't do that.
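
To make the surrogate-combining fallback concrete, here is a minimal sketch.
It assumes a 16-bit Py_UNICODE (a narrow build) and models the buffer as a
plain uint16_t array; the function name read_code_point is hypothetical and
not part of any CPython or Cython API.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a UTF-16 buffer, starting at index *pos.
 * Advances *pos by one code unit, or by two if a valid surrogate pair
 * was combined. Unpaired surrogates are returned unchanged. */
static uint32_t read_code_point(const uint16_t *buf, size_t len, size_t *pos)
{
    uint32_t ch = buf[(*pos)++];
    if (ch >= 0xD800 && ch <= 0xDBFF && *pos < len) {
        uint32_t low = buf[*pos];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            (*pos)++;
            ch = 0x10000 + ((ch - 0xD800) << 10) + (low - 0xDC00);
        }
    }
    return ch;
}
```

This works for sequential iteration, but not for indexing: once a surrogate
pair occurs, the n-th code point is no longer found at offset n in the
buffer, so an index lookup would have to scan from the start.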

My gut feeling leans towards a KISS approach. If you go the route of
requiring an explicit point for triggering PyUnicode_Ready() calls, why not
just go all the way and make it completely explicit in *all* cases? I.e.
remove all implicit calls from the macros and make it part of the new API
semantics that users *must* call PyUnicode_FAST_READY() before doing
anything with a new string data layout. That would mean far fewer surprises.

Note that there isn't currently an official macro way to figure out that 
the flexible string layout has not been initialised yet, i.e. that wstr is 
set but str is not. If the implicit PyUnicode_Ready() calls get removed, 
PyUnicode_KIND() could take that place by simply returning WSTR_KIND.
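
The fully explicit scheme can be sketched with a toy model. Everything
below (toy_str, toy_get_kind, toy_ready, the TOY_* constants) is
hypothetical and only mimics the idea; it is not the PEP 393 struct layout
or API. The kind query has no side effects and reports the "not ready"
state, while the conversion happens only in one explicit call that can
report failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kinds: WSTR means "only the legacy buffer is set". */
enum toy_kind { TOY_WSTR_KIND = 0, TOY_1BYTE_KIND = 1, TOY_2BYTE_KIND = 2 };

typedef struct {
    size_t length;
    const uint16_t *wstr;  /* legacy buffer, always set in this model */
    void *data;            /* compact buffer, NULL until readied */
    enum toy_kind kind;    /* only meaningful once data is set */
} toy_str;

/* Pure query: never converts, never allocates, cannot fail. */
static enum toy_kind toy_get_kind(const toy_str *s)
{
    return s->data == NULL ? TOY_WSTR_KIND : s->kind;
}

/* Explicit conversion step. Returns 0 on success, -1 on allocation
 * failure, so the caller can raise an error at a point of its choosing. */
static int toy_ready(toy_str *s)
{
    size_t i, n = s->length ? s->length : 1;
    uint16_t max = 0;
    if (s->data != NULL)
        return 0;
    for (i = 0; i < s->length; i++)
        if (s->wstr[i] > max)
            max = s->wstr[i];
    if (max < 256) {
        uint8_t *p = (uint8_t *)malloc(n);
        if (p == NULL)
            return -1;
        for (i = 0; i < s->length; i++)
            p[i] = (uint8_t)s->wstr[i];
        s->data = p;
        s->kind = TOY_1BYTE_KIND;
    } else {
        uint16_t *p = (uint16_t *)malloc(n * sizeof(uint16_t));
        if (p == NULL)
            return -1;
        for (i = 0; i < s->length; i++)
            p[i] = s->wstr[i];
        s->data = p;
        s->kind = TOY_2BYTE_KIND;
    }
    return 0;
}
```

In this model, code that sees TOY_WSTR_KIND knows it must either call
toy_ready() and handle a failure, or fall back to the legacy buffer.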

That being said, the main problem I currently see is that basically all
existing code needs to be updated in order to handle these errors.
Otherwise, it would be possible to trigger crashes by forging a suitable
string and passing it into an unprepared C library, which would then run
into a NULL pointer return value from PyUnicode_AS_UNICODE().

Stefan