[Python-Dev] PEP 393 review
Stefan Behnel
stefan_ml at behnel.de
Fri Aug 26 20:28:43 CEST 2011
"Martin v. Löwis", 26.08.2011 18:56:
> I agree with your observation that something should be done about error
> handling, and will update the PEP shortly. I propose that
> PyUnicode_Ready should be explicitly called on input where raising an
> exception is feasible. In contexts where it is not feasible (such
> as reading a character, or reading the length or the kind), failing to
> ready the string should cause a fatal error.
I consider this an increase in complexity. It will then no longer be enough
to access the data. The user will first have to figure out a suitable place
in the code to make sure the data is actually there, potentially forgetting
about it because it works in all test cases, or potentially triggering a
large amount of overhead by executing one of the macros that automatically
copies and 'recodes' the string data.
For the specific case of Cython, I would guess that I could just add
another special case that reads the data from the Py_UNICODE buffer and
combines surrogates as needed, but that will only work in some cases
(specifically not for indexing). And outside of Cython, most normal user
code won't do that.
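
To illustrate the surrogate special case, here is a minimal sketch in plain C.
It assumes a buffer of 16-bit code units (as Py_UNICODE would be on a narrow
build) and does not use the CPython API. The function names read_code_point
and count_code_points are hypothetical, made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one code point from a buffer of 16-bit code
   units starting at *pos, combining a high/low surrogate pair when one is
   present, and advance *pos past the units consumed. */
static uint32_t read_code_point(const uint16_t *buf, size_t len, size_t *pos)
{
    uint32_t ch = buf[(*pos)++];
    if (ch >= 0xD800 && ch <= 0xDBFF && *pos < len) {
        uint32_t low = buf[*pos];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            (*pos)++;
            ch = 0x10000 + ((ch - 0xD800) << 10) + (low - 0xDC00);
        }
    }
    return ch;
}

/* Hypothetical helper: count code points by scanning the whole buffer.
   Finding the n-th code point needs the same kind of scan, which is why
   this approach fits iteration but not constant-time indexing. */
static size_t count_code_points(const uint16_t *buf, size_t len)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        (void)read_code_point(buf, len, &pos);
        n++;
    }
    return n;
}
```

Sequential reads stay cheap this way, but the position of a given character
depends on how many surrogate pairs precede it.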
My gut feeling leans towards a KISS approach. If you go the route of
requiring an explicit point for triggering PyUnicode_Ready() calls, why not
just go all the way and make it completely explicit in *all* cases? I.e.
remove all implicit calls from the macros and make it part of the new API
semantics that users *must* call PyUnicode_FAST_READY() before doing
anything with a new string data layout. Far fewer surprises.
Note that there isn't currently an official macro way to figure out that
the flexible string layout has not been initialised yet, i.e. that wstr is
set but str is not. If the implicit PyUnicode_Ready() calls get removed,
PyUnicode_KIND() could take that place by simply returning WSTR_KIND.
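
A toy model of the fully explicit scheme might look like the sketch below.
This is not the CPython implementation or API: toy_string, toy_kind_of,
toy_ready and toy_read_char are all hypothetical names, and the struct only
mimics the "wstr is set but str is not" state described above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical kinds: "only the legacy buffer exists" vs. "readied". */
enum toy_kind { TOY_WSTR_KIND = 0, TOY_READY_KIND = 1 };

/* Hypothetical stand-in for a string object with two representations. */
typedef struct {
    const wchar_t *wstr;  /* legacy buffer, assumed to be always set here */
    unsigned int *str;    /* canonical buffer, NULL until readied */
    size_t length;
} toy_string;

/* The kind query never converts anything; it only reports the state. */
static enum toy_kind toy_kind_of(const toy_string *s)
{
    return s->str != NULL ? TOY_READY_KIND : TOY_WSTR_KIND;
}

/* Explicit conversion step. Returns 0 on success and -1 on allocation
   failure, so the caller decides where the error gets handled. */
static int toy_ready(toy_string *s)
{
    if (s->str != NULL)
        return 0;
    unsigned int *data = malloc((s->length ? s->length : 1) * sizeof *data);
    if (data == NULL)
        return -1;
    for (size_t i = 0; i < s->length; i++)
        data[i] = (unsigned int)s->wstr[i];
    s->str = data;
    return 0;
}

/* Caller-side pattern: check the kind, ready explicitly, then read. */
static int toy_read_char(toy_string *s, size_t index, unsigned int *out)
{
    if (toy_kind_of(s) == TOY_WSTR_KIND && toy_ready(s) < 0)
        return -1;
    if (index >= s->length)
        return -1;
    *out = s->str[index];
    return 0;
}
```

In this model no accessor converts behind the caller's back, so the only
place a conversion failure can surface is the explicit toy_ready() call.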
That being said, the main problem I currently see is that basically all
existing code needs to be updated in order to handle these errors.
Otherwise, it would be possible to trigger crashes by properly forging a
string and passing it into an unprepared C library to let it run into a
NULL pointer return value of PyUnicode_AS_UNICODE().
Stefan