[Cython] About IndexNode and unicode[index]

Fri Mar 1 07:46:30 CET 2013

ZS, 01.03.2013 07:43:
> 2013/3/1 Stefan Behnel:
>> ZS, 28.02.2013 21:07:
>>> 2013/2/28 Stefan Behnel:
>>>>> This allows writing unicode text parsing code that runs almost at C
>>>>> speed, mostly in Python (+ .pxd definitions).
>>>>
>>>> I suggest simply adding a constant flag argument to the existing function
>>>> that states whether checking should be done or not. Inlining will let the C
>>>> compiler drop the corresponding code, which may or may not make it a little
>>>> faster.
>>>
>>> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) {
>>>     Py_ssize_t length;
>>> #if CYTHON_PEP393_ENABLED
>>>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
>>> #endif
>>>     if (flag) {
>>>         length = __Pyx_PyUnicode_GET_LENGTH(ustring);
>>>         if ((0 <= i) & (i < length)) {
>>>             return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>>>         } else if ((-length <= i) & (i < 0)) {
>>>             return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
>>>         } else {
>>>             PyErr_SetString(PyExc_IndexError, "string index out of range");
>>>             return (Py_UCS4)-1;
>>>         }
>>>     } else {
>>>         return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>>>     }
>>> }
>>
>> I think you could even pass in two flags, one for wraparound and one for
>> boundscheck, and then just evaluate them appropriately in the existing "if"
>> tests above. That should allow both features to be supported independently
>> in a fast way.
>>
> Interesting. Can C compilers in optimization mode eliminate the unused
> evaluation paths in nested if statements with constant conditional
> expressions?

They'd be worthless if they didn't do that. (Even Cython does it, BTW.)
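
To illustrate the two-flag idea, here is a rough, self-contained sketch. It
is not the actual Cython utility code: the "ustr" struct, the USTR_ERROR
sentinel and all function names are made up for this example, and a plain
array of code points stands in for a real unicode object. When the inline
function is called with literal 0/1 flags, an optimizing compiler is
expected to fold the constant tests and drop the branches that cannot run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a unicode object: an array of code points. */
typedef struct {
    const uint32_t *data;
    ptrdiff_t length;
} ustr;

/* Hypothetical error sentinel, mirroring the (Py_UCS4)-1 convention. */
#define USTR_ERROR ((uint32_t)-1)

/* Generic accessor. Both flags are meant to be compile-time constants at
 * each call site, so that after inlining only the needed tests remain. */
static inline uint32_t ustr_char(const ustr *s, ptrdiff_t i,
                                 int wraparound, int boundscheck) {
    ptrdiff_t length = s->length;
    if (wraparound && i < 0) {
        i += length;                 /* map -1 to the last character */
    }
    if (boundscheck && (i < 0 || i >= length)) {
        return USTR_ERROR;           /* caller would raise IndexError */
    }
    return s->data[i];
}

/* Specialised entry points: the constant flags get folded away. */
static inline uint32_t ustr_char_checked(const ustr *s, ptrdiff_t i) {
    return ustr_char(s, i, 1, 1);
}

static inline uint32_t ustr_char_unchecked(const ustr *s, ptrdiff_t i) {
    return ustr_char(s, i, 0, 0);    /* reduces to a bare array read */
}
```

Comparing the generated assembly of the two wrappers at -O2 (e.g. with
"gcc -O2 -S") is an easy way to check what a given compiler really does.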

Stefan