[Cython] About IndexNode and unicode[index]

Thu Feb 28 19:31:28 CET 2013

2013/2/28 ZS <szport at gmail.com>:
> Looking at the IndexNode class in ExprNodes.py, I see an opportunity
> to add a faster code path for unicode[index], similar to what is done
> for lists in the `generate_setitem_code` method.
>
> These are the files I used to measure the performance difference:
>
> #### unicode_index.h
>
> /* This is a stripped-down version of __Pyx_GetItemInt_Unicode_Fast */
> #include "unicodeobject.h"
>
> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);
>
> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
> #if CYTHON_PEP393_ENABLED
>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
> #endif
>     return __Pyx_PyUnicode_READ_CHAR(ustring, i);
> }
>
> #### unicode_index.pyx
>
> # coding: utf-8
>
> cdef extern from 'unicode_index.h':
>     inline Py_UCS4 unicode_char(unicode ustring, int i)
>
> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"
>
> def f_1(unicode text):
>     cdef int i, j
>     cdef int n = len(text)
>     cdef Py_UCS4 ch
>
>     for j from 0<=j<=1000000:
>         for i from 0<=i<=n-1:
>             ch = text[i]
>
> def f_2(unicode text):
>     cdef int i, j
>     cdef int n = len(text)
>     cdef Py_UCS4 ch
>
>     for j from 0<=j<=1000000:
>         for i from 0<=i<=n-1:
>             ch = unicode_char(text, i)
>
> def test_1():
>     f_1(text)
>
> def test_2():
>     f_2(text)
>
> Timing results:
>
> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_1" "test_1()"
> 100 loops, best of 10: 89 msec per loop
> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from mytests.unicode_index import test_2" "test_2()"
> 100 loops, best of 10: 46.1 msec per loop
>
> Compiler directives set globally in setup.py:
>
>        "boundscheck": False
>        "wraparound": False
>        "nonecheck": False
>
For the sake of clarity, I would like to add the following: this
optimization applies only when both `boundscheck(False)` and
`wraparound(False)` are in effect. Otherwise the default evaluation
path (__Pyx_GetItemInt_Unicode) is used.
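
To illustrate why both directives matter, here is a simplified
pure-Python model of the two paths. It is only a sketch of the
semantics, not the C code that Cython generates; the names
`getitem_general` and `getitem_fast` are made up for this illustration.

```python
def getitem_general(text, i):
    """Model of the default path: wrap negative indices, then bounds-check."""
    n = len(text)
    if i < 0:
        i += n  # what wraparound(True) has to handle
    if i < 0 or i >= n:
        # what boundscheck(True) has to handle
        raise IndexError("string index out of range")
    return text[i]


def getitem_fast(text, i):
    """Model of the fast path: the caller guarantees 0 <= i < len(text).

    In C this would be a direct character read with no extra branches;
    Python's own indexing still checks, so this only mirrors the contract.
    """
    return text[i]
```

With both checks disabled there is nothing left to do before the read,
which is why the fast path can be so much shorter.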

This makes it possible to write Unicode text parsing code that runs at
almost C speed while being written mostly in Python (plus .pxd definitions).
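
As a rough sketch of what "mostly in Python" could look like: a plain
.py module with an augmenting .pxd file next to it. The module name
`parse_text`, the function `count_char`, and the .pxd contents shown in
the comment are hypothetical and untested here; whether the fast path
is taken depends on the directives mentioned above.

```python
# parse_text.py (hypothetical module); runs unchanged under plain Python.
#
# A matching parse_text.pxd could supply the C types, for example
# (an assumption, following Cython's augmenting-.pxd convention):
#
#     cimport cython
#
#     @cython.locals(i=cython.Py_ssize_t, n=cython.Py_ssize_t,
#                    count=cython.Py_ssize_t, ch=cython.Py_UCS4)
#     cpdef Py_ssize_t count_char(unicode text, Py_UCS4 target)


def count_char(text, target):
    """Count occurrences of the single character `target` in `text`."""
    n = len(text)
    count = 0
    for i in range(n):
        ch = text[i]  # the unicode[index] access discussed in this thread
        if ch == target:
            count += 1
    return count
```

For example, `count_char(u"a b c", u" ")` counts the two spaces in the
string, and the same source works whether or not it is compiled.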

 Zaur Shibzukhov