[Python-Dev] PEP 393 Summer of Code Project

Wed Aug 24 10:17:58 CEST 2011

On 24/08/2011 04:41, Torsten Becker wrote:
> On Tue, Aug 23, 2011 at 18:27, Victor Stinner
> <victor.stinner at haypocalc.com>  wrote:
>> I posted a patch to re-add it:
>> http://bugs.python.org/issue12819#msg142867
>
> Thank you for the patch!  Note that this patch adds the fast path only
> to the helper function which determines the length of the string and
> the maximum character.  The decoding part is still without a fast path
> for ASCII runs.

Ah? If utf8_max_char_size_and_has_errors() reports no error and
maxchar=127, memcpy() is used. You mean that memcpy() is too slow? :-)

maxchar = utf8_max_char_size_and_has_errors(s, size, &unicode_size,
                                             &has_errors);
if (has_errors) {
    ...
}
else {
    unicode = (PyUnicodeObject *)PyUnicode_New(unicode_size, maxchar);
    if (!unicode)
        return NULL;
    /* When the string is ASCII only, just use memcpy and return. */
    if (maxchar < 128) {
        assert(unicode_size == size);
        Py_MEMCPY(PyUnicode_1BYTE_DATA(unicode), s, unicode_size);
        return (PyObject *)unicode;
    }
    ...
}
}
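
To make the role of the pre-scan concrete, here is a minimal sketch of a helper that walks a UTF-8 buffer once and reports the number of code points and the maximum character range needed. It is not the CPython function: the name scan_utf8 is hypothetical, the input is assumed to be valid UTF-8, and error detection is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pre-scan helper, NOT the CPython code.
   Assumes s[0..size) is valid UTF-8. Stores the number of code points
   in *length and returns the smallest of 127, 255, 65535 or 0x10FFFF
   that covers every character in the input. */
static unsigned long
scan_utf8(const unsigned char *s, size_t size, size_t *length)
{
    unsigned long maxchar = 127;
    size_t i = 0, n = 0;

    while (i < size) {
        unsigned char c = s[i];
        if (c < 0x80) {
            /* ASCII byte: one byte, one character. */
            i += 1;
        }
        else if (c < 0xE0) {
            /* 2-byte sequence. Lead bytes 0xC2 and 0xC3 encode
               U+0080..U+00FF, anything above needs 16 bits. */
            unsigned long range = (c <= 0xC3) ? 255 : 65535;
            if (range > maxchar)
                maxchar = range;
            i += 2;
        }
        else if (c < 0xF0) {
            /* 3-byte sequence: U+0800..U+FFFF. */
            if (maxchar < 65535)
                maxchar = 65535;
            i += 3;
        }
        else {
            /* 4-byte sequence: above U+FFFF. */
            maxchar = 0x10FFFF;
            i += 4;
        }
        n++;
    }
    *length = n;
    return maxchar;
}
```

For pure ASCII input the returned value is 127 and the length equals the byte count, which is the condition the memcpy branch above relies on.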

But yes, my patch only optimizes ASCII-only strings, not "mostly-ASCII"
strings (e.g. 100 ASCII characters + 1 latin1 character). That case can
be optimized later. I didn't benchmark my patch.
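
One possible shape for such a "mostly-ASCII" fast path is sketched below: copy each run of ASCII bytes with a single memcpy and fall back to per-character decoding only for the non-ASCII bytes in between. This is an illustration under assumptions, not the patch from the issue: the function decode_utf8_to_latin1 is hypothetical, it assumes the input was already validated and that every character is below 256, and it has not been benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT CPython code. Decodes valid UTF-8 whose code
   points are all below 256 into a buffer with one byte per character.
   out must have room for at least size bytes. Returns the number of
   characters written. */
static size_t
decode_utf8_to_latin1(const unsigned char *s, size_t size,
                      unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < size) {
        /* Fast path: locate the end of the current ASCII run and copy
           the whole run with one memcpy. */
        size_t start = i;
        while (i < size && s[i] < 0x80)
            i++;
        if (i > start) {
            memcpy(out + n, s + start, i - start);
            n += i - start;
        }
        if (i == size)
            break;

        /* Slow path: one 2-byte sequence with lead byte 0xC2 or 0xC3,
           which encodes U+0080..U+00FF. */
        assert(i + 1 < size);
        assert((s[i] & 0xFE) == 0xC2 && (s[i + 1] & 0xC0) == 0x80);
        out[n++] = (unsigned char)(((s[i] & 0x03) << 6)
                                   | (s[i + 1] & 0x3F));
        i += 2;
    }
    return n;
}
```

With an input such as 100 ASCII bytes followed by one latin1 character, the loop performs one memcpy for the run and one slow-path step for the final character.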

Victor