[pypy-dev] Speeds of various utf8 operations

Maciej Fijalkowski fijall at gmail.com
Sat Mar 4 14:17:22 EST 2017


Er... why would it be slower than cpython?

Anyway, the speeds I'm reporting on are based on C/assembler programs so far.

On Sat, Mar 4, 2017 at 7:36 PM, Phyo Arkar <phyo.arkarlwin at gmail.com> wrote:
> SSE means https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions?
>
> Is this much slower in comparison to CPython?
>
> On Sun, Mar 5, 2017 at 12:32 AM Maciej Fijalkowski <fijall at gmail.com> wrote:
>>
>> Hello everyone
>>
>> I've been experimenting a bit with faster utf8 operations (and
>> conversion that does not do much). I'm writing down the results so
>> they don't get forgotten, as well as trying to put them in rpython
>> comments.
>>
>> As far as non-SSE algorithms go, for things like splitlines, split
>> etc. it is important to walk the utf8 string quickly and check
>> properties of characters.
>>
>> So far the finding has been that a lookup table, for example:
>>
>> def next_codepoint_pos(code, pos):
>>     chr1 = ord(code[pos])
>>     if chr1 < 0x80:
>>         return pos + 1
>>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
>>
>> is significantly slower than the following code (neither version
>> does error checking):
>>
>> def next_codepoint_pos(code, pos):
>>     chr1 = ord(code[pos])
>>     if chr1 < 0x80:
>>         return pos + 1
>>     if 0xC2 <= chr1 <= 0xDF:
>>         return pos + 2
>>     if chr1 >= 0xE0 and chr1 <= 0xEF:
>>         return pos + 3
>>     return pos + 4
>>
>> The exact difference depends on how many multi-byte characters there
>> are and how big the strings are. It's up to 40%, but as a general
>> rule, the more ASCII characters there are, the less of an impact it
>> has, and the larger the strings are, the more impact the
>> memory/L2/L3 cache has.
>>
>> PS. SSE will be faster still, but we might not want SSE for just
>> splitlines
>>
>> Cheers,
>> fijal
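
For anyone who wants to play with the two next_codepoint_pos variants
quoted above, here is a minimal sketch in plain Python 3. It is not the
RPython code and not the C/assembler programs the timings come from.
UTF8_CODE_LENGTH is a hypothetical stand-in for
runicode._utf8_code_length (the real table's contents are not shown in
this thread), the _table/_branch suffixes and codepoint_positions are
names made up for this example, and the input is a bytes object, so
indexing already yields an int and no ord() is needed. Like the
originals, neither variant does error checking.

```python
# Hypothetical table indexed by (lead byte - 0x80). This is an assumption
# made for this sketch, not the actual runicode._utf8_code_length.
UTF8_CODE_LENGTH = bytes(
    2 if 0xC2 <= b <= 0xDF else
    3 if 0xE0 <= b <= 0xEF else
    4 if 0xF0 <= b <= 0xF4 else
    0  # continuation bytes and invalid lead bytes; never hit on valid input
    for b in range(0x80, 0x100)
)


def next_codepoint_pos_table(code, pos):
    """Lookup-table variant: one indexed load for non-ASCII lead bytes."""
    chr1 = code[pos]
    if chr1 < 0x80:
        return pos + 1
    return pos + UTF8_CODE_LENGTH[chr1 - 0x80]


def next_codepoint_pos_branch(code, pos):
    """Branch variant: range checks on the lead byte, no memory lookup."""
    chr1 = code[pos]
    if chr1 < 0x80:
        return pos + 1
    if 0xC2 <= chr1 <= 0xDF:
        return pos + 2
    if 0xE0 <= chr1 <= 0xEF:
        return pos + 3
    return pos + 4


def codepoint_positions(code, step):
    """Walk valid UTF-8 bytes and collect the start offset of each codepoint."""
    positions = []
    pos = 0
    while pos < len(code):
        positions.append(pos)
        pos = step(code, pos)
    return positions


# One character each of 1, 2, 3 and 4 bytes, followed by a newline.
sample = "a\u00e9\u20ac\U0001f600\n".encode("utf-8")
print(codepoint_positions(sample, next_codepoint_pos_table))   # [0, 1, 3, 6, 10]
print(codepoint_positions(sample, next_codepoint_pos_branch))  # [0, 1, 3, 6, 10]
```

On valid UTF-8 both variants return the same offsets. Timing them under
an interpreter will not necessarily reproduce the difference described
above, since those numbers are for compiled code.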

