[pypy-dev] Speeds of various utf8 operations

Maciej Fijalkowski fijall at gmail.com
Sat Mar 4 14:58:01 EST 2017


Hi Phyo,

This mail is about timing operations at the C/assembler level. I will have
more detailed Python-level benchmarks as I progress with my branch.


On 04 Mar 2017 7:36 PM, "Phyo Arkar" <phyo.arkarlwin at gmail.com> wrote:

Does SSE mean https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions?

Is this much slower in comparison to CPython?

On Sun, Mar 5, 2017 at 12:32 AM Maciej Fijalkowski <fijall at gmail.com> wrote:

> Hello everyone
>
> I've been experimenting a bit with faster utf8 operations (and
> conversion that does not do much). I'm writing down the results so
> they don't get forgotten, as well as trying to put them in rpython
> comments.
>
> As far as non-SSE algorithms go, for things like splitlines, split
> etc. it is important to walk the utf8 string quickly and check properties
> of characters.
>
> So far the current finding has been that a lookup table, for example:
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     return pos + ord(runicode._utf8_code_length[chr1 - 0x80])
>
> is significantly slower than the following code (neither does error checking):
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     if 0xC2 <= chr1 <= 0xDF:
>         return pos + 2
>     if chr1 >= 0xE0 and chr1 <= 0xEF:
>         return pos + 3
>     return pos + 4
>
> The exact difference depends on how many multi-byte characters there
> are and how big the strings are. It's up to 40%, but as a general
> rule, the more ascii characters there are, the less of an impact it has,
> and the larger the strings are, the more impact memory/L2/L3 cache has.
>
> PS. SSE will be faster still, but we might not want SSE for just splitlines
>
> Cheers,
> fijal
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>
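
For reference, the two variants quoted above can be put side by side in
plain Python 3 to check that they walk a string identically. This is only a
sketch under assumptions: UTF8_CODE_LENGTH is a hypothetical stand-in for
runicode._utf8_code_length built locally, the function names with _table and
_branch suffixes and count_codepoints are made up for illustration, and the
input is assumed to be valid utf8 (no error checking, as in the original).
It says nothing about speed, since CPython timings would not reflect
translated RPython.

```python
def _length_for_lead_byte(b):
    # Sequence length implied by a lead byte in 0x80..0xFF.
    # Invalid lead bytes map to 1 here; that choice is an assumption.
    if 0xC2 <= b <= 0xDF:
        return 2
    if 0xE0 <= b <= 0xEF:
        return 3
    if 0xF0 <= b <= 0xF4:
        return 4
    return 1

# Hypothetical stand-in for runicode._utf8_code_length (index = byte - 0x80).
UTF8_CODE_LENGTH = bytes(_length_for_lead_byte(b) for b in range(0x80, 0x100))

def next_codepoint_pos_table(code, pos):
    # Lookup-table variant: one table read for every non-ascii lead byte.
    chr1 = code[pos]
    if chr1 < 0x80:
        return pos + 1
    return pos + UTF8_CODE_LENGTH[chr1 - 0x80]

def next_codepoint_pos_branch(code, pos):
    # Branch variant: range comparisons only, no memory access for the table.
    chr1 = code[pos]
    if chr1 < 0x80:
        return pos + 1
    if 0xC2 <= chr1 <= 0xDF:
        return pos + 2
    if 0xE0 <= chr1 <= 0xEF:
        return pos + 3
    return pos + 4

def count_codepoints(code, step):
    # Walk the utf8 bytes one codepoint at a time using the given step function.
    pos = 0
    n = 0
    while pos < len(code):
        pos = step(code, pos)
        n += 1
    return n

text = "a\u00e9\u20ac\U0001F600\n"   # 1-, 2-, 3-, 4- and 1-byte characters
data = text.encode("utf-8")
assert count_codepoints(data, next_codepoint_pos_table) == len(text)
assert count_codepoints(data, next_codepoint_pos_branch) == len(text)
```

The same walk is the building block for things like splitlines: the step
function advances to the next character boundary, and the caller inspects
the lead byte at each position.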