[pypy-dev] Speeds of various utf8 operations

Maciej Fijalkowski fijall at gmail.com
Sun Mar 5 14:24:24 EST 2017


This code is used when checking for spaces in unicode, so the input is already known to be valid UTF-8.

On Sun, Mar 5, 2017 at 11:14 AM, Armin Rigo <armin.rigo at gmail.com> wrote:
> Hi Maciej,
>
> On 4 March 2017 at 19:01, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> def next_codepoint_pos(code, pos):
>>     chr1 = ord(code[pos])
>>     if chr1 < 0x80:
>>         return pos + 1
>>     if 0xC2 <= chr1 <= 0xDF:
>>         return pos + 2
>>     if chr1 >= 0xE0 and chr1 <= 0xEF:
>>         return pos + 3
>>     return pos + 4
>
> If you don't want error checking, then you can simplify the range
> checks here a bit.  Maybe it gives some more gains, but who knows:
>
> def next_codepoint_pos(code, pos):
>     chr1 = ord(code[pos])
>     if chr1 < 0x80:
>         return pos + 1
>     if chr1 <= 0xDF:
>         return pos + 2
>     if chr1 <= 0xEF:
>         return pos + 3
>     return pos + 4
>
>
> A bientôt,
>
> Armin.
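
For reference, a minimal sketch of how the simplified version could be used to walk a string. It assumes Python 3 and a bytes object that is already valid UTF-8 (so code[pos] is an int and no ord() is needed); codepoint_offsets is a hypothetical helper name, not something from the PyPy source.

```python
def next_codepoint_pos(code, pos):
    # Assumes `code` is bytes holding valid UTF-8; no error checking is done.
    lead = code[pos]  # Python 3: indexing bytes yields an int
    if lead < 0x80:
        return pos + 1  # ASCII, 1 byte
    if lead <= 0xDF:
        return pos + 2  # 2-byte sequence
    if lead <= 0xEF:
        return pos + 3  # 3-byte sequence
    return pos + 4      # 4-byte sequence


def codepoint_offsets(code):
    # Hypothetical helper: byte offset at which each codepoint starts.
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        pos = next_codepoint_pos(code, pos)
    return offsets


# 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes), ' ' (1 byte)
sample = u"a\u00e9\u20ac\U0001F600 ".encode("utf-8")
print(codepoint_offsets(sample))  # [0, 1, 3, 6, 10]
```

On invalid input (for example a stray continuation byte) this will silently return wrong positions, which is the price of dropping the range checks.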

