[New-bugs-announce] [issue35559] Optimize base64.b16decode to use compiled regex

Karthikeyan Singaravelan report at bugs.python.org
Sat Dec 22 02:29:08 EST 2018


New submission from Karthikeyan Singaravelan <tir.karthi at gmail.com>:

I came across this while looking at issue35557 and am opening a separate issue to keep the discussion focused. Currently the b16decode function passes a pattern literal to re.search on every call. Compiling that pattern once at module level, as a module-level constant, gives roughly a 30% improvement on Python 3.7 (benchmark results below). The change looks safe to me, so I am proposing a PR for it.
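
For illustration, here is a minimal sketch of the two shapes being compared. The function names and the _NON_B16 constant are hypothetical and chosen only for this example. The bytes-to-str handling and the casefold option of the real function are left out. The timing loop uses the standard timeit module, not the perf runner used for the numbers below, so absolute figures will differ.

```python
import binascii
import re
import timeit

# Hypothetical module-level constant: the pattern is compiled once at import.
_NON_B16 = re.compile(b'[^0-9A-F]')


def decode_inline(s):
    # Pattern literal passed on every call; re.search() has to look the
    # compiled pattern up in the re module's internal cache each time.
    if re.search(b'[^0-9A-F]', s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)


def decode_precompiled(s):
    # Calls the method on the precompiled pattern object directly.
    if _NON_B16.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)


if __name__ == "__main__":
    data = b"806903D098EB5095" * 4
    assert decode_inline(data) == decode_precompiled(data)
    for fn in (decode_inline, decode_precompiled):
        seconds = timeit.timeit(lambda: fn(data), number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s per 100,000 calls")
```

Both functions accept and reject the same inputs; only the per-call lookup of the pattern differs.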

$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode          | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+
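
As a sanity check on the table, the "x faster" and percentage columns can be reproduced from the rounded means shown above. This is only a sketch: the compare helper is a hypothetical name, and perf itself works from the unrounded measurements, so agreement is only to the displayed precision.

```python
def compare(default_us, optimized_us):
    """Return (speedup factor, percent change) for two mean timings."""
    speedup = default_us / optimized_us
    change_pct = (optimized_us - default_us) / default_us * 100
    return round(speedup, 2), round(change_pct)


# Mean timings in microseconds, copied from the table.
for name, default_us, optimized_us in [
    ("b16decode", 2.97, 2.03),
    ("b16decode_casefold", 3.18, 2.19),
]:
    speedup, change_pct = compare(default_us, optimized_us)
    print(f"{name}: {speedup}x faster ({change_pct}%)")
```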

Benchmark script:

import perf
import re
import binascii
import base64

# Compiled once at import time; matches any byte that is not an uppercase hex digit.
_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()
    if True:  # True benchmarks the compiled-regex version (optimized.json); set to False for the stdlib baseline (default.json)
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")

----------
assignee: xtreak
components: Library (Lib)
messages: 332330
nosy: djhoulihan, serhiy.storchaka, xtreak
priority: normal
severity: normal
status: open
title: Optimize base64.b16decode to use compiled regex
type: performance
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35559>
_______________________________________

