[New-bugs-announce] [issue35559] Optimize base64.b16decode to use compiled regex
Karthikeyan Singaravelan
report at bugs.python.org
Sat Dec 22 02:29:08 EST 2018
New submission from Karthikeyan Singaravelan <tir.karthi at gmail.com>:
I came across this while looking at issue35557 and am opening a new issue to keep the discussion separate. Currently the b16decode function passes its pattern to re.search on every call. Compiling the regex once at module level and reusing the compiled pattern gives roughly a 30% improvement on Python 3.7 in the benchmark below. I am proposing a PR for this change since it looks safe to me.
$ python3 -m perf compare_to default.json optimized.json --table
+--------------------+---------+------------------------------+
| Benchmark          | default | optimized                    |
+====================+=========+==============================+
| b16decode          | 2.97 us | 2.03 us: 1.46x faster (-32%) |
+--------------------+---------+------------------------------+
| b16decode_casefold | 3.18 us | 2.19 us: 1.45x faster (-31%) |
+--------------------+---------+------------------------------+
Benchmark script:

import perf
import re
import binascii
import base64

# Compiled once at import time instead of being passed to re.search per call.
_B16DECODE_PAT = re.compile(b'[^0-9A-F]')

def b16decode_re_compiled_search(s, casefold=False):
    s = base64._bytes_from_decode_data(s)
    if casefold:
        s = s.upper()
    if _B16DECODE_PAT.search(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    hex_data = "806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"
    hex_data_upper = hex_data.upper()

    # Sanity check: the optimized function must match the stdlib result.
    assert base64.b16decode(hex_data_upper) == b16decode_re_compiled_search(hex_data_upper)
    assert base64.b16decode(hex_data, casefold=True) == b16decode_re_compiled_search(hex_data, casefold=True)

    runner = perf.Runner()

    if True:  # toggle to False for default.json
        runner.timeit(name="b16decode",
                      stmt="b16decode_re_compiled_search(hex_data_upper)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
        runner.timeit(name="b16decode_casefold",
                      stmt="b16decode_re_compiled_search(hex_data, casefold=True)",
                      setup="from __main__ import b16decode_re_compiled_search, hex_data, hex_data_upper")
    else:
        runner.timeit(name="b16decode",
                      stmt="base64.b16decode(hex_data_upper)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
        runner.timeit(name="b16decode_casefold",
                      stmt="base64.b16decode(hex_data, casefold=True)",
                      setup="from __main__ import hex_data, hex_data_upper; import base64")
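
For anyone without the perf module installed, the comparison can be roughly sketched with the standard library alone. This is only an illustration: the helper names (_B16_INVALID, check_uncompiled, check_compiled, decode) are made up for this sketch and are not part of base64, and timeit numbers will be noisier than what perf reports.

```python
import binascii
import re
import timeit

# Module-level compiled pattern, as proposed. The name is illustrative only.
_B16_INVALID = re.compile(b'[^0-9A-F]')

def check_uncompiled(s):
    # Pattern is passed on every call, so re has to look it up each time.
    return re.search(b'[^0-9A-F]', s) is not None

def check_compiled(s):
    # Calls search() on the precompiled pattern object directly.
    return _B16_INVALID.search(s) is not None

def decode(s, has_invalid):
    # Hypothetical stand-in for the validation step plus unhexlify.
    if has_invalid(s):
        raise binascii.Error('Non-base16 digit found')
    return binascii.unhexlify(s)

if __name__ == "__main__":
    data = b"806903D098EB50957B1B376385F233BB"
    for check in (check_uncompiled, check_compiled):
        elapsed = timeit.timeit(lambda: check(data), number=100_000)
        print(check.__name__, elapsed)
```

Both checks should accept and reject exactly the same inputs; only the per-call overhead differs.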
----------
assignee: xtreak
components: Library (Lib)
messages: 332330
nosy: djhoulihan, serhiy.storchaka, xtreak
priority: normal
severity: normal
status: open
title: Optimize base64.b16decode to use compiled regex
type: performance
versions: Python 3.8
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35559>
_______________________________________