[New-bugs-announce] [issue42942] Feature request: Add decdigest() to hashlib

Arnim Rupp report at bugs.python.org
Sat Jan 16 16:33:20 EST 2021


New submission from Arnim Rupp <python at rupp.de>:

Problem: hashlib only offers digest() and hexdigest(), but the fastest way to work with hashes is as integers.

The first thing Loki does after getting the hashes is to convert them to int:

    md5, sha1, sha256 = generateHashes(fileData)
    md5_num = int(md5, 16)
    sha1_num = int(sha1, 16)
    sha256_num = int(sha256, 16)

https://github.com/Neo23x0/Loki/blob/master/loki.py

All of the roughly 50,000 hashes to compare against are also converted to int after being read from a file. The comparison is about twice as fast as comparing hexdigest strings, because the ints use just half the memory.

(The use case here is to compare these 50,000 hashes to the hashes of all the 200,000 files on a system that gets scanned for malicious files.)
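
As a rough illustration of that comparison (a minimal sketch, not Loki's actual code: the helper names and the sample data are made up, and only standard hashlib calls are used):

```python
import hashlib

def hash_as_int(data: bytes, algorithm: str = "sha256") -> int:
    """Hash data and return the digest as an int (via the hexdigest round trip)."""
    return int(hashlib.new(algorithm, data).hexdigest(), 16)

# Hypothetical signature list: hex strings as they would be read from a file.
known_bad_hex = [
    hashlib.sha256(b"malicious sample 1").hexdigest(),
    hashlib.sha256(b"malicious sample 2").hexdigest(),
]

# Convert once up front, then every lookup is an int membership test.
known_bad = {int(h, 16) for h in known_bad_hex}

def is_known_bad(file_data: bytes) -> bool:
    """Return True if the file's hash is in the known-bad set."""
    return hash_as_int(file_data) in known_bad
```

Here is_known_bad(b"malicious sample 1") is True, while any data whose hash is not in the list gives False.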

Solution: Add decdigest() to hashlib, which returns the hash as an int. This has 2 advantages:
1. It saves the time for converting the hash to hex and back.
2. Having decdigest() in the documentation inspires more programmers to work with hashes as ints as opposed to slower strings (where performance matters).
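
For comparison, the same result can be approximated today in pure Python without the hex round trip. This is only a sketch of the intended semantics: decdigest() does not exist in hashlib, and the free function below is a hypothetical stand-in that assumes a big-endian interpretation so it matches int(hexdigest(), 16):

```python
import hashlib

def decdigest(hash_obj) -> int:
    """Hypothetical stand-in for the proposed method.

    Interprets the raw digest bytes as a big-endian unsigned integer,
    which gives the same value as int(hash_obj.hexdigest(), 16).
    Does not cover the SHAKE algorithms, whose digest() needs a length.
    """
    return int.from_bytes(hash_obj.digest(), "big")

h = hashlib.sha256(b"example data")
# Both routes yield the same integer.
same = decdigest(h) == int(h.hexdigest(), 16)
```

A built-in method could skip the intermediate bytes object as well, which is where the proposed saving would come from.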

It should be just a few lines of code for each algorithm; I could do the PR.

static PyObject *
_sha3_shake_128_hexdigest(SHA3object *self, PyObject *arg)
{
    PyObject *return_value = NULL;
    unsigned long length;

    if (!_PyLong_UnsignedLong_Converter(arg, &length)) {
        goto exit;
    }
    return_value = _sha3_shake_128_hexdigest_impl(self, length);

exit:
    return return_value;
}

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Modules/_sha3/clinic/sha3module.c.h

----------
components: Library (Lib)
messages: 385150
nosy: 2d4d
priority: normal
severity: normal
status: open
title: Feature request: Add decdigest() to hashlib
type: performance
versions: Python 3.10

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42942>
_______________________________________

