[issue22341] Python 3 crc32 documentation clarifications

New submission from Martin Panter: This is regarding the Python 3 documentation for binascii.crc32(), <https://docs.python.org/dev/library/binascii.html#binascii.crc32>. It repeatedly recommends correcting the sign by doing crc32() & 0xFFFFFFFF, but it is not immediately clear why. Only after reading the Python 2 documentation does one realise that the value is always unsigned for Python 3, so you only really need the workaround if you want to support earlier Pythons. I also suggest documenting the initial CRC input value, which is zero. Suggested wording: binascii.crc32(data[, crc]) Compute CRC-32, the 32-bit checksum of “data”, starting with the given “crc”. The default initial value is zero. The algorithm is consistent with the ZIP file checksum. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Use as follows: print(binascii.crc32(b"hello world")) # Or, in two pieces: crc = binascii.crc32(b"hello", 0) crc = binascii.crc32(b" world", crc) print('crc32 = {:#010x}'.format(crc)) I would simply drop the notice box with the workaround, because I gather that the Python 3 documentation generally omits Python 2 details. (There are no “new in version 2.4 tags” for instance.) Otherwise, clarify if “packed binary format” is a reference to the “struct” module, or something else. Similar fixes are probably appropriate for zlib.crc32() and zlib.alder32(). Also, what is the relationship between binascii.crc32() and zlib.crc32()? I vaguely remember reading that “zlib” is not always available, so I tend to use “binascii” instead. Is there any advantage in using the “zlib” version? The “hashlib” documentation points to “zlib” without mentioning “binascii” at all. ---------- assignee: docs@python components: Documentation messages: 226419 nosy: docs@python, vadmium priority: normal severity: normal status: open title: Python 3 crc32 documentation clarifications versions: Python 3.4 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: Here is a patch that fixes the binascii, zlib.crc32() and adler32() documentation as I suggested. I’m still interested why there are two ways to do a CRC-32, each equally non-obvious as the other. ---------- keywords: +patch Added file: http://bugs.python.org/file37504/crc-sign.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Serhiy Storchaka added the comment: crc & 0xffffffff is still used in gzip, zipfile and tarfile. And some comments say about signess of 32-bit checksums. ---------- nosy: +serhiy.storchaka _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: Posting a new patch that also removes the masking from the gzip, zipfile, tarfile, and test_zlib modules. I removed the comment about signedness in tarfile; let me know if you saw any others. ---------- versions: +Python 3.5 -Python 3.4 Added file: http://bugs.python.org/file38398/crc-sign.v2.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Serhiy Storchaka added the comment: These notes was added by Gregory in r68535. Ask him if they still are needed. ---------- nosy: +gregory.p.smith _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Serhiy Storchaka added the comment: See also issue4903. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Gregory P. Smith added the comment: I do not object to the removal of the & 0xfffffff from the stdlib library code if these functions have actually been fixed to always return unsigned now. (double check the behavior, and if good, just do it!) But I think the docs should still mention that the & 0xffffffff is a good practice if code needs to be compatible with Python versions prior to X.Y (list the last release before the behavior was corrected). Possibly within a .. versionchanged: section. People often use the latest docs when writing code in any version of Python as we continually improve docs and are pretty good about noting old behaviors and when behaviors changed. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: Hi Gregory, I think the three functions have been fixed since Python 3.0. It looks like you changed them in revisions 6a7fa8421902 (zlib module) and 132cf3a79126 (crc32 functions). Looking at the 3.0 branch, all three functions use PyLong_FromUnsignedLong(), but in the 2.7 branch, they use PyInt_FromLong(). This is tested in test_zlib.ChecksumTestCase.test_crc32_adler32_unsigned(). See also my changes to test_penguins() in the patch here for further testing. There is no explicit testing of binascii.crc32() unsignedness, but it is implicitly tested by test_same_as_binascii_crc32(), because the expected CRC is beyond the positive 32 bit signed limit. I am happy to put back the suggestion of masking for backwards compatibility if you really think it is necessary. But since it only involves Python < 3.0 compatibility, I thought it would be okay to remove it; see my original post. The documentation is often pretty good about noting when Python 3 behaviours changed, but you usually have to look elsewhere to see the differences from Python 2. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: Patch v3: * Reverted to original crc32(b"hello") example call with the implicit initial CRC * Added “Changed in version 3.0” notices, restoring a brief version of the suggestion to use the bit mask, along with an explanation. Python 2 compatibility information is generally unprecedented in the Python 3 documentation though, but hopefully this version should make more sense to people not already familiar with the Python 2 odd behaviour. ---------- Added file: http://bugs.python.org/file38737/crc-sign.v3.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: V4 fixes a merge conflict with recent gzip changes. ---------- Added file: http://bugs.python.org/file38739/crc-sign.v4.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Serhiy Storchaka added the comment: The patch LGTM. But while we are here, wouldn't be worth to unify the name of second parameter of binascii and zlib CRC calculation functions? And unify the description of crc_hqx() with other functions. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: In crc-sign.v5.patch I have changed the binascii.crc_hqx() and crc32() documentation to use “value”, matching the zlib documentation. Is that what you had in mind? ---------- stage: -> patch review versions: +Python 3.6 Added file: http://bugs.python.org/file41271/crc-sign.v5.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Serhiy Storchaka added the comment: Yes, thank you Martin. ---------- assignee: docs@python -> martin.panter stage: patch review -> commit review _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Roundup Robot added the comment: New changeset 03c4ca1a34e2 by Martin Panter in branch '3.5': Issue #22341: Drop Python 2 workaround and document CRC initial value https://hg.python.org/cpython/rev/03c4ca1a34e2 New changeset 360c1326d8d1 by Martin Panter in branch 'default': Issue #22341: Merge CRC doc from 3.5 https://hg.python.org/cpython/rev/360c1326d8d1 ---------- nosy: +python-dev _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________

Martin Panter added the comment: Thanks for the reviews ---------- resolution: -> fixed stage: commit review -> resolved status: open -> closed _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22341> _______________________________________
participants (4)
-
Gregory P. Smith
-
Martin Panter
-
Roundup Robot
-
Serhiy Storchaka