On Fri, 8 Sep 2017 12:04:52 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 7 Sep 2017 18:47:20 -0700 Nick Coghlan <ncoghlan@gmail.com> wrote:
However, I do wonder whether we could encode *all* the mode settings into the magic number, such that we did something like reserving the top 3 bits for format flags:
* number & 0x1FFF -> the traditional magic number
* number & 0x8000 -> timestamp or hash?
* number & 0x4000 -> checked or not?
* number & 0x2000 -> reserved for future format changes
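For illustration, the bit layout listed above could be decoded roughly as follows. This is only a sketch: the constant and function names are hypothetical, and the polarity of each flag (a set bit meaning "hash" and "checked") is an assumption, since the list leaves it open.

```python
# Hypothetical masks taken from the bit layout listed above.
MAGIC_MASK = 0x1FFF     # low 13 bits: the traditional magic number
HASH_FLAG = 0x8000      # assumed: set means hash-based, clear means timestamp
CHECKED_FLAG = 0x4000   # assumed: set means the pyc is checked against source
RESERVED_FLAG = 0x2000  # reserved for future format changes


def split_magic(number):
    """Split a 16-bit magic value into the base magic number and its flags."""
    return {
        "magic": number & MAGIC_MASK,
        "hash_based": bool(number & HASH_FLAG),
        "checked": bool(number & CHECKED_FLAG),
        "reserved": bool(number & RESERVED_FLAG),
    }


def join_magic(magic, hash_based=False, checked=False):
    """Combine a base magic number (at most 13 bits) with the mode flags."""
    if magic & ~MAGIC_MASK:
        raise ValueError("magic number does not fit in the low 13 bits")
    number = magic
    if hash_based:
        number |= HASH_FLAG
    if checked:
        number |= CHECKED_FLAG
    return number
```

Note that this scheme leaves only 13 bits (values up to 8191) for the magic number itself.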
I'd rather a single magic number and a separate bitfield that tells what the header encodes exactly. We don't *have* to fight for a tiny size reduction of pyc files.
Let me expand a bit on this. Currently, the format is:

- bytes 0..3: magic number
- bytes 4..7: source file timestamp
- bytes 8..11: source file size
- bytes 12+: pyc file body (marshal format)

What I'm proposing is:

- bytes 0..3: magic number
- bytes 4..7: header options (bitfield)
- bytes 8..15: header contents
  Depending on header options:
    - bytes 8..11: source file timestamp
    - bytes 12..15: source file size
  or:
    - bytes 8..15: 64-bit source file hash
- bytes 16+: pyc file body (marshal format)

This way, we keep a single magic number, a single header size, and there's only a per-build variation in the middle of the header.

Of course, there are other possible ways to encode information. For example, the header could be a sequence of Type-Length-Value triplets, perhaps prefixed with header size or body offset for easy seeking.

My whole point here is that we can easily avoid the annoyance of dual magic numbers and encodings which must be maintained in parallel.

Regards

Antoine.
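To make the proposed fixed-size header concrete, here is a minimal sketch of packing and parsing it. The byte offsets follow the proposal above; everything else is an assumption made for illustration: the names `FLAG_HASH_BASED`, `pack_header` and `parse_header` are hypothetical, bit 0 of the options field is arbitrarily chosen to mean "hash-based", and the integer fields are taken to be little-endian.

```python
import struct

HEADER_SIZE = 16        # bytes 0..15 in the proposed layout
FLAG_HASH_BASED = 0x1   # hypothetical bit assignment in the options bitfield


def pack_header(magic, *, mtime=None, size=None, source_hash=None):
    """Build a 16-byte header: magic, options bitfield, 8 bytes of contents."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly 4 bytes")
    if source_hash is not None:
        if len(source_hash) != 8:
            raise ValueError("source hash must be exactly 8 bytes (64 bits)")
        return magic + struct.pack("<I", FLAG_HASH_BASED) + source_hash
    # Timestamp variant: options are 0, then 32-bit mtime and 32-bit size.
    return magic + struct.pack("<III", 0, mtime & 0xFFFFFFFF, size & 0xFFFFFFFF)


def parse_header(data):
    """Parse the header; the marshalled body always starts at offset 16."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    magic = data[0:4]
    (options,) = struct.unpack_from("<I", data, 4)
    if options & FLAG_HASH_BASED:
        return {"magic": magic, "options": options, "source_hash": data[8:16]}
    mtime, size = struct.unpack_from("<II", data, 8)
    return {"magic": magic, "options": options, "mtime": mtime, "size": size}
```

Whichever variant is used, a reader can skip straight to `data[HEADER_SIZE:]` for the body without first inspecting the options.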