Deterministic builds of the interpreter
Hi,
I'm attempting to make the builds of the Python interpreters for Nixpkgs [1] deterministic.
In the case of Python 2.7 we have a patch [2] that fixes the timestamp used in .pyc files in case the env var `DETERMINISTIC_BUILD` is set. We also remove `wininst*.exe`. This works fine, although there are 4 small issues left [3]. Do you have any idea what is going on in these files that could make them indeterministic?
For Python 3.x I disabled ensurepip, removed `wininst*.exe`, and modified `py_compile` to use `0` instead of `source_stats['mtime']`. The builds are not yet deterministic [4]. Any suggestions what could be fixed here?
Kind regards,
Freddy
[1] https://github.com/NixOS/nixpkgs
[2] https://github.com/NixOS/nixpkgs/blob/1da6775/pkgs/development/interpreters/python/cpython/2.7/deterministic-build.patch
[3] https://github.com/NixOS/nixpkgs/issues/22570#issuecomment-278474082
[4] https://gist.github.com/anonymous/7cc147af6511dee2dc5a5b8d110f0e6b
According to [4], the mtime is not 0.
    data = bytearray(MAGIC_NUMBER)
    data.extend(_w_long(mtime))
    data.extend(_w_long(source_size))
    data.extend(marshal.dumps(code))
The first 4 bytes are the magic number. The next 4 bytes are the mtime.
    -00000000: 160d 0d0a 6b2e 9c58 6c21 0000 e300 0000  ....k..Xl!......
    +00000000: 160d 0d0a e631 9c58 6c21 0000 e300 0000  .....1.Xl!......
The mtime bytes are 6b2e9c58 vs e6319c58 (little endian).
Maybe you failed to use the customized py_compile when building?
On Thu, Feb 9, 2017 at 6:27 PM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
> The builds are not yet deterministic [4]. Any suggestions what could be fixed here?
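The header layout described in this reply can be checked by hand. The following is a minimal sketch, not code from the thread: `parse_pyc_header` is a hypothetical helper, and it assumes the 12-byte header that the quoted snippet writes (magic, mtime, source size, the latter two as little-endian 32-bit integers). Newer CPython releases add a flags field, so the offsets would differ there.

```python
import struct
from datetime import datetime, timezone


def parse_pyc_header(header: bytes):
    """Split a 12-byte .pyc header into (magic, mtime, source_size).

    Assumption: 4 bytes of magic followed by two little-endian
    unsigned 32-bit integers, as written by the snippet quoted above.
    """
    magic, mtime, source_size = struct.unpack("<4sII", header[:12])
    return magic, mtime, source_size


# First 12 bytes of each line of the hex dump quoted above.
old = bytes.fromhex("160d0d0a" "6b2e9c58" "6c210000")
new = bytes.fromhex("160d0d0a" "e6319c58" "6c210000")

for label, header in (("old", old), ("new", new)):
    magic, mtime, size = parse_pyc_header(header)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    print(label, magic.hex(), mtime, stamp.isoformat(), size)
```

Both mtime fields decode to ordinary build-time timestamps on 9 February 2017 rather than to 0, which matches the diagnosis that the patched `py_compile` was not the code path that wrote these files.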
Correct, that was an older version from before I patched `_bootstrap_external.py`. A more recent diff can be found at
https://gist.github.com/anonymous/d40f24fd6b636ba40d345ff3f12a0aaa
These all seem to be sets.
On Thu, Feb 9, 2017 at 6:04 PM, INADA Naoki <songofacandy@gmail.com> wrote:
> Maybe you failed to use the customized py_compile when building?
On Fri, Feb 10, 2017 at 2:45 AM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
> Correct, that was an older version from before I patched `_bootstrap_external.py`. A more recent diff can be found at
> https://gist.github.com/anonymous/d40f24fd6b636ba40d345ff3f12a0aaa
> These all seem to be sets.
Maybe PYTHONHASHSEED will help you.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
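To illustrate what PYTHONHASHSEED changes, here is a small sketch that is not part of the thread. It starts fresh interpreters with the variable set and prints the iteration order of a set of strings. The helper name `set_order` and the sample strings are made up for the example.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a small set of strings.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta'}))"


def set_order(seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# With a fixed seed, repeated runs of the same interpreter agree.
print(set_order("0") == set_order("0"))
# With "random", the order may differ from one run to the next.
print(set_order("random"), set_order("random"))
```

A fixed seed only makes string hashing repeatable for one interpreter binary; it does not by itself guarantee identical bytecode across builds.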
For Python 3.5, PYTHONHASHSEED doesn't seem to be sufficient; these items still seem nondeterministic. To be sure, I ran `PYTHONHASHSEED=1 $out/bin/python -m compileall -f $out`, where `$out` is the path where I installed Python.
Do you have an idea why the timestamps are still incorrect in [3] (this is Python 2.7)? I think these files are all required by `compileall`, and somehow it doesn't seem capable of taking DETERMINISTIC_BUILD into account. Explicitly removing those pyc and pyo files and recompiling them to bytecode still results in timestamp issues for these 4 files.
On Thu, Feb 9, 2017 at 6:51 PM, INADA Naoki <songofacandy@gmail.com> wrote:
> Maybe PYTHONHASHSEED will help you.
That should have been `PYTHONHASHSEED=0 $out/bin/python -m compileall -f $out`.
On Fri, Feb 10, 2017 at 11:58 AM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
> To be sure, I ran `PYTHONHASHSEED=1 $out/bin/python -m compileall -f $out` where $out is the path where I installed Python.
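The corrected command can also be driven from a script. The sketch below is an illustration under assumptions, not the Nixpkgs build step: `recompile_tree` is a hypothetical wrapper, a temporary directory stands in for `$out`, and the current interpreter stands in for `$out/bin/python`.

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def recompile_tree(root: str, python: str = sys.executable) -> int:
    """Force-recompile every .py file under `root` with PYTHONHASHSEED=0.

    Roughly: PYTHONHASHSEED=0 <python> -m compileall -f <root>
    """
    env = dict(os.environ, PYTHONHASHSEED="0")
    # -f rebuilds even if the existing .pyc looks up to date; -q reduces output.
    completed = subprocess.run(
        [python, "-m", "compileall", "-f", "-q", root], env=env
    )
    return completed.returncode


# Demo on a throwaway directory standing in for $out.
with tempfile.TemporaryDirectory() as out:
    pathlib.Path(out, "mod.py").write_text("NAMES = {'a', 'b', 'c'}\n")
    status = recompile_tree(out)
    compiled = sorted(p.name for p in pathlib.Path(out).rglob("*.pyc"))

print(status, compiled)
```

Setting the seed in the child's environment matters because the hash seed is fixed at interpreter startup and cannot be changed from inside an already running process.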
On Fri, Feb 10, 2017 at 7:58 PM, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
> Do you have an idea why the timestamps are still incorrect in [3] (this is Python 2.7)?
Sorry, I no longer have any motivation to work on Python 2.
Hi,
Are there any more suggestions on how to improve the determinism of the Python 3 interpreter? As I mentioned, it seems only sets cause unreproducible bytecode. Sets have no order. But when generating the bytecode, I would expect there would still be an order since the code isn't actually executed, right?
On Fri, Feb 10, 2017 at 12:03 PM, INADA Naoki <songofacandy@gmail.com> wrote:
> Sorry, I no longer have any motivation to work on Python 2.
Hi Freddy,
On 16 February 2017 at 18:03, Freddy Rietdijk <freddyrietdijk@fridh.nl> wrote:
> As I mentioned, it seems only sets cause unreproducible bytecode. Sets have no order. But when generating the bytecode, I would expect there would still be an order since the code isn't actually executed, right?
No, the sets are built as real sets and then marshalled to .pyc files in a separate step. So on CPython an essentially random order will end up in the .pyc file. Even CPython 3.6, which gives a deterministic order to dictionaries, does not do so for sets. You could ensure sets are marshalled in a known order by changing the marshalling code, e.g. to emit them in sorted order (on Python 2.x; on 3.x it is messier because values of different types are more often non-comparable).
A bientôt,
Armin.
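The point about sets being built as real objects at compile time can be seen from Python itself. This is a rough sketch and not a patch to the marshalling code: `frozenset_consts` is a hypothetical helper, and it relies on CPython folding a constant set literal used in an `in` test into a frozenset constant, which is an implementation detail.

```python
import types


def frozenset_consts(code: types.CodeType):
    """Recursively collect frozenset constants from a code object."""
    found = []
    for const in code.co_consts:
        if isinstance(const, frozenset):
            found.append(const)
        elif isinstance(const, types.CodeType):
            # Nested functions and classes carry their own constants.
            found.extend(frozenset_consts(const))
    return found


SOURCE = "def is_vowel(c):\n    return c in {'a', 'e', 'i', 'o', 'u'}\n"
code = compile(SOURCE, "<example>", "exec")
consts = frozenset_consts(code)

print(consts)                       # a real frozenset; iteration order follows the hashes
print([sorted(s) for s in consts])  # a sorted view is stable when elements are comparable
```

The compiler has already created the frozenset before anything is written to disk, so whatever order it iterates in is what ends up in the .pyc file. Sorting works here because all elements are strings; mixed-type sets on Python 3 would need some other canonical key, which is the messiness mentioned above.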
Hi Armin,
Thank you for your explanation. I've now managed to make the 2.7 and 3.5 builds deterministic by recompiling the bytecode at the end of the build (and excluding 2to3).
Freddy
On Sun, Feb 19, 2017 at 9:30 AM, Armin Rigo <armin.rigo@gmail.com> wrote:
> No, the sets are built as real sets and then marshalled to .pyc files in a separate step.
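One way to confirm that two build outputs really are identical is to hash both trees and compare the results. The sketch below is only an illustration and not the check used for Nixpkgs: `tree_digest` is a hypothetical helper, it looks at relative paths and file contents only (not permissions or file mtimes), and two temporary directories stand in for two builds.

```python
import hashlib
import pathlib
import tempfile


def tree_digest(root) -> str:
    """Hash the relative paths and contents of all files under `root`."""
    root = pathlib.Path(root)
    digest = hashlib.sha256()
    # Sort so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


# Demo with two throwaway directories standing in for two build outputs.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        pathlib.Path(d, "mod.pyc").write_bytes(b"same bytes")
    same = tree_digest(a) == tree_digest(b)
    pathlib.Path(b, "mod.pyc").write_bytes(b"other bytes")
    different = tree_digest(a) != tree_digest(b)

print(same, different)
```

A single digest only says whether the trees differ. To find out which .pyc files differ and where, a per-file comparison or a tool such as the one that produced the diffs linked earlier in the thread is still needed.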
participants (3)
- Armin Rigo
- Freddy Rietdijk
- INADA Naoki