[Distutils] reproducible builds

Nick Coghlan ncoghlan at gmail.com
Tue Mar 21 00:21:24 EDT 2017

On 20 March 2017 at 23:34, Thomas Kluyver <thomas at kluyver.me.uk> wrote:

> On Mon, Mar 20, 2017, at 01:02 PM, Robin Becker wrote:
> > I guess the algorithm variation across pythons would make dictionary
> > order quite variable.
> For a Python based tool, I think it's reasonable that reproducing a
> build requires running with the same version of Python.
> The requirement would be that, with enough information about the build
> environment, you *can* produce an identical PDF. It needn't (AFAIK) be
> identical every time anyone builds it.

Right, one of the other aspects of reproducible-builds is looking into ways
to define and distribute build environments in addition to the application
source code: https://reproducible-builds.org/docs/definition-strategies/

Within a given binary context (e.g. Debian packages), that may be a text
description, like Debian's buildinfo files:

For Fedora/RHEL/CentOS, the equivalent would probably be to extract a
suitable config from the build system:

In other cases, the build environment may itself be a binary artifact (e.g.
the manylinux1 container images, or the "Holy Build Box" machine images).
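
As a rough sketch of what a text description of a build environment might
record, a Python-based build script could dump its interpreter and platform
details alongside the artifact. The function name and field names below are
hypothetical, and this is not the Debian buildinfo format, only a
much-simplified analogue:

```python
import json
import platform


def describe_build_environment():
    """Return a JSON text description of the running Python environment.

    The keys are illustrative only; a real buildinfo-style record would
    also need to list installed build dependencies and their versions.
    """
    info = {
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    # sort_keys keeps the description itself stable from run to run.
    return json.dumps(info, sort_keys=True, indent=2)


description = describe_build_environment()
print(description)
```

Someone trying to reproduce the build could then compare this record with
their own environment before comparing the build outputs.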

Fully eliminating non-determinism usually does require switching to
explicit sorting and ordered containers in build tools and scripts, as
otherwise even things like directory listings or JSON serialisation can
introduce variations in output when a build is run on a different machine.
The reproducible-builds project offers some interesting tools to identify
and analyse cases of non-reproducible outputs:
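
To illustrate the explicit-sorting point above (this is not one of the
reproducible-builds tools, just a minimal sketch with a hypothetical
build_manifest helper), a build script that lists files and serialises JSON
can pin down both orderings itself rather than relying on whatever the
filesystem or dict implementation happens to produce:

```python
import hashlib
import json
import os
import tempfile


def build_manifest(root):
    """Return a JSON manifest of relative path -> SHA-256 digest for root.

    Directory entries and JSON keys are sorted explicitly, so the output
    does not depend on filesystem listing order.
    """
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                entries[rel] = hashlib.sha256(f.read()).hexdigest()
    return json.dumps(entries, sort_keys=True, indent=2)


# Small demonstration: files are created in reverse alphabetical order,
# and two runs over the same tree give byte-identical manifests.
with tempfile.TemporaryDirectory() as root:
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(root, name), "w") as f:
            f.write(name)
    manifest = build_manifest(root)
    assert manifest == build_manifest(root)
```

Other sources of variation, such as embedded timestamps or absolute paths,
would still need separate handling.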

However, nobody can reasonably expect arbitrary upstream projects
(especially volunteer-run ones) to be going out and pre-emptively solving
that kind of problem - the most it's realistic to aim for is to encourage
projects to be accommodating when upstream changes are proposed to
introduce more determinism into the build processes for particular
projects, as well as into the artifact generation process for tools that
may be used as part of the build process for other projects. (And I agree
with Thomas that it's likely the latter case that applies for
reportlab-generated PDFs)


P.S. Prompted by Gary Bernhardt, one of the ways I've started thinking
about the whole question of "built artifacts" in general is as a complex
distributed caching problem, with reproducible builds being a way of
ensuring that it's possible to check the validity of particular cache

Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia