[Distutils] reproducible builds

Robin Becker robin at reportlab.com
Mon Mar 20 09:02:34 EDT 2017

On 20/03/2017 11:35, Thomas Kluyver wrote:
> On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote:
>> Obviously if I have the ability to embed  repr(some_object)
>> into the document output then it will vary (unless the underlying python
>> is reproducible). I'm not sure if debian runs the whole reportlab test
>> suite, but it makes sense to get this kind of variability out.
> AIUI, it's fine to have the *ability* to produce non-deterministic
> output, and it doesn't matter if your tests do that. The aim of
> reproducible builds is to be able to go from the same source code to an
> identical binary package. Documents generated by running the tests are
> presumably not included in binary packages, so it doesn't matter if they
> change.

Well now I am confused. The date / times mentioned in the debian patch are those 
we force into the documents produced by the reportlab package when it is used.

They would not normally be part of the package itself. Although the reportlab 
documentation is available in the source I'm fairly sure we don't include it in 
the wheels.

Of course, if the debian packaging includes output created by reportlab then that
document would receive the current (i.e. variable) time. In addition, any random
behaviour created by the reportlab generation code would also be embedded in the
document.

If the debian variable is intended to create reproducible PDFs as part of their
packaging of reportlab or some other package, then I'm fairly sure that other
sources of variation will need to be checked in addition to the control that the
SOURCE_DATE_EPOCH variable would give. Perhaps Matthias could comment; I know
little about how the debian packaging works.
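
As a rough sketch of what honouring SOURCE_DATE_EPOCH could look like in document-generating code: the variable is expected to hold an integer Unix timestamp, and the code falls back to the current time when it is unset. The function names here (document_timestamp, format_pdf_date) are hypothetical and are not reportlab's API.

```python
import os
import time


def document_timestamp(environ=None):
    """Return the Unix timestamp to embed in generated output.

    Hypothetical helper: uses SOURCE_DATE_EPOCH when it is set,
    otherwise falls back to the current (variable) time.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    return int(value)


def format_pdf_date(timestamp):
    """Format a Unix timestamp as a PDF-style date string in UTC."""
    return time.strftime("D:%Y%m%d%H%M%S+00'00'", time.gmtime(timestamp))


if __name__ == "__main__":
    print(format_pdf_date(document_timestamp()))
```

This only pins the timestamp; any other source of variation in the output would still have to be found and fixed separately.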

>> I believe there was some way to modify the hashing introduced when the DoS dictionary attacks were an issue.
> The PYTHONHASHSEED environment variable:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
> If you have non-determinism introduced by Python hashing, setting a
> constant value of PYTHONHASHSEED should be an easy way to work around
> it.
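
A small check of the PYTHONHASHSEED behaviour described above, assuming a Python 3 interpreter: it starts child interpreters with a fixed seed and compares the hash of a string between runs. The helper name string_hash_in_child and the sample string are made up for illustration.

```python
import os
import subprocess
import sys


def string_hash_in_child(seed, text="reportlab"):
    """Return hash(text) as computed by a fresh interpreter
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # With the same seed, two separate processes should agree.
    print(string_hash_in_child(0) == string_hash_in_child(0))
```

A fixed seed makes string hashing repeatable from run to run of the same interpreter; it says nothing about agreement between different Python versions.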

Well, years ago we tried to get some random behaviour in text selection by
setting a seed value, e.g. 23......22 (but that doesn't work across pythons). I
guess the algorithm variation across pythons would make dictionary order quite

C:\Users\rptlab>\python27\python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(23......22)
>>> from random import randint, choice
>>> randint(10,25)
15

C:\Users\rptlab>\python36\python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(23......22)
>>> from random import randint, choice
>>> randint(10,25)
21
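
One possible workaround, offered as a sketch rather than a tested fix: the reproducibility notes in the Python 3 documentation say that random() keeps producing the same sequence for a given seed, while helpers such as randint are allowed to change between versions. Deriving the integer directly from random() avoids depending on those helpers. The name stable_randint and the seed 12345 are placeholders, and whether 2.7 and 3.6 really agree for a particular seed should be verified rather than assumed.

```python
import random


def stable_randint(rng, low, high):
    """Pick an integer in [low, high] using only rng.random().

    Hypothetical helper: slightly less uniform than randint, but it
    does not depend on how randint/randrange are implemented.
    """
    span = high - low + 1
    # rng.random() is in [0.0, 1.0); min() guards against float rounding.
    return min(high, low + int(rng.random() * span))


if __name__ == "__main__":
    rng = random.Random(12345)  # placeholder seed, not the one quoted above
    print([stable_randint(rng, 10, 25) for _ in range(5)])
```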

Robin Becker
