[Distutils] reproducible builds
Robin Becker
robin at reportlab.com
Mon Mar 20 09:02:34 EDT 2017
On 20/03/2017 11:35, Thomas Kluyver wrote:
> On Mon, Mar 20, 2017, at 09:00 AM, Robin Becker wrote:
>> Obviously if I have the ability to embed repr(some_object)
>> into the document output then it will vary (unless the underlying python
>> is reproducible). I'm not sure if debian runs the whole reportlab test
>> suite, but it makes sense to get this kind of variability out.
>
> AIUI, it's fine to have the *ability* to produce non-deterministic
> output, and it doesn't matter if your tests do that. The aim of
> reproducible builds is to be able to go from the same source code to an
> identical binary package. Documents generated by running the tests are
> presumably not included in binary packages, so it doesn't matter if they
> change.
>
Well, now I am confused. The dates and times mentioned in the Debian patch are
those we force into the documents produced by the reportlab package when it is
used. They would not normally be part of the package itself. Although the
reportlab documentation is available in the source, I'm fairly sure we don't
include it in the wheels.

Of course, if the Debian packaging includes output created by reportlab, then
that document would receive the current (i.e. variable) time. In addition, any
random behaviour in the reportlab generation code would also be embedded in the
document.

If the Debian variable is intended to create reproducible PDFs as part of their
packaging of reportlab or some other package, then I'm fairly sure that other
sources of variation will need to be checked in addition to the control that
the SOURCE_DATE_EPOCH variable would give. Perhaps Matthias could comment; I
know little about how the Debian packaging works.
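
To make the SOURCE_DATE_EPOCH idea concrete, here is a minimal sketch of how a document generator might honour the variable. The helper name document_timestamp is hypothetical and is not reportlab's API; it only illustrates using the fixed time when the variable is set and falling back to the current time otherwise.

```python
import os
import time


def document_timestamp(environ=None):
    """Return a UTC struct_time for stamping generated documents.

    Hypothetical helper, not part of reportlab. If SOURCE_DATE_EPOCH is
    set, it is interpreted as seconds since the Unix epoch and used as the
    fixed time; otherwise the current time is used.
    """
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return time.gmtime(int(epoch))
    return time.gmtime()


# With the variable set, repeated runs produce the same stamp.
fixed = document_timestamp({"SOURCE_DATE_EPOCH": "1490000000"})
stamp = time.strftime("%Y-%m-%d %H:%M:%S", fixed)
```

This only pins the timestamp; any other source of variation in the output would still need separate treatment.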
>> I believe there was some way to modify the hashing introduced when the
>> DoS dictionary attacks were an issue.
>
> The PYTHONHASHSEED environment variable:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
>
> If you have non-determinism introduced by Python hashing, setting a
> constant value of PYTHONHASHSEED should be an easy way to work around
> it.
>
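
As a rough illustration of the PYTHONHASHSEED suggestion, the sketch below starts two fresh interpreters with the same seed and compares the hash of a string. The function name hash_in_child and the sample string are made up for the example. It assumes the same Python build is used for both runs and says nothing about hash values matching between different Python versions.

```python
import os
import subprocess
import sys


def hash_in_child(value, seed):
    """Run a fresh interpreter with PYTHONHASHSEED=seed; return hash(value)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # In the child, sys.argv[1] is the string passed after the -c code.
    code = "import sys; print(hash(sys.argv[1]))"
    out = subprocess.check_output([sys.executable, "-c", code, value], env=env)
    return int(out.decode().strip())


# Same interpreter, same seed: the two child processes should agree.
first = hash_in_child("reportlab", 0)
second = hash_in_child("reportlab", 0)
same = first == second
```

The seed has to be set in the environment before the interpreter starts, which is why the example uses child processes rather than changing os.environ in the running one.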
Well, years ago we tried to get some repeatable random behaviour in text
selection by setting a seed value, e.g. 23......22 (but that doesn't work across
Pythons). I guess the algorithm variation across Pythons would make dictionary
order quite variable too.

    C:\Users\rptlab>\python27\python
    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:53:40) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> random.seed(23......22)
    >>> from random import randint, choice
    >>> randint(10,25)
    15
    >>>

    C:\Users\rptlab>\python36\python
    Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> random.seed(23......22)
    >>> from random import randint, choice
    >>> randint(10,25)
    21
    >>>

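
One possible workaround, offered as a sketch rather than something tested between 2.7 and 3.6: the random module documentation says that random() keeps producing the same sequence for a compatible seeder and the same seed, while helpers such as randint and choice may change between versions. Building the selection only on random() might therefore be more portable. The names stable_randint and stable_choice and the seed 12345 are invented for the example.

```python
import random


def stable_randint(rng, lo, hi):
    """Return an integer in [lo, hi] derived only from rng.random().

    Intended for small ranges; floating-point rounding makes this
    unsuitable for very large ones.
    """
    return lo + int(rng.random() * (hi - lo + 1))


def stable_choice(rng, seq):
    """Pick an element of seq using only rng.random()."""
    return seq[int(rng.random() * len(seq))]


rng = random.Random(12345)  # arbitrary illustrative integer seed
value = stable_randint(rng, 10, 25)
word = stable_choice(rng, ["alpha", "beta", "gamma"])
```

Using a private random.Random instance also keeps the document generator from being disturbed by other code that reseeds the module-level generator.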
--
Robin Becker