[Numpy-discussion] savetxt -> gzip: nondeterministic because of time stamp

Robert Kern robert.kern at gmail.com
Wed Apr 14 18:39:25 EDT 2021


On Wed, Apr 14, 2021 at 6:16 PM Andrew Nelson <andyfaff at gmail.com> wrote:

> On Thu, 15 Apr 2021 at 07:15, Robert Kern <robert.kern at gmail.com> wrote:
>
>> On Wed, Apr 14, 2021 at 4:37 PM Joachim Wuttke <j.wuttke at fz-juelich.de>
>> wrote:
>>
>>> Regarding numpy, I'd propose a bolder measure:
>>> To let savetxt(fname, X, ...) store exactly the same information in
>>> compressed and uncompressed files, always invoke gzip with mtime = 0.
>>>
>>
>> I agree.
>>
>
> I might look into making a PR for this. To be clear, what would the desired
> functionality be?
>
> 1. Mandatory to have mtime = 0?
>
> 2. Default mtime = 0, but `np.save*` has an extra `mtime` kwd that allows
> setting the mtime?
>
> 3. Default mtime = time.time(), but `np.save*` has an extra `mtime` kwd
> that allows setting mtime = 0?
>
>
> As Joachim says, for testing/git-related purposes it is nice to have
> bit-wise unchanged files produced (such that the file hash is unchanged),
> but I can also see that it might be nice to have a modification time that
> records when files contained in a zip file were last changed (e.g. write
> with numpy, check/inspect with a different module). Of course, with the
> latter you could just look at the overall file-write date; they should be
> the same.
>

I suspect no one's actually looking at the timestamp inside the file
(relevant XKCD comic[1] notwithstanding). I'd lean towards boldness here
and just set mtime=0. If someone screams, then we can add in the option for
them to get the old functionality. But moving closer to reliably
reproducible file formats is a goal worth changing the default behavior for.
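
For concreteness, here is a minimal sketch of what mtime=0 buys, using only
the standard-library gzip module. It is not numpy's actual code path;
`gzip_bytes` is a hypothetical helper made up for this illustration, and the
payload merely stands in for what savetxt would write.

```python
import gzip
import io


def gzip_bytes(payload: bytes, mtime=None) -> bytes:
    """Hypothetical helper: gzip-compress payload into memory.

    mtime=None lets gzip stamp the current time into the header;
    mtime=0 pins the header timestamp so the output is reproducible.
    """
    buf = io.BytesIO()
    # filename="" keeps any file name out of the gzip header as well.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


# Stand-in for the text that savetxt would produce.
payload = b"1.0 2.0\n3.0 4.0\n"

first = gzip_bytes(payload, mtime=0)
second = gzip_bytes(payload, mtime=0)

# Same input and a pinned timestamp give byte-identical output.
assert first == second
# Bytes 4..7 of the gzip header hold the MTIME field.
assert first[4:8] == b"\x00\x00\x00\x00"
# The stored data is unaffected by the timestamp.
assert gzip.decompress(first) == payload
```

With mtime left at None, two calls made in different seconds differ in those
four header bytes, which is enough to change the file hash.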

[1] https://xkcd.com/1172/

-- 
Robert Kern