bad CRC errors when using np.savez, only sometimes though!
I am using NumPy 1.19.5 on Windows 10 with Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)].
I have two Python processes running (i.e. no threads) which do independent processing jobs and are NOT writing to the same directories. Each process runs for 5-10 hours and then writes out a ~900MB npz file containing 4 arrays.
When I go back to read in the npz files, I sporadically get bad CRC errors, which are related to npz using zipfile. I cannot figure out why this is happening. Looking through online forums, other folks have had CRC problems, but they seem to be isolated to using zipfile directly, not numpy. I have found a few mentions, though, of zipfile causing headaches if the same file pointer is used across calls when one uses the file handle interface to zipfile as opposed to passing in a filename.
I have verified with 7zip that the files do in fact have a CRC error, so it's not an artifact of zipfile. I have also used the file handle interface to np.load and still get the error.
Aside from writing my own numpy storage file container, I am stumped as to how to fix this, or reproduce this in a consistent manner. Any suggestions would be greatly appreciated!
Thank you, Isaac
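One way to narrow down when the corruption appears is to check each archive immediately after it is written, and again later. The sketch below is only an illustration, not something proposed in the thread: it relies on an .npz file being a zip archive, so the standard-library zipfile module can test it without NumPy. The function name `first_bad_member` is hypothetical.

```python
import zipfile


def first_bad_member(npz_path):
    """Return the name of the first corrupt member of an .npz file, or None.

    An .npz file is a zip archive, so zipfile can open it directly.
    testzip() reads every member and checks its CRC and header.
    """
    with zipfile.ZipFile(npz_path) as archive:
        return archive.testzip()
```

Calling `first_bad_member(path)` right after `np.savez(path, ...)` returns would show whether a file is already bad at write time or only goes bad later on disk.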
Perhaps it is a similar bug as this one? https://github.com/scipy/scipy/issues/6999
Basically, it turned out that the CRC was getting computed on an unflushed buffer, or something like that.
On Fri, May 14, 2021 at 10:05 AM Isaac Gerg <isaac.gerg@gergltd.com> wrote:
Aside from writing my own numpy storage file container, I am stumped as to how to fix this, or reproduce this in a consistent manner. Any suggestions would be greatly appreciated!
Thank you, Isaac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Hi Ben, I am not sure. However, in looking at the dates, it looks like that was fixed in scipy as of 2019.
Would you recommend using the scipy save interface as opposed to the numpy one?
On Fri, May 14, 2021 at 10:16 AM Benjamin Root <ben.v.root@gmail.com> wrote:
Perhaps it is a similar bug as this one? https://github.com/scipy/scipy/issues/6999
Basically, it turned out that the CRC was getting computed on an unflushed buffer, or something like that.
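If an unflushed buffer is the suspect, one defensive pattern is to write through an explicit file handle, flush it, and only then move the file to its final name. The sketch below is an assumption-laden illustration, not a confirmed fix for this problem: the helper name `write_flushed` and the `.tmp` suffix are hypothetical, and `write_fn` stands for any callable that writes to an open binary file.

```python
import os


def write_flushed(final_path, write_fn):
    """Write via write_fn(file_obj) to a temporary name, flush, then rename."""
    tmp_path = final_path + ".tmp"  # hypothetical naming scheme
    with open(tmp_path, "wb") as fh:
        write_fn(fh)
        fh.flush()             # push Python's buffer to the operating system
        os.fsync(fh.fileno())  # ask the OS to push its cache to disk
    # Rename only after the data has been flushed, so readers never see a
    # partially written file under the final name.
    os.replace(tmp_path, final_path)
    return final_path
```

Since np.savez accepts an open file object as well as a filename, a call such as `write_flushed("out.npz", lambda fh: np.savez(fh, a=a, b=b))` would route the save through this helper.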
Isaac,
What I mean is that your bug might be similar to the savemat() bug that was fixed in scipy in 2019. Completely different functions, but both functions need to properly interact with zlib in order to work properly.
On Fri, May 14, 2021 at 10:22 AM Isaac Gerg <isaac.gerg@gergltd.com> wrote:
Hi Ben, I am not sure. However, in looking at the dates, it looks like that was fixed in scipy as of 2019.
Would you recommend using the scipy save interface as opposed to the numpy one?
Is it zlib or zipfile?
On Fri, May 14, 2021 at 11:38 AM Benjamin Root <ben.v.root@gmail.com> wrote:
Isaac,
What I mean is that your bug might be similar to the savemat() bug that was fixed in scipy in 2019. Completely different functions, but both functions need to properly interact with zlib in order to work properly.
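The zlib-or-zipfile question can be checked on an actual file. As far as I know, np.savez writes an uncompressed zip archive through zipfile, while np.savez_compressed uses deflate, so the compression type of each member shows whether zlib's compressor was involved at all. The sketch below is a minimal illustration with hypothetical helper names (`member_summary`, `crc32_of`). It lists what the archive records for each member and shows how the same CRC-32 value can be computed with zlib.

```python
import zipfile
import zlib


def member_summary(npz_path):
    """List (name, compression, stored_crc) for each member of an .npz file."""
    names = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(npz_path) as archive:
        return [
            (info.filename, names.get(info.compress_type, "other"), info.CRC)
            for info in archive.infolist()
        ]


def crc32_of(data):
    """CRC-32 of a bytes object, as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF
```

For a healthy archive, the CRC stored for a member should equal `crc32_of` applied to that member's uncompressed bytes.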
participants (3)
- Benjamin Root
- Isaac Gerg
- Kevin Sheppard