
Hi,
I'm going to give a seminar about serialization, and I'd like to describe the .npy format. I noticed that there is a variant of it called .npz that can pack several arrays in one single file.
However, .npz does not use compression at all and I'm wondering what's the reason. I suppose that this is because you don't want to lose the possibility to memmap saved arrays, but can someone confirm this?
Thanks,
-- Francesc Alted

On Wed, Sep 29, 2010 at 03:17, Francesc Alted <faltet@pytables.org> wrote:
> Hi,
> I'm going to give a seminar about serialization, and I'd like to describe the .npy format. I noticed that there is a variant of it called .npz that can pack several arrays in one single file.
> However, .npz does not use compression at all and I'm wondering what's the reason. I suppose that this is because you don't want to lose the possibility to memmap saved arrays, but can someone confirm this?
While I suspect it's possible, I'm certain we don't have any code that actually does it. Most likely the author assumed that it would be faster (or tested it to be faster with their CPU/hard disk configuration) to not compress.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
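For reference, memory-mapping of a standalone .npy file is available through the `mmap_mode` argument of `np.load` in current NumPy releases. The following is a minimal sketch of that path only; the file name `data.npy` and the array contents are made up for illustration, and it says nothing about memory-mapping members of a .npz archive, which is the case discussed above.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, chosen only for this example
    path = os.path.join(tmp, "data.npy")
    np.save(path, data)

    # mmap_mode="r" maps the file read-only instead of reading it into memory
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    same_values = bool(np.array_equal(mapped, data))

    # Release the mapping before the temporary directory is removed
    del mapped
```

Memory-mapping relies on the array bytes sitting uncompressed at a fixed offset in the file, which is why compression and memmap were raised together in the original question.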

On Thursday 30 September 2010 18:20:16 Robert Kern wrote:
> On Wed, Sep 29, 2010 at 03:17, Francesc Alted <faltet@pytables.org> wrote:
> > Hi,
> > I'm going to give a seminar about serialization, and I'd like to describe the .npy format. I noticed that there is a variant of it called .npz that can pack several arrays in one single file.
> > However, .npz does not use compression at all and I'm wondering what's the reason. I suppose that this is because you don't want to lose the possibility to memmap saved arrays, but can someone confirm this?
> While I suspect it's possible, I'm certain we don't have any code that actually does it. Most likely the author assumed that it would be faster (or tested it to be faster with their CPU/hard disk configuration) to not compress.
Thanks, that's good to know. And yes, I'd say that compressing with zip (zlib) would slow down writes, but reading compressed data from disk would most probably be faster overall. At any rate, adding compression capability to .npy should be just one parameter away, so perhaps is a good idea adding it.
-- Francesc Alted

On Fri, Oct 1, 2010 at 02:13, Francesc Alted <faltet@pytables.org> wrote:
> On Thursday 30 September 2010 18:20:16 Robert Kern wrote:
> > On Wed, Sep 29, 2010 at 03:17, Francesc Alted <faltet@pytables.org> wrote:
> > > Hi,
> > > I'm going to give a seminar about serialization, and I'd like to describe the .npy format. I noticed that there is a variant of it called .npz that can pack several arrays in one single file.
> > > However, .npz does not use compression at all and I'm wondering what's the reason. I suppose that this is because you don't want to lose the possibility to memmap saved arrays, but can someone confirm this?
> > While I suspect it's possible, I'm certain we don't have any code that actually does it. Most likely the author assumed that it would be faster (or tested it to be faster with their CPU/hard disk configuration) to not compress.
> Thanks, that's good to know. And yes, I'd say that compressing with zip (zlib) would slow down writes, but reading compressed data from disk would most probably be faster overall. At any rate, adding compression capability to .npy should be just one parameter away, so perhaps is a good idea adding it.
Also some design, documentation, format version bump, and (not least) code away. ;-)
-- Robert Kern

2010/10/2 Robert Kern <robert.kern@gmail.com>
> On Fri, Oct 1, 2010 at 02:13, Francesc Alted <faltet@pytables.org> wrote:
> > On Thursday 30 September 2010 18:20:16 Robert Kern wrote:
> > > On Wed, Sep 29, 2010 at 03:17, Francesc Alted <faltet@pytables.org> wrote:
> > > > Hi,
> > > > I'm going to give a seminar about serialization, and I'd like to describe the .npy format. I noticed that there is a variant of it called .npz that can pack several arrays in one single file.
> > > > However, .npz does not use compression at all and I'm wondering what's the reason. I suppose that this is because you don't want to lose the possibility to memmap saved arrays, but can someone confirm this?
> > > While I suspect it's possible, I'm certain we don't have any code that actually does it. Most likely the author assumed that it would be faster (or tested it to be faster with their CPU/hard disk configuration) to not compress.
> > Thanks, that's good to know. And yes, I'd say that compressing with zip (zlib) would slow down writes, but reading compressed data from disk would most probably be faster overall. At any rate, adding compression capability to .npy should be just one parameter away, so perhaps is a good idea adding it.
> Also some design, documentation, format version bump, and (not least) code away. ;-)
Oh, indeed :-)
-- Francesc Alted

On 2010-10-01, at 7:22 PM, Robert Kern wrote:
> Also some design, documentation, format version bump, and (not least) code away. ;-)
Would it require a format version number bump? I thought that was a .NPY thing, and NPZs were just zipfiles containing several separate NPY containers.
David
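The description of an NPZ as a zipfile holding separate NPY containers can be checked with the standard `zipfile` module. This is only an illustrative sketch against current NumPy releases; the array names `a` and `b` are made up for the example.

```python
import io
import zipfile

import numpy as np

# Write a small archive into memory; "a" and "b" are hypothetical array names
buf = io.BytesIO()
np.savez(buf, a=np.arange(5), b=np.ones((2, 3)))

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    # Each keyword argument becomes one "<name>.npy" member
    names = sorted(zf.namelist())
    # Compression method recorded per member (np.savez stores without compressing)
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    # Every member starts with the .npy magic string
    magic = zf.read("a.npy")[:6]
    # A member is an ordinary .npy payload that np.load can read on its own
    member = np.load(io.BytesIO(zf.read("b.npy")))
```

Since the format version lives in the header of each .npy member, changing how the surrounding ZIP stores those members does not touch it.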

On Sat, Oct 2, 2010 at 22:10, David Warde-Farley <wardefar@iro.umontreal.ca> wrote:
> On 2010-10-01, at 7:22 PM, Robert Kern wrote:
> > Also some design, documentation, format version bump, and (not least) code away. ;-)
> Would it require a format version number bump? I thought that was a .NPY thing, and NPZs were just zipfiles containing several separate NPY containers.
Perhaps that's what Francesc was intending to say, but he wrote "At any rate, adding compression capability to .npy should be just one parameter away, so perhaps is a good idea adding it." So that's what I was responding to. Yes, adding regular ZIP compression to .npz files should be just one parameter away without much thought.
-- Robert Kern
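To illustrate the "one parameter" remark, here is a rough sketch that builds an .npz-style archive by hand with the standard `zipfile` module, where the only difference between the two variants is the `compression` argument. The helper `write_npz` and the array names are hypothetical and this is not NumPy's own implementation; current NumPy releases expose compressed archives through `np.savez_compressed`.

```python
import io
import zipfile

import numpy as np


def write_npz(arrays, compression):
    """Pack each array as '<name>.npy' inside an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        for name, arr in arrays.items():
            member = io.BytesIO()
            np.save(member, arr)  # plain .npy payload, format unchanged
            zf.writestr(name + ".npy", member.getvalue())
    return buf.getvalue()


# Hypothetical, highly compressible sample data
arrays = {"zeros": np.zeros(100_000), "ramp": np.arange(100_000)}

stored = write_npz(arrays, zipfile.ZIP_STORED)      # no compression
deflated = write_npz(arrays, zipfile.ZIP_DEFLATED)  # zlib compression

# The compressed archive still loads through the ordinary np.load path
with np.load(io.BytesIO(deflated)) as npz:
    roundtrip_ok = all(np.array_equal(npz[k], v) for k, v in arrays.items())
```

How much space or time this saves depends entirely on the data and the hardware, which is the trade-off discussed earlier in the thread.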