[Numpy-discussion] Re: Exporting numpy arrays to binary JSON (BJData) for better portability

Aug. 25, 2022


      ...
the loading time (from an nvme drive, Ubuntu 18.04, python 3.6.9, numpy
1.19.5) for each file is listed below:
0.179s  eye1e4.npy (mmap_mode=None)
0.001s  eye1e4.npy (mmap_mode=r)
0.718s  eye1e4_bjd_raw_ndsyntax.jdb
1.474s  eye1e4_bjd_zlib.jdb
0.635s  eye1e4_bjd_lzma.jdb
clearly, mmapped loading is the fastest option, which is no surprise; it is
true that the raw bjdata file is about 4x slower than npy loading (0.718s
vs. 0.179s), but given that the main chunk of the data is stored
identically (as a contiguous buffer), I suppose that with some optimization
of the decoder, the gap between the two can be substantially narrowed. The
longer loading time of zlib/lzma (and similarly the saving times) reflects
a trade-off between smaller file sizes and the time spent on
compression/decompression/disk-IO.

I think the load time for mmap may be deceptive: it isn't actually loading
anything, just mapping the file into memory.  Maybe a better benchmark is to
actually process the data, e.g., compute the mean, which would require
reading all the values.
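
A minimal sketch of such a benchmark, for illustration only: it times the
np.load call and a following mean() separately, once with mmap_mode=None and
once with mmap_mode="r". The helper names (time_load_and_mean, run_benchmark),
the temporary file name, and the default matrix size are assumptions made for
this example, not part of the original test. Because the file is read right
after it is written, it will most likely be served from the OS page cache, so
this does not measure cold disk reads.

```python
import os
import tempfile
import time

import numpy as np


def time_load_and_mean(path, mmap_mode=None):
    """Return (load_seconds, mean_seconds, mean_value) for one .npy file."""
    t0 = time.perf_counter()
    arr = np.load(path, mmap_mode=mmap_mode)
    t1 = time.perf_counter()
    # Computing the mean touches every element, so with mmap_mode="r"
    # this is where the data is actually read.
    mean_value = float(arr.mean())
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, mean_value


def run_benchmark(n=1000):
    """Save an n-by-n identity matrix (hypothetical size) and time both modes."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "eye.npy")
        np.save(path, np.eye(n))
        for mode in (None, "r"):
            results[mode] = time_load_and_mean(path, mmap_mode=mode)
    return results


if __name__ == "__main__":
    for mode, (load_s, mean_s, mean_value) in run_benchmark().items():
        print(f"mmap_mode={mode!r}: load {load_s:.4f}s, "
              f"mean {mean_s:.4f}s, mean value {mean_value:g}")
```

Comparing load time plus mean time across the two modes should give a fairer
picture than the load time alone.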

Neal Becker