On 8/25/22 12:25, Robert Kern wrote:
I don't quite know what this means. My installed version of `jq`, for example, doesn't seem to know what to do with these files.
❯ jq --version
❯ jq . eye5chunk_bjd_raw.jdb
parse error: Invalid numeric literal at line 1, column 38
The .jdb files are binary JSON files (specifically BJData), which jq does not currently support. To save as text-based JSON, change the suffix to .json or .jdt; this results in a ~33% size increase compared to the binary, due to base64 encoding.
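The ~33% figure follows from base64 emitting 4 output characters for every 3 input bytes. A minimal sketch of that arithmetic, using only the Python standard library (the 3000-byte payload is an arbitrary stand-in, not one of the sample files):

```python
import base64

# Arbitrary stand-in for a raw binary array payload (3000 bytes).
raw = bytes(range(250)) * 12

# base64 maps every 3 input bytes to 4 ASCII characters.
encoded = base64.b64encode(raw)

ratio = len(encoded) / len(raw)
print(len(raw), len(encoded), ratio)
```

Any compression applied before the base64 step changes the absolute sizes but not this 4:3 ratio of the encoding itself.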
I think a fundamental problem here is that it looks like each element in the array is delimited. I.e. a `float64` value starts with b'D' then the 8 IEEE-754 bytes representing the number. When we're talking about memory-mappability, we are talking about having the on-disk representation being exactly what it looks like in-memory, all of the IEEE-754 floats contiguous with each other, so we can use the `np.memmap` `ndarray` subclass to represent the on-disk data as a first-class array object. This spec lets us mmap the binary JSON file and manipulate its contents in-place efficiently, but that's not what is being asked for here.
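The difference between the two layouts can be sketched in a few lines of NumPy. This is only an illustration of the byte layout described above (a b'D' marker followed by 8 little-endian IEEE-754 bytes per value), built by hand rather than with any BJData library; the variable names are made up for the example.

```python
import struct
import numpy as np

values = np.array([1.0, 2.5, -3.0], dtype="<f8")

# Per-element form: every value carries its own b'D' type marker.
delimited = b"".join(b"D" + struct.pack("<d", float(v)) for v in values)

# Contiguous form: the 8-byte floats sit back to back, as they do in memory.
contiguous = values.tobytes()

# The contiguous bytes can be viewed directly as a float64 array, no copy needed.
view = np.frombuffer(contiguous, dtype="<f8")

# The delimited bytes can only be viewed as 9-byte records; the float field
# is then a strided, non-contiguous view rather than a plain float64 array.
records = np.frombuffer(delimited, dtype=[("marker", "S1"), ("value", "<f8")])
decoded = records["value"]

print(len(delimited), len(contiguous))      # 9 bytes vs 8 bytes per element
print(view.flags.c_contiguous, decoded.flags.c_contiguous)
```

The interleaved markers are what prevent the delimited form from being handed to `np.memmap` as an ordinary contiguous `float64` array.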
There are several BJData-compliant forms to store the same binary array losslessly. The most memory-efficient and disk-mmappable (but not necessarily disk-efficient) form is the ND-array container syntax that the BJData spec adds on top of UBJSON.
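A rough sketch of why this form is mmap-friendly: the container has a short header (type marker, then the dimensions) followed by one contiguous block of raw values, so `np.memmap` can be pointed at the payload with a byte offset. The header layout used here, `[$D#[$U#U<ndim><dims>` followed by little-endian float64 data in row-major order, is an assumption based on the ND-array syntax in the BJData spec and should be checked against the spec before relying on it. `write_nd_float64` is a hypothetical helper written for this example, not part of any BJData library, and it only handles dimensions below 256.

```python
import os
import tempfile
import numpy as np

def write_nd_float64(path, arr):
    """Hypothetical helper: write arr as a single ND-array container.

    Assumed layout: b'[$D#[$U#U' + ndim (uint8) + dims (uint8 each),
    then the raw little-endian float64 payload in row-major order.
    Returns the header length, i.e. the byte offset of the payload.
    """
    arr = np.ascontiguousarray(arr, dtype="<f8")
    if arr.ndim > 255 or any(n > 255 for n in arr.shape):
        raise ValueError("this sketch stores ndim and dims as uint8")
    header = b"[$D#[$U#U" + bytes([arr.ndim]) + bytes(arr.shape)
    with open(path, "wb") as f:
        f.write(header)
        f.write(arr.tobytes())
    return len(header)

a = np.arange(6, dtype="<f8").reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "demo_nd.jdb")
offset = write_nd_float64(path, a)

# Map only the payload: skip the header, reuse the known shape and dtype.
m = np.memmap(path, dtype="<f8", mode="r", offset=offset, shape=a.shape)
print(offset, m.shape, float(m[1, 2]))
```

In a real file the array would normally be nested inside a larger document, so a reader would first have to parse far enough to locate the payload offset and shape before mapping it.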