On Mon, Apr 24, 2017 at 7:23 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith <njs@pobox.com> wrote:
That said, AFAICT what people actually want in most use cases is support for arrays that can hold variable-length strings, and the only place where the current approach is *optimal* is when we need mmap compatibility with legacy formats that use fixed-width-nul-padded fields (at which point it's super convenient). It's not even possible to *represent* all Python strings or bytestrings in current numpy unicode or string arrays (Python strings/bytestrings can have trailing nuls). So if we're talking about tweaks to the current system it probably makes sense to focus on this use case specifically.
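The trailing-nul point can be checked with a minimal sketch. It assumes only a stock numpy install; the variable names are illustrative and not from the thread.

```python
import numpy as np

# A bytestring that legitimately ends in a nul byte.
raw = b"abc\x00"
arr = np.array([raw])            # numpy infers a fixed-width bytes dtype
roundtrip = bytes(arr[0])        # read the element back out

print(arr.dtype)                 # the inferred fixed-width dtype
print(roundtrip == raw)          # False: the trailing nul is treated as padding
print(arr.tobytes())             # the nul is still present in the raw buffer

# The same thing happens with the fixed-width unicode dtype.
text = "abc\x00"
uarr = np.array([text])
uroundtrip = str(uarr[0])
print(len(text), len(uroundtrip))
```

The buffer still holds the nul, but numpy cannot tell it apart from padding when reading the element back, so the original value is not recoverable from the array element alone.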
From context I'm assuming FITS files use fixed-width-nul-padding for strings? Is that right? I know HDF5 doesn't.
Yes, HDF5 does. Or at least, it is supported in addition to the variable-length ones.
https://support.hdfgroup.org/HDF5/doc/Advanced/UsingUnicode/index.html
Doh, I found that page but it was (and is) meaningless to me, so I went by http://docs.h5py.org/en/latest/strings.html, which says the options are fixed-width ascii, variable-length ascii, or variable-length utf-8 ... I guess it's just talking about what h5py currently supports.

But also, is it important whether strings we're loading/saving to an HDF5 file have the same in-memory representation in numpy as they would in the file? I *know* [1] no-one is reading HDF5 files using np.memmap :-). Is it important for some other reason?

Also, further searching suggests that HDF5 actually supports all of nul termination, nul padding, and space padding, and that nul termination is the default? How much does it help to have in-memory compatibility with just one of these options (and not even the default one)? Would we need to add the other options to be really useful for HDF5? (Unlikely to happen within numpy itself, but potentially something that could be done inside h5py or whatever if numpy's user-defined dtype system were a little more useful.)

-n

[1] hope

--
Nathaniel J. Smith -- https://vorpus.org
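
As a rough illustration of why in-memory compatibility with only one convention matters, here is a simplified model of the three fixed-width layouts mentioned above (nul termination, nul padding, space padding). The helpers `pad_fixed` and `unpad_fixed` are hypothetical and written for this sketch; they are not HDF5, h5py, or numpy APIs, and they model the conventions only as commonly described.

```python
import numpy as np

def pad_fixed(data: bytes, width: int, mode: str) -> bytes:
    """Hypothetical helper: lay out `data` in a `width`-byte field."""
    if mode == "nullterm":
        # A terminating nul must fit, so at most width - 1 bytes of data.
        if len(data) >= width:
            raise ValueError("need room for a terminating nul")
        return data.ljust(width, b"\x00")
    if mode in ("nullpad", "spacepad"):
        if len(data) > width:
            raise ValueError("data longer than field")
        fill = b"\x00" if mode == "nullpad" else b" "
        return data.ljust(width, fill)
    raise ValueError(f"unknown mode: {mode!r}")

def unpad_fixed(field: bytes, mode: str) -> bytes:
    """Hypothetical helper: recover the data under the same convention."""
    if mode == "nullterm":
        return field.split(b"\x00", 1)[0]   # stop at the first nul
    if mode == "nullpad":
        return field.rstrip(b"\x00")        # drop trailing nuls only
    if mode == "spacepad":
        return field.rstrip(b" ")           # drop trailing spaces only
    raise ValueError(f"unknown mode: {mode!r}")

modes = ("nullterm", "nullpad", "spacepad")
fields = {m: pad_fixed(b"abc", 6, m) for m in modes}

# Reinterpret each raw field with numpy's fixed-width bytes dtype.
as_numpy = {m: bytes(np.frombuffer(f, dtype="S6")[0]) for m, f in fields.items()}
for m in modes:
    print(m, fields[m], as_numpy[m])
```

In this model numpy's readback agrees with the nul-padded interpretation, while a space-padded field comes back with its padding intact and would need an extra stripping step.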