[Numpy-discussion] proposal: smaller representation of string arrays
robert.kern at gmail.com
Thu Apr 20 15:17:48 EDT 2017
On Thu, Apr 20, 2017 at 12:05 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern <robert.kern at gmail.com> wrote:
>> I don't know of a format off-hand that works with numpy uniform-length
>> strings and Unicode as well. HDF5 (to my recollection) supports arrays of
>> NULL-terminated, uniform-length ASCII like FITS, but only variable-length
>
> HDF5 supports two character sets, ASCII and UTF-8. Both come in fixed and
> variable length versions:
>
> "Fixed length UTF-8" for HDF5 refers to the number of bytes used for
> storage, not the number of characters.
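
The bytes-versus-characters distinction in the quoted text can be seen with a minimal sketch in plain Python. The sample string and the helper `fits_fixed_utf8` are hypothetical names chosen for illustration; they are not part of HDF5, h5py, or numpy.

```python
# Hypothetical sample text containing two non-ASCII characters
# (U+00EF and U+00E9), written with escapes to avoid encoding ambiguity.
text = "na\u00efve caf\u00e9"

encoded = text.encode("utf-8")

n_chars = len(text)      # number of Unicode code points
n_bytes = len(encoded)   # number of bytes needed to store the UTF-8 encoding


def fits_fixed_utf8(s, nbytes):
    """Return True if s fits in a fixed-length field of nbytes bytes
    once encoded as UTF-8 (hypothetical helper for illustration)."""
    return len(s.encode("utf-8")) <= nbytes


print(n_chars, n_bytes)
print(fits_fixed_utf8(text, n_chars), fits_fixed_utf8(text, n_bytes))
```

A field sized by character count can therefore be too small once the text contains anything outside ASCII.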
Ah, okay, I was interpolating from a quick perusal of the h5py docs, which
of course are also constrained by numpy's current set of dtypes. The
NULL-terminated ASCII works well enough with np.string_'s semantics.
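
As a rough illustration of those semantics, here is a small sketch using numpy's fixed-width byte-string dtype (`"S8"` is an arbitrary width picked for the example). It assumes only that numpy is installed. Note that numpy pads short values with NULs rather than strictly terminating them, so a value that fills the whole field has no trailing NUL.

```python
import numpy as np

# Fixed-width byte strings: every element occupies exactly 8 bytes.
arr = np.array([b"abc", b"numpy", b"12345678"], dtype="S8")

# The underlying buffer is NUL-padded up to the item size.
raw = arr.tobytes()

# Element access strips the trailing NUL padding.
first = arr[0]
full = arr[2]   # fills all 8 bytes, so the buffer holds no NUL for it

print(arr.itemsize, len(raw))
print(raw[:8])
print(first, len(first))
```

This padding-and-stripping behaviour is why a NULL-terminated, uniform-length ASCII field maps fairly naturally onto this dtype.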