[Numpy-discussion] using loadtxt to load a text file in to a numpy array
chris.barker at noaa.gov
Fri Jan 17 15:30:06 EST 2014
On Fri, Jan 17, 2014 at 5:18 AM, Freddie Witherden <freddie at witherden.org> wrote:
> In terms of HDF5 it is interesting to look at how h5py -- which has to
> go between NumPy types and HDF5 conventions -- handles the problem as
> described here:
"""All strings in HDF5 hold encoded text.
You can’t store arbitrary binary data in HDF5 strings.
This is actually the same as a py3 string (though the mechanism may be
completely different), and it is the problem with numpy's 'S': is it text or
bytes? Given the name and history, it should be text, but apparently people
have been using it for bytes, so we have to keep that meaning/use case. But
I suggest that, like Python 3, we officially declare that you should not
consider it text, and not do any implicit conversions.
Which means we could use a one-byte-per-character text dtype.
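To make the text-or-bytes ambiguity concrete, here is a minimal sketch,
assuming Python 3 and a reasonably recent NumPy. The array contents and
variable names are made up for illustration; the point is only that 'S'
elements come back as bytes and that any conversion to text has to be
requested explicitly, with a named encoding.

```python
import numpy as np

# Illustrative data only: a small fixed-width 'S' array.
a = np.array([b"spam", b"eggs"], dtype="S4")

# On Python 3 an element of an 'S' array is numpy.bytes_, a bytes subclass.
first = a[0]
is_bytes = isinstance(first, bytes)

# One byte per stored character: itemsize matches the declared width.
itemsize = a.dtype.itemsize

# No implicit conversion: text is obtained only by decoding explicitly.
text = np.char.decode(a, "ascii")   # result is a 'U' (unicode) array
```

Nothing here treats 'S' as text unless the caller says which encoding to
use, which is the behavior argued for above.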
"""At the high-level interface, h5py exposes three kinds of strings. Each
maps to a specific type within Python (but see str_py3 below):
Fixed-length ASCII (NumPy S type)
This is wrong, or misguided, or maybe only a little confusing -- 'S' is
not an ASCII string (even though I wish it were...). But clearly the HDF
folks think we need one!
These are created when you use numpy.string_:
>>> dset.attrs["name"] = numpy.string_("Hello")
or the S dtype:
>>> dset = f.create_dataset("string_ds", (100,), dtype="S10")
Pardon my py3 ignorance -- is numpy.string_ the same as 'S' in py3?
From another post, I thought you'd need to use numpy.bytes_ (which is the
same on py2).
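One way to check this directly is to ask NumPy which scalar type backs the
'S' dtype. This is a small sketch, assuming Python 3; the getattr lookup is
there because I am assuming the numpy.string_ alias may not exist in every
NumPy release, so the code does not depend on it.

```python
import numpy as np

# The scalar type behind any 'S' dtype.
s_type = np.dtype("S10").type

# On Python 3 this is numpy.bytes_, which subclasses the built-in bytes.
s_is_bytes_ = s_type is np.bytes_
value = np.bytes_(b"Hello")
value_is_bytes = isinstance(value, bytes)

# numpy.string_ is looked up defensively (assumption: it may be absent).
legacy_alias = getattr(np, "string_", None)
alias_matches = legacy_alias is None or legacy_alias is np.bytes_
```

So numpy.bytes_ is the name that says what the type actually holds.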
These are created when you assign a byte string to an attribute:
>>> dset.attrs["attr"] = b"Hello"
or when you create a dataset with an explicit “bytes” vlen type:
>>> dt = h5py.special_dtype(vlen=bytes)
>>> dset = f.create_dataset("name", (100,), dtype=dt)
Note that they’re not fully identical to Python byte strings.
This implies that HDF would be well served by an ASCII text type.
What about NumPy’s U type?
NumPy also has a Unicode type, a UTF-32 fixed-width format (4-byte
characters). HDF5 has no support for wide characters. Rather than trying to
hack around this and “pretend” to support it, h5py will raise an error when
attempting to create datasets or attributes of this type.
Interesting, though I think irrelevant to this conversation -- but it would
be nice if h5py would encode/decode to/from UTF-8 for these.
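As a rough sketch of what such a round trip could look like on the NumPy
side (this is not something h5py does, per the text quoted above, and the
sample strings are made up), a 'U' array can be encoded to UTF-8 bytes
before storage and decoded again after reading:

```python
import numpy as np

# Illustrative data only: a fixed-width unicode array, 5 characters wide.
u = np.array(["caf\u00e9", "na\u00efve"], dtype="U5")

# 'U' stores 4 bytes per character (UTF-32).
bytes_per_char = u.dtype.itemsize // 5

# Encode to UTF-8 bytes; the result is an 'S' array sized to the longest
# encoded element, so non-ASCII characters widen it beyond 5 bytes.
encoded = np.char.encode(u, "utf-8")

# Decode back to text after reading the bytes from storage.
decoded = np.char.decode(encoded, "utf-8")
round_trip_ok = bool((decoded == u).all())
```

The width of the encoded array depends on the data, which is one reason a
fixed-width UTF-8 type is awkward to specify up front.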
> which IMHO got it about right.
> Regards, Freddie.
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov