[Numpy-discussion] using loadtxt to load a text file in to a numpy array

Chris Barker chris.barker at noaa.gov
Tue Jan 21 19:30:23 EST 2014


On Tue, Jan 21, 2014 at 3:22 PM, Andrew Collette
<andrew.collette at gmail.com> wrote:

> Just stumbled on this discussion (I'm the lead author of h5py).
>
> We would be overjoyed if there were a 1-byte text type available in
> NumPy.


cool -- it looks like someone is going to get a draft PEP going -- so stay
tuned, and add your comments when there is something to add them to.

> String handling is the source of major pain right now in the
> HDF5 world.  All HDF5 strings are text (opaque types are used for
> binary data), but we're forced into using the "S" type most of the
> time because (1) the "U" type doesn't round-trip between HDF5 and
> NumPy, as there's no fixed-width wide-character string type in HDF5,
>

it looks from here:
http://www.hdfgroup.org/HDF5/doc/ADGuide/WhatsNew180.html

that HDF5 uses utf-8 for unicode strings -- so you _could_ round-trip with a
lot of calls to encode/decode -- which could be pretty slow compared to
other ways to dump numpy arrays into HDF5 -- that may be what you mean by
"doesn't round-trip".
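
As a minimal sketch of that encode/decode round trip, assuming only NumPy (the
array contents and variable names are made up for illustration, and nothing
here touches HDF5 or h5py itself):

```python
import numpy as np

# Illustrative data: a fixed-width unicode ("U") array, 4 bytes per character.
# "caf\u00e9" is "cafe" with an acute accent; the last item is two CJK/kana chars.
names = np.array(["temp", "caf\u00e9", "\u6df1\u3055"], dtype="U8")

# Encode each element to UTF-8 bytes: one Python-level call per item,
# which is where the slowness would come from on large arrays.
encoded = [s.encode("utf-8") for s in names.tolist()]

# Decode again to recover the original text.
decoded = np.array([b.decode("utf-8") for b in encoded], dtype=names.dtype)

round_trip_ok = bool((decoded == names).all())
```

The data survives, but every element passes through a Python-level
encode and decode call instead of a bulk memory copy.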

This may be a good case for a numpy utf-8 dtype, I suppose (or an
arbitrary-encoding dtype, anyway).
But: how does HDF5 handle the fact that utf-8 is not a fixed-length encoding?
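
To make the fixed-length question concrete, a small sketch in plain Python
(the sample strings are arbitrary) showing that strings with the same number
of characters can need different numbers of utf-8 bytes:

```python
# Three illustrative strings, each exactly 4 characters long.
samples = ["abcd", "caf\u00e9", "\u6df1\u3055\u6df1\u3055"]

char_lengths = [len(s) for s in samples]
byte_lengths = [len(s.encode("utf-8")) for s in samples]

# ASCII characters take 1 byte, U+00E9 takes 2, and the CJK/kana
# characters take 3 each, so the byte counts differ.
print(char_lengths)  # [4, 4, 4]
print(byte_lengths)  # [4, 5, 12]
```

So a fixed-width field has to be sized in bytes, not characters, and the
number of characters that fit depends on the text.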

> ASCII-only would be preferable, partly for selfish reasons (HDF5's
> default is ASCII only), and partly to make it possible to copy them
> into containers labelled "UTF-8" without manually inspecting every
> value.
>

hmm -- ascii does have those advantages, but I'm not sure it's worth the
restriction on what can be encoded. But you're quite right, you could dump
ascii straight into something expecting utf-8, whereas you could not do
that with latin-1, for instance. But you can't go the other way -- does it
help much to avoid encoding in one direction?
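
A quick sketch of that asymmetry in plain Python (the sample strings are
arbitrary): ascii bytes are already valid utf-8, latin-1 bytes with
non-ascii characters are not, and utf-8 bytes are not generally valid ascii.

```python
ascii_bytes = "plain text".encode("ascii")
latin1_bytes = "caf\u00e9".encode("latin-1")  # contains the single byte 0xE9
utf8_bytes = "caf\u00e9".encode("utf-8")      # U+00E9 becomes two bytes

# ASCII bytes can be read as utf-8 with no conversion step.
ascii_as_utf8 = ascii_bytes.decode("utf-8")


def decodes_as(data, encoding):
    """Return True if `data` is valid in `encoding` (helper for this sketch)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


latin1_is_valid_utf8 = decodes_as(latin1_bytes, "utf-8")  # False
utf8_is_valid_ascii = decodes_as(utf8_bytes, "ascii")     # False
```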

But maybe we can have an any-one-byte-per-char encoding option, in which
case h5py could use ascii, but we wouldn't have to everywhere.
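
No such dtype exists in numpy today, so the following is only a rough
approximation of the idea using the existing "S" type plus np.char.encode
and np.char.decode, with the encoding tracked by hand (the array contents
and the `encoding` variable are illustrative):

```python
import numpy as np

# The encoding is carried alongside the array here; a real one-byte text
# dtype would presumably store it as part of the dtype itself.
encoding = "ascii"  # could be "latin-1" or another one-byte-per-char codec

text = np.array(["alpha", "beta"], dtype="U5")  # 4 bytes per character

# One byte per character: a 5-character field takes 5 bytes instead of 20.
as_bytes = np.char.encode(text, encoding)

# Convert back to unicode text when it is needed as str.
back = np.char.decode(as_bytes, encoding)
```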

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov