[Numpy-discussion] using loadtxt to load a text file in to a numpy array

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Jan 23 14:49:17 EST 2014


>> > numpy arrays need a decode and encode method
>
>
>> I'm not sure that they do. Rather there needs to be a text dtype that
>> knows what encoding to use in order to have a binary interface as
>> exposed by .tostring() and friends but produce unicode strings
>> when indexed from Python code. Having both a text and a binary
>> interface to the same data implies having an encoding.
>
>
> I agree with Oscar here -- let's not conflate encoded and decoded data --
> the py3 text model is a fine one, we should work with it as much as
> practical.
>
> UNLESS: if we do add a bytes dtype, then it would be a reasonable use case
> to use it to store encoded text (just like the py3 bytes types), in which
> case it would be good to have encode() and decode() methods or ufuncs --
> probably ufuncs. But that should be for special-purpose use, the I/O
> interface kind of stuff.
>

I think we need both things: changing the memory and changing the view.

The same way we can convert between int and float and complex (trunc,
astype, real, ...) we should be able to convert between bytes and any
string (text) dtypes, i.e. decode and encode.
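A rough sketch of what such a conversion looks like with the existing
np.char.decode and np.char.encode functions, assuming Python 3. The
array contents are made up for illustration, and these are plain
functions rather than the array methods or ufuncs discussed above:

```python
import numpy as np

# A fixed-width bytes array ('S' dtype); the values are illustrative only.
raw = np.array([b"abc", b"de"], dtype="S3")

# bytes -> text: the result is a new array with a unicode ('U') dtype.
text = np.char.decode(raw, "ascii")

# text -> bytes: the round trip gives back an 'S' array.
back = np.char.encode(text, "ascii")

print(raw.dtype.kind, text.dtype.kind, back.dtype.kind)
```

Both calls allocate a new array, so this is the "changing the memory"
half of the picture, not a view.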

Say I'm reading a file in binary and then want to convert it to unicode,
only to realize I have nothing but ASCII and want to convert to something
less memory hungry.
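To put numbers on "memory hungry", here is a small sketch, assuming
Python 3 and ASCII-only content (the sample values are invented).
NumPy's 'U' dtype stores 4 bytes per character, while 'S' stores 1:

```python
import numpy as np

# Bytes as they might come out of a binary read; values are illustrative.
raw = np.array([b"spam", b"eggs"], dtype="S4")

# astype to a unicode dtype decodes the bytes as ASCII.
text = raw.astype("U4")

# Going back to 'S' works here because the content is pure ASCII.
compact = text.astype("S4")

print(raw.itemsize, text.itemsize)   # bytes per element
print(raw.nbytes, text.nbytes)       # total bytes for the data
```

For this array the unicode version takes four times the memory of the
bytes version, which is the cost being complained about.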

Views don't care what the content means; it just has to be
memory compatible. I can view anything as an 'S' or a 'uint' (I
think).
What we currently don't have is a string/text view on 'S' that would
interact with Python as a string.
(That's a vote in favor of a minimal one char string dtype that would
work for a limited number of encodings.)
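A minimal sketch of the view side, assuming Python 3 and a contiguous
1-D 'S4' array with made-up contents. The same buffer can be
reinterpreted as unsigned integers without copying, but indexing the
'S' array still hands back bytes, not a Python string:

```python
import numpy as np

raw = np.array([b"abcd", b"efgh"], dtype="S4")

# Reinterpret the same 8 bytes of memory; no data is copied.
as_bytes = raw.view(np.uint8)   # one element per byte
as_words = raw.view("<u4")      # one little-endian 32-bit integer per item

# Indexing the 'S' array yields a bytes object, not str.
first = raw[0]

print(as_bytes.shape, as_words.shape, type(first))
```

Nothing here decodes anything: the views only relabel the memory, which
is why a text-aware dtype on top of 'S' would be a separate feature.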

Josef


