[Numpy-discussion] using loadtxt to load a text file in to a numpy array

Chris Barker chris.barker at noaa.gov
Wed Jan 15 14:40:58 EST 2014


On Wed, Jan 15, 2014 at 9:57 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:


> There was a discussion of this long ago and UCS-4 was chosen as the numpy
> standard. There are just too many complications that arise in supporting
> both.
>

Fair enough -- but loadtxt appears to be broken just the same. Any
proposals for that?

My proposal:

loadtxt accepts an encoding argument.

default is ascii -- that's what it's doing now, anyway, yes?

If the file is ascii-encoded, then a one-byte-per-character dtype is used
for text data, unless the user specifies otherwise (do they need to specify
anyway?)

If the file has another encoding, then the default dtype for text is unicode.

Not sure about other one-byte-per-character encodings (e.g. latin-1).

The defaults may be moot if loadtxt doesn't auto-detect text in a file
anyway.

This all requires that there be an obvious way for the user to spell the
one-byte-per-character dtype -- I think 'S' will do it.
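
A rough sketch of the default-dtype rule proposed above, in case it helps
the discussion. This is not loadtxt itself: load_text_column is a
hypothetical helper name, it only handles one string per line, and the
"ascii means 'S', anything else means 'U'" rule is just the proposal as
stated, not existing numpy behavior.

```python
import io

import numpy as np


def load_text_column(path, encoding="ascii"):
    """Hypothetical helper: read one string per line from a text file.

    The dtype follows the proposal: ascii input is stored with one byte
    per character ('S'); any other encoding is stored as unicode ('U').
    """
    with io.open(path, "r", encoding=encoding) as f:
        # Drop line endings and skip blank lines.
        lines = [line.rstrip("\r\n") for line in f if line.strip()]

    if encoding.lower() in ("ascii", "us-ascii"):
        # Re-encode to bytes so numpy stores one byte per character.
        return np.array([s.encode("ascii") for s in lines], dtype="S")

    # Non-ascii encodings: keep the decoded text in a unicode array.
    return np.array(lines, dtype="U")
```

With this, load_text_column('pathlist.txt') would give an 'S' array, and
load_text_column('pathlist.txt', encoding='utf-8') a 'U' array.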

Note to OP: what happens if you specify 'S' for your dtype, rather than str?
It works for me on py2:

In [16]: np.loadtxt('pathlist.txt', dtype='S')
Out[16]:
array(['C:\\Users\\Documents\\Project\\mytextfile1.txt',
       'C:\\Users\\Documents\\Project\\mytextfile2.txt',
       'C:\\Users\\Documents\\Project\\mytextfile3.txt'],
      dtype='|S42')

Note: this leaves us with what to pass back to the user when they index
into an array of type 'S*' -- a bytes object or a unicode object (decoded
as ascii). I think a unicode object, in keeping with proper py3 behavior.
This would be like what we currently do with, say, floating point numbers:

We can store/operate with 32 bit floats, but when one is passed back as a
python type, you get the native python float -- 64bit.
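
To make the analogy concrete, here is a small sketch on py3. The float
half shows what numpy does today when a value is converted to a python
type with .item(). The string half does the decode by hand, since
returning a unicode object from an 'S' array is only the proposal, not
current behavior.

```python
import numpy as np

# Stored as 32-bit floats inside the array...
x = np.array([1.5, 2.25], dtype=np.float32)
# ...but converting an element to a python type gives the native float.
v = x[0].item()
assert type(v) is float

# An 'S' array stores one byte per character.
s = np.array([b"abc", b"de"], dtype="S")
raw = s[0]                   # a bytes-like numpy scalar on py3
text = raw.decode("ascii")   # manual version of the proposed behavior
assert isinstance(text, str)
```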

NOTE: another option is to use latin-1 all around, rather than ascii -- you
may get garbage from the higher value bytes, but it won't barf on you.
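
A quick illustration of why latin-1 won't barf: every byte value 0-255
maps to a code point, so decoding can't fail and round-trips exactly,
whereas ascii raises on any byte above 127. Plain python, no numpy needed:

```python
# Every possible byte value decodes under latin-1.
data = bytes(bytearray(range(256)))
decoded = data.decode("latin-1")
assert len(decoded) == 256
# The round trip is lossless, so the original bytes are recoverable.
assert decoded.encode("latin-1") == data

# ascii, by contrast, fails on the first high byte.
ascii_failed = False
try:
    b"caf\xe9".decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True
assert ascii_failed
```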

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov