[Numpy-discussion] using loadtxt to load a text file in to a numpy array
Julian Taylor
jtaylor.debian at googlemail.com
Wed Jan 15 07:38:57 EST 2014
On 01/15/2014 11:25 AM, Daπid wrote:
> On 15 January 2014 11:12, Hedieh Ebrahimi
> <hedieh.ebrahimi at amphos21.com> wrote:
>
> I try to print my fileContent array after I read it and it looks
> like this :
>
> ["b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile1.txt'"
> "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile2.txt'"
> "b'C:\\\\Users\\\\Documents\\\\Project\\\\mytextfile3.txt'"]
>
> Why is this happening and how can I prevent it ?
> Also if I have a line that starts like this in my file, python will
> crash on me. how can i fix this ?
>
>
> What is wrong with this case? If you are concerned about the multiple
> backslashes, they are there because they are special symbols, and so
> they have to be escaped (you actually want a backslash, not whatever
> else they could mean).
>
You have the string representation of a bytes object, which is why the
b'...' wrapper and the doubled backslashes show up in it.
It's due to unicode strings in Python 3.
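A minimal sketch of the effect in plain Python 3 (the path is shortened for illustration): calling str() on a bytes object does not decode it, it returns the repr of the object, and that seems to be what ends up in your array.

```python
# A bytes value like the ones read from the file
# (path shortened for illustration).
raw = b'C:\\Users\\mytextfile1.txt'

# str() on bytes does not decode; it returns the repr of the bytes
# object, including the b'' wrapper and the escaped backslashes.
as_repr = str(raw)

# An explicit decode gives the text you actually want.
as_text = raw.decode('ascii')

print(as_repr)
print(as_text)
```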
A workaround that only works for ASCII data is:
np.loadtxt(file, dtype=bytes).astype(str)
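Spelled out as a small self-contained sketch, it could look like this. The in-memory buffer stands in for your file and the paths are made up, so treat it as an illustration rather than a drop-in replacement for your script:

```python
import io
import numpy as np

# Stand-in for the file on disk; loadtxt also accepts file-like objects.
# The two paths are shortened, made-up examples.
f = io.BytesIO(b"C:\\Users\\mytextfile1.txt\nC:\\Users\\mytextfile2.txt\n")

as_bytes = np.loadtxt(f, dtype=bytes)  # array of byte strings
as_str = as_bytes.astype(str)          # ASCII-only conversion to unicode

print(as_str[0])
```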
For non-ASCII data I guess you should use Python directly, as numpy would
also require a Python loop with explicit decoding.
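Reading the file with plain Python could look roughly like this. The file name, its contents and the UTF-8 encoding are assumptions for illustration; use whatever encoding your file really has.

```python
import os
import tempfile

# Write a small sample file so the sketch is self-contained
# (hypothetical content: one path per line; \u00f6 is a non-ASCII "o
# with diaeresis").
path = os.path.join(tempfile.mkdtemp(), "paths.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("C:\\Users\\Pr\u00f6ject\\mytextfile1.txt\n")
    fh.write("C:\\Users\\Pr\u00f6ject\\mytextfile2.txt\n")

# Read it back and let open() do the decoding; no numpy involved.
with open(path, encoding="utf-8") as fh:
    file_content = [line.strip() for line in fh if line.strip()]

print(len(file_content))
```

If you still need an array afterwards, np.array(file_content) gives a unicode array, at the cost of 4 bytes per character.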
Currently, handling strings in Python 3 with numpy is even worse than
before: you always have to go through bytes and do explicit decodes to get
Python strings out of ASCII data.
What we might need in numpy is new string dtypes specifying encodings, to
allow sane conversion to Python 3 strings without the excessive memory
usage of 4-byte unicode (UCS-4).
E.g. if it's ASCII, reuse 'a' (which currently maps to bytes):
np.loadtxt(file, dtype='a')
and for UTF-8 data:
d = np.loadtxt(file, dtype='utf8')
so that type(d[0]) is a unicode string (str) and not bytes, as is currently
the case if you don't want to store your arrays with 4 bytes per character.
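To put a number on the memory argument, here is a quick check with the existing dtypes only (the 10-character width is an arbitrary choice for illustration):

```python
import numpy as np

# Same 10-character capacity, stored as bytes vs. as numpy unicode.
bytes_dtype = np.dtype('S10')  # 1 byte per character
ucs4_dtype = np.dtype('U10')   # 4 bytes per character (UCS-4)

print(bytes_dtype.itemsize, ucs4_dtype.itemsize)
```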