bug in genfromtxt for python 3.2
numpy/lib/test_io.py only uses StringIO in the test, no actual csv file.

If I give the filename, then I get a TypeError: Can't convert 'bytes' object to str implicitly

From the statsmodels mailing list example:

>>> data = recfromtxt(open('./star98.csv', "U"), delimiter=",", skip_header=1, dtype=float)
Traceback (most recent call last):
  File "", line 1, in <module>
    data = recfromtxt(open('./star98.csv', "U"), delimiter=",", skip_header=1, dtype=float)
  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py", line 1633, in recfromtxt
    output = genfromtxt(fname, **kwargs)
  File "C:\Programs\Python32\lib\site-packages\numpy\lib\npyio.py", line 1181, in genfromtxt
    first_values = split_line(first_line)
  File "C:\Programs\Python32\lib\site-packages\numpy\lib\_iotools.py", line 206, in _delimited_splitter
    line = line.split(self.comments)[0].strip(asbytes(" \r\n"))
TypeError: Can't convert 'bytes' object to str implicitly
Line 1184 in npyio (py32 source file),

    if isinstance(fname, str):
        fhd = np.lib._datasource.open(fname, 'U')

seems to be the culprit in my case.

Changing to binary solved this problem for me:

    fhd = np.lib._datasource.open(fname, 'Ub')

(I still have other errors but don't know yet where they are coming from.)

Almost all problems with porting statsmodels to Python 3.2 so far are input related, mainly reading csv files, which are heavily used in the tests. All the "real" code seems to work fine with numpy and scipy (and matplotlib so far) for Python 3.2.

Josef
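The mismatch reported above can be reproduced without numpy. This is only a minimal sketch: it assumes, as the traceback suggests, that the splitter keeps its comment and delimiter markers as bytes. The helper `split_first_line` and the temporary csv file are hypothetical stand-ins, not numpy code.

```python
import os
import tempfile


def split_first_line(path, mode):
    """Hypothetical stand-in for a bytes-based line splitter (not numpy code)."""
    with open(path, mode) as fh:
        line = fh.readline()
    # The markers are bytes, so `line` must be bytes as well.
    return line.split(b"#")[0].strip(b" \r\n").split(b",")


# Write a tiny csv file to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1.0,2.0,3.0\n")
    csv_path = tmp.name

try:
    # Binary mode yields bytes, which match the bytes markers.
    binary_fields = split_first_line(csv_path, "rb")

    # Text mode yields str; splitting a str on a bytes marker raises TypeError.
    try:
        split_first_line(csv_path, "r")
        text_mode_failed = False
    except TypeError:
        text_mode_failed = True
finally:
    os.remove(csv_path)

print(binary_fields)
print(text_mode_failed)
```

Opening the file in binary mode keeps the line and the markers the same type, which is why switching the mode avoids the error in this sketch.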
Hi,
On Mon, Mar 28, 2011 at 11:29 PM,
[clip]
Is the right fix for this to open a 'filename' passed to genfromtxt as 'binary' (bytes)?

If so, I will submit a pull request with a fix and a test.

Best,

Matthew
On Wed, Mar 30, 2011 at 3:39 AM, Matthew Brett
[clip]
Seems to work and is what was intended I think, see Pauli's changes/notes in commit 0f2e7db0.

This is ticket #1607 by the way.

Cheers,

Ralf
Hi,
On Wed, Mar 30, 2011 at 10:02 AM, Ralf Gommers
[clip]
Thanks for making a ticket. I've submitted a pull request for the fix and linked to it from the ticket.

The reason I asked whether this was the correct fix was: imagine I'm working with a non-latin default encoding, and I've opened a file:

    fobj = open('my_nonlatin.txt', 'rt')

in Python 3.2. That might contain numbers and non-latin text. I can't pass that into 'genfromtxt' because it will give me the error above. I can pass it in as binary, but then I'll get garbled text.

Should those functions also allow unicode-providing files (perhaps with binary as default for speed)?

See you,

Matthew
On Wed, Mar 30, 2011 at 7:37 PM, Matthew Brett
[clip]
> The reason I asked whether this was the correct fix was:
> imagine I'm working with a non-latin default encoding, and I've opened a file:
>
>     fobj = open('my_nonlatin.txt', 'rt')
>
> in python 3.2. That might contain numbers and non-latin text. I can't pass that into 'genfromtxt' because it will give me this error above. I can pass it in as binary but then I'll get garbled text.
I admit the string/bytes thing is still a little confusing to me, but isn't that always going to be a problem (even with python 2.x)? There's no way for genfromtxt to know what the encoding of an arbitrary file is. So your choices are garbled text or an error. Garbled text is better.

It may help to explicitly say in the docstring that this is an ASCII routine (as it does in the source code).

Ralf
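The "garbled text or an error" trade-off can be illustrated with plain Python 3 byte strings. This is only a sketch: the sample text and variable names are made up for illustration and nothing here is numpy's code.

```python
# Bytes as they might sit in a file whose encoding the reader does not know.
raw = "αβγ,1.5".encode("utf-8")

# Guessing a single-byte encoding never fails, but non-ASCII text comes out garbled.
garbled = raw.decode("latin-1")

# Insisting on ASCII raises an error instead.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Only the correct encoding recovers the original text.
recovered = raw.decode("utf-8")

print(repr(garbled))
print(ascii_failed)
print(recovered)
```

Note that the ASCII part of the line (the number and the delimiter) survives the wrong guess unchanged, so numeric columns still parse even when the text columns are garbled.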
On Wed, 30 Mar 2011 10:37:45 -0700, Matthew Brett wrote:
[clip]
> imagine I'm working with a non-latin default encoding, and I've opened a file:
>
>     fobj = open('my_nonlatin.txt', 'rt')
>
> in python 3.2. That might contain numbers and non-latin text. I can't pass that into 'genfromtxt' because it will give me this error above. I can pass it in as binary but then I'll get garbled text.
That's the way it also works on Python 2. The text is not garbled -- it's just in some binary representation that you can later on decode to unicode:

    >>> np.array(['asd']).view(np.chararray).decode('utf-8')
    array([u'asd'], dtype='

Granted, utf-16 and the ilk might be problematic.
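On Python 3 the same "decode later" step might look like the following. It is a sketch under the assumption that the text columns arrive as a bytes array holding UTF-8 data; the sample values are invented, and `np.char.decode` is used here in place of the `chararray` view shown above.

```python
import numpy as np

# A bytes array, as a bytes-based reader might return for a text column.
raw = np.array(["αβγ".encode("utf-8"), b"asd"])

# Decode element-wise to a unicode array once the encoding is known.
text = np.char.decode(raw, "utf-8")

print(raw.dtype.kind)   # 'S' means bytes
print(text.dtype.kind)  # 'U' means unicode
print(text.tolist())
```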
> Should those functions also allow unicode-providing files (perhaps with binary as default for speed)?
Nobody has yet asked for this feature as far as I know, so I guess the need for it is pretty low.

Personally, I don't think going unicode makes much sense here. First, it would be a Py3-only feature. Second, there is a real need for it only when dealing with multibyte encodings, which are seldom used these days with utf-8 rightfully dominating.

--
Pauli Virtanen
Hi,
On Wed, Mar 30, 2011 at 11:32 AM, Pauli Virtanen
[clip]
It's not a feature I need, but then, I'm afraid all the languages I've been taught are latin-1. Oh, except I learnt a tiny bit of Greek. But I don't use it for work :)

I suppose the annoyances would be:

1) Probably temporary surprise that genfromtxt(open('my_file.txt', 'rt')) generates this error
2) Having to go back over returned arrays decoding stuff for utf-8
3) Wrong results for other encodings

Maybe the best way is a graceful warning on entry to the routine?

Best,

Matthew
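One way such a warning on entry could be sketched is shown below. The helper `warn_if_text_mode` is hypothetical and not part of numpy; it assumes that reading zero characters from the file object is harmless and reveals whether the object yields str or bytes.

```python
import io
import warnings


def warn_if_text_mode(fobj):
    """Hypothetical check: warn when `fobj` yields str rather than bytes."""
    # read(0) consumes nothing and returns '' for text files, b'' for binary ones.
    is_text = isinstance(fobj.read(0), str)
    if is_text:
        warnings.warn(
            "file object was opened in text mode; "
            "a bytes-based parser expects binary mode",
            UserWarning,
            stacklevel=2,
        )
    return is_text


# Usage: a text-mode object triggers the warning, a binary one does not.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text_flag = warn_if_text_mode(io.StringIO("1,2,3\n"))
    bytes_flag = warn_if_text_mode(io.BytesIO(b"1,2,3\n"))

print(text_flag, bytes_flag, len(caught))
```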
participants (4)
- josef.pktd@gmail.com
- Matthew Brett
- Pauli Virtanen
- Ralf Gommers