using loadtxt() for given number of rows?
Hi,

I work with text files which contain several arrays separated by a few lines of other information, for example:

POINTS 4 float
-5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 5.000000e-01 0.000000e+00
-5.000000e-01 5.000000e-01 0.000000e+00

CELLS 2 8
3 0 1 2
3 2 3 0

(yes, that's the legacy VTK format, but take it just as an example)

I have used custom Python code with loops to read similar files, so the speed was not too good. Now I wonder if it would be possible to use the numpy.loadtxt() function for the "array-like" parts. It supports passing an open file object in, which is good, but it wants to read the entire file, which does not work in this case.

It seems to me that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job, so that the function does not try reading the entire file. Another possibility would be to raise an exception as it does now, but also to return the data successfully read so far.

What do you think? Is this worth a ticket?

r.
On 1/31/11 4:39 AM, Robert Cimrman wrote:
I work with text files which contain several arrays separated by a few lines of other information, for example:
POINTS 4 float
-5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 5.000000e-01 0.000000e+00
-5.000000e-01 5.000000e-01 0.000000e+00

CELLS 2 8
3 0 1 2
3 2 3 0
I have used custom Python code with loops to read similar files, so the speed was not too good. Now I wonder if it would be possible to use the numpy.loadtxt() function for the "array-like" parts. It supports passing an open file object in, which is good, but it wants to read the entire file, which does not work in this case.
It seems to me, that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job,
I agree that that would be a useful feature. However, I'm not sure it would help performance much -- I think loadtxt is written in Python as well.

One option in the meantime: if you know how many rows there are, you presumably know how many items are on each row. In that case, you can use:

np.fromfile(open_file, sep=' ', count=num_items_to_read)

It'll only work for multi-line text if the separator is whitespace, which it was in your example. But if it does, it should be pretty fast.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
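A minimal sketch of the np.fromfile() approach applied to the example file from this thread. The helper name read_points_and_cells, the temporary file, and the assumption that every cell row has the same number of entries are all illustrative and not part of the original messages. The file is opened in binary mode here so that the file position stays well defined between readline() and np.fromfile().

```python
import os
import tempfile

import numpy as np

# Sample data in the layout shown in the thread (legacy VTK style).
SAMPLE = b"""POINTS 4 float
-5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 5.000000e-01 0.000000e+00
-5.000000e-01 5.000000e-01 0.000000e+00
CELLS 2 8
3 0 1 2
3 2 3 0
"""


def read_points_and_cells(path):
    """Hypothetical helper: read both array sections with np.fromfile()."""
    with open(path, "rb") as f:
        # Header line: b"POINTS <n_points> <type name>"
        _keyword, n_points, _type_name = f.readline().split()
        n_points = int(n_points)
        # Read exactly 3 * n_points whitespace-separated numbers.
        points = np.fromfile(f, dtype=float, sep=" ", count=3 * n_points)
        points = points.reshape(n_points, 3)

        # Header line: b"CELLS <n_cells> <total number of integers>"
        _keyword, n_cells, n_items = f.readline().split()
        cells = np.fromfile(f, dtype=int, sep=" ", count=int(n_items))
        # Assumption: all cells have the same length, so a reshape works.
        cells = cells.reshape(int(n_cells), -1)
    return points, cells


# np.fromfile() needs a real file, so write the sample to a temporary one.
with tempfile.NamedTemporaryFile(delete=False, suffix=".vtk") as tmp:
    tmp.write(SAMPLE)
try:
    points, cells = read_points_and_cells(tmp.name)
finally:
    os.remove(tmp.name)
```

The count argument is what keeps np.fromfile() from running into the next header line; as noted above, this only works when the item count is known in advance and the separator is whitespace.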
On Mon, 31 Jan 2011, Christopher Barker wrote:
On 1/31/11 4:39 AM, Robert Cimrman wrote:
I work with text files which contain several arrays separated by a few lines of other information, for example:
POINTS 4 float
-5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 5.000000e-01 0.000000e+00
-5.000000e-01 5.000000e-01 0.000000e+00

CELLS 2 8
3 0 1 2
3 2 3 0
I have used custom Python code with loops to read similar files, so the speed was not too good. Now I wonder if it would be possible to use the numpy.loadtxt() function for the "array-like" parts. It supports passing an open file object in, which is good, but it wants to read the entire file, which does not work in this case.
It seems to me, that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job,
I agree that that would be a useful feature. However, I'm not sure it would help performance much -- I think loadtxt is written in python as well.
I see. Anyway, it would allow me to reduce my code size, which also counts as a good thing. So there is now a new enhancement ticket [1].
One option in the meantime: if you know how many rows there are, you presumably know how many items are on each row. In that case, you can use:
np.fromfile(open_file, sep=' ', count=num_items_to_read)
It'll only work for multi-line text if the separator is whitespace, which it was in your example. But if it does, it should be pretty fast.
Good idea. The prerequisites are not always met, but often enough. Thanks!

r.

[1] http://projects.scipy.org/numpy/ticket/1731
Hi Robert,

On Mon, Jan 31, 2011 at 2:39 PM, Robert Cimrman <cimrman3@ntc.zcu.cz> wrote:
It seems to me that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job, so that the function does not try reading the entire file. Another possibility would be to raise an exception as it does now, but also to return the data successfully read so far.
You can always read chunks of the file into StringIO objects, and pass those into loadtxt. I think your request makes sense though, given that a person can already skip lines at the top of the file.

Regards
Stéfan
On 2/14/11 8:28 AM, Stéfan van der Walt wrote:
You can always read chunks of the file into StringIO objects, and pass those into loadtxt.
True, but there isn't any single method for loading n lines of a file into a StringIO object, either. I've always thought that file.readlines() should take a number of rows as an optional parameter. Not a big deal, now that we have list comprehensions, but still it would be nice, and it makes sense to put it into loadtxt() for sure.

-Chris
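A rough sketch of the StringIO workaround discussed above, using itertools.islice to pull a fixed number of lines before handing them to loadtxt(). The helper name load_rows and the sample text are illustrative assumptions, and an in-memory StringIO stands in for the open file. Later NumPy releases added a max_rows argument to loadtxt(), which covers the request made in this thread.

```python
import io
from itertools import islice

import numpy as np

# Sample data in the layout shown in the thread (legacy VTK style).
TEXT = """POINTS 4 float
-5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 -5.000000e-01 0.000000e+00
5.000000e-01 5.000000e-01 0.000000e+00
-5.000000e-01 5.000000e-01 0.000000e+00
CELLS 2 8
3 0 1 2
3 2 3 0
"""


def load_rows(f, nrows, **loadtxt_kwargs):
    """Hypothetical helper: consume nrows lines from f and parse them."""
    # islice() takes exactly nrows lines and leaves the rest of f unread.
    chunk = io.StringIO("".join(islice(f, nrows)))
    return np.loadtxt(chunk, **loadtxt_kwargs)


f = io.StringIO(TEXT)  # stands in for an open text file

header = next(f).split()            # ["POINTS", "4", "float"]
points = load_rows(f, int(header[1]))

header = next(f).split()            # ["CELLS", "2", "8"]
cells = load_rows(f, int(header[1]), dtype=int)
```

Unlike np.fromfile(), this variant only needs the number of rows, not the number of items, and it accepts any loadtxt() options such as delimiter or usecols.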
On Mon, 14 Feb 2011, Stéfan van der Walt wrote:
Hi Robert
On Mon, Jan 31, 2011 at 2:39 PM, Robert Cimrman <cimrman3@ntc.zcu.cz> wrote:
It seems to me that an additional parameter to loadtxt(), say "nrows" or "numrows", would do the job, so that the function does not try reading the entire file. Another possibility would be to raise an exception as it does now, but also to return the data successfully read so far.
You can always read chunks of the file into StringIO objects, and pass those into loadtxt. I think your request makes sense though, given that a person can already skip lines at the top of the file.
Thanks for the tip, Stéfan! I have solved my problem by using np.fromfile(), as suggested by Chris Barker, because I know the number of items to read in advance. I also think that this functionality would be suitable for loadtxt(), so I created NumPy ticket 1731.

r.
participants (3)
- Christopher Barker
- Robert Cimrman
- Stéfan van der Walt