Re: [Numpy-discussion] Help using numPy to create a very large multi dimensional array

On 4/18/07, numpy-discussion-request@scipy.org <numpy-discussion-request@scipy.org> wrote:
------------------------------
Message: 5
Date: Wed, 18 Apr 2007 09:11:32 -0700
From: Christopher Barker <Chris.Barker@noaa.gov>
Subject: Re: [Numpy-discussion] Help using numPy to create a very large multi dimensional array
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Message-ID: <46264334.8080304@noaa.gov>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Bruno Santos wrote:
Finally I was able to read the data, by using the command you said with some small changes:
    matrix = numpy.array([[float(x) for x in line.split()[1:]] for line in vecfile])
It doesn't sound like you're concerned about the speed of reading the files, but you can still use fromfile() or maybe fromstring() to do this. You just need to read past the text part of each line first, then process the rest.
Using fromstring:
matrix = numpy.vstack([numpy.fromstring(line.split(" ", 1)[1], sep=" ") for line in vecfile])
or something like that.
-Chris
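For readers following along, here is a minimal sketch comparing the two one-liners above. The sample data and the names sample, matrix_a and matrix_b are made up for illustration; the file layout in Bruno's case is only assumed to be a text label followed by space-separated numbers on each line.

```python
import io

import numpy as np

# Hypothetical sample: a text label followed by numeric columns on each line.
sample = "word1 0.1 0.2 0.3\nword2 1.5 2.5 3.5\n"

# Bruno's approach: split each line and convert the numeric fields by hand.
with io.StringIO(sample) as vecfile:
    matrix_a = np.array([[float(x) for x in line.split()[1:]] for line in vecfile])

# Chris's approach: cut off the label, then let fromstring parse the remainder.
# strip() removes the trailing newline before parsing.
with io.StringIO(sample) as vecfile:
    matrix_b = np.vstack(
        [np.fromstring(line.split(" ", 1)[1].strip(), sep=" ") for line in vecfile]
    )

print(matrix_a.shape, matrix_b.shape)
```

Both variants should produce the same 2-D float array; the second one assumes the label and the first number are separated by exactly one space.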
I would strongly recommend pylab.load. It handles comments, selects columns, and is legible. Examples from the docstring:
    t,y = load('test.dat', unpack=True) # for two column data
    x,y,z = load('somefile.dat', usecols=(3,5,7), unpack=True)
A more advanced example from examples/load_converter.py:
    dates, closes = load(
        'data/msft.csv', delimiter=',', converters={0:datestr2num},
        skiprows=1, usecols=(0,2), unpack=True)
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
Take care,
Nick

Nick Fotopoulos wrote:
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
I'm supportive of this. But, it can't be named numpy.load. How about
numpy.loadtxt
numpy.savetxt
-Travis
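The names proposed here are the ones NumPy ended up shipping, so the pylab.load docstring examples carry over almost directly. Below is a small sketch with present-day numpy.loadtxt and numpy.savetxt; the sample data and the file name columns.txt are made up for illustration.

```python
import io
import os
import tempfile

import numpy as np

# Hypothetical data standing in for 'somefile.dat': a comment line, then columns.
text = "# t y z\n0.0 1.0 10.0\n0.5 2.0 20.0\n1.0 3.0 30.0\n"

# Select two columns and unpack them, much like the pylab.load docstring example.
# Lines starting with '#' are skipped by default.
t, z = np.loadtxt(io.StringIO(text), usecols=(0, 2), unpack=True)

# Round trip: write the selected columns as text, then read them back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "columns.txt")
    np.savetxt(path, np.column_stack([t, z]), fmt="%.3f")
    restored = np.loadtxt(path)

print(t, z, restored.shape)
```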

On Thursday 19 April 2007 10:17, Travis Oliphant wrote:
Nick Fotopoulos wrote:
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
I'm supportive of this. But, it can't be named numpy.load.
How about
numpy.loadtxt numpy.savetxt
+1
--
Francesc Altet
Cárabos Coop. V.
http://www.carabos.com/
Enjoy Data

On 4/19/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
Nick Fotopoulos wrote:
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
I'm supportive of this. But, it can't be named numpy.load.
I am also +1 on this, but this functionality should be implemented in C, I think. I've just tested numpy.fromfile('name.txt', sep=' ') against pylab.load('name.txt') for a 35MB text file; the numbers are:
numpy.fromfile: 2.66 sec.
pylab.load: 16.64 sec.
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

Lisandro Dalcin wrote:
I am also +1 on this, but this functionality should be implemented in C, I think.
Well, maybe.
I've just tested numpy.fromfile('name.txt', sep=' ') against pylab.load('name.txt') for a 35MB text file, the numbers are:
numpy.fromfile: 2.66 sec.
pylab.load: 16.64 sec.
Exactly; that's expected. fromfile is designed to do the easy cases as fast as possible, while pylab.load is designed to be flexible. I'm not sure you need both the speed and the flexibility at the same time.
By the way, I haven't looked at pylab.load() for a while, but it could perhaps be sped up by using fromfile() and/or fromstring() internally. There may be some opportunity to special-case the easy ones too (i.e. all columns desired, etc.)
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA  98115       (206) 526-6317 main reception
Chris.Barker@noaa.gov
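One way to picture the special-casing Chris describes, as a rough sketch only. load_table is a hypothetical helper written for this illustration; it is not how pylab.load works, and it assumes every row has the same number of columns.

```python
import numpy as np


def load_table(text, usecols=None, comments="#"):
    """Hypothetical helper: parse whitespace-separated numeric text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    simple = usecols is None and not any(comments in ln for ln in lines)
    if simple:
        # Easy case: one bulk parse of all numbers, then restore the rows.
        ncols = len(lines[0].split())
        flat = np.fromstring(" ".join(lines), sep=" ")
        return flat.reshape(-1, ncols)
    # General case: strip comments and pick columns line by line in Python.
    rows = []
    for ln in lines:
        fields = ln.split(comments, 1)[0].split()
        if not fields:
            continue  # the whole line was a comment
        if usecols is not None:
            fields = [fields[i] for i in usecols]
        rows.append([float(f) for f in fields])
    return np.array(rows)


fast = load_table("1 2 3\n4 5 6\n")
slow = load_table("# header\n1 2 3 # note\n4 5 6\n", usecols=(0, 2))
print(fast)
print(slow)
```

The bulk parse returns a flat array, which is why the fast path has to reshape it using the column count of the first line.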

I think it would be a great idea to have pylab.load in numpy. It also seems to be a lot faster than scipy.io. One thing that is very nice about pylab.load is that it can read in dates. However, it can't, as far as I know, handle other non-float data.
I played around with Python's csv module and pylab.load for a while, resulting in a database class I posted in the cookbook section: http://www.scipy.org/Cookbook/dbase
This class can read any type of data in a csv file, including dates, into a dictionary, but it is based on both pylab.load and the csv module. I use cPickle for storing the data once it has been read in. I haven't tried PyTables but hear a lot of good things about it.
Vincent
On 4/19/07 10:58 AM, "Christopher Barker" <Chris.Barker@noaa.gov> wrote:
Lisandro Dalcin wrote:
I am also +1 on this, but this functionality should be implemented in C, I think.
well, maybe.
I've just tested numpy.fromfile('name.txt', sep=' ') against pylab.load('name.txt') for a 35MB text file, the numbers are:
numpy.fromfile: 2.66 sec. pylab.load: 16.64 sec.
Exactly; that's expected. fromfile is designed to do the easy cases as fast as possible, while pylab.load is designed to be flexible. I'm not sure you need both the speed and the flexibility at the same time.
By the way, I haven't looked at pylab.load() for a while, but it could perhaps be sped up by using fromfile() and/or fromstring() internally. There may be some opportunity to special-case the easy ones too (i.e. all columns desired, etc.)
-Chris
--
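Roughly the idea Vincent describes above (csv module plus per-column converters, collected into a dictionary), sketched with the standard library only. This is not the Cookbook dbase class, whose code isn't shown in this thread; the sample data, the column names and the converters mapping are all made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV with a date column, a float column and a text column.
text = "date,close,ticker\n2007-04-18,10.5,ABC\n2007-04-19,11.25,ABC\n"

# One converter per column name; columns not listed here stay strings.
converters = {
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "close": float,
}

# Build a dictionary mapping each column name to a list of converted values.
columns = {}
for row in csv.DictReader(io.StringIO(text)):
    for name, value in row.items():
        convert = converters.get(name, str)
        columns.setdefault(name, []).append(convert(value))

print(columns)
```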

What's wrong with scipy.io.read_array?
On 19.04.2007 at 15:50, Lisandro Dalcin wrote:
On 4/19/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
Nick Fotopoulos wrote:
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
I'm supportive of this. But, it can't be named numpy.load.
I am also +1 on this, but this functionality should be implemented in C, I think. I've just tested numpy.fromfile('name.txt', sep=' ') against pylab.load('name.txt') for a 35MB text file; the numbers are:
numpy.fromfile: 2.66 sec. pylab.load: 16.64 sec.
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

It seems to be a lot slower than pylab.load for large arrays. Also, it doesn't handle dates.
Vincent
On 4/22/07 10:33 AM, "Markus Rosenstihl" <markusro@element.fkp.physik.tu-darmstadt.de> wrote:
What's wrong with scipy.io.read_array?
On 19.04.2007 at 15:50, Lisandro Dalcin wrote:
On 4/19/07, Travis Oliphant <oliphant.travis@ieee.org> wrote:
Nick Fotopoulos wrote:
Devs, is there any possibility of moving/copying pylab.load to numpy? I don't see anything in the source that requires the rest of matplotlib. Among convenience functions, I think that this function ranks pretty highly in convenience.
I'm supportive of this. But, it can't be named numpy.load.
I am also +1 on this, but this functionality should be implemented in C, I think. I've just tested numpy.fromfile('name.txt', sep=' ') against pylab.load('name.txt') for a 35MB text file; the numbers are:
numpy.fromfile: 2.66 sec. pylab.load: 16.64 sec.
--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574
Fax: +1-847-491-2498
E-mail: v-nijs@kellogg.northwestern.edu
Skype: vincentnijs
participants (8)
- Christopher Barker
- Francesc Altet
- Gael Varoquaux
- Lisandro Dalcin
- Markus Rosenstihl
- Nick Fotopoulos
- Travis Oliphant
- Vincent Nijs