Found a typo or two in my description: #2 and #3 are nn x 1 in shape.

-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Mark Janikas
Sent: Tuesday, February 13, 2007 4:31 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

This is all very good info, especially the byteswap. I'll be testing it momentarily.

As for a detailed explanation of the problem: in essence, I am applying sparse matrix multiplication. The matrix I am dealing with here is n x n, and generally only 1-20% of its entries are nonzero. I use it in spatial data analysis, where the matrix W represents the spatial association between n observations. The operations I perform on it are generally related to the spatial lag of a variable, i.e. Wy, where y is an n x k matrix (usually k = 1). As k is generally small, the y vector and the result vector are represented by numpy arrays. I can usually hold n x k x 2 pieces of info in memory; what I can't hold is n**2. So I store each row of W in a file as a record consisting of 3 parts:

1) row, nn (the number of neighbors)
2) nhs: an nn x 1 vector of integers giving the columns where row i is nonzero
3) weights: an nn x 1 vector of floats, one for each index in the previous part

The first two parts of the record are known as a GAL, or geographic algorithm library. Since a lot of my W matrices have distance metrics associated with them, I added the third; I think someone else might term this an enhanced GAL. At any rate, this allows me to perform the operation on large datasets without running out of memory.

-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Christopher Barker
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
I don't think I can do that because I have heterogeneous rows of data... i.e., each row has a different number of entries.
like I said, show us your whole problem... But you don't have to write/read all the data at once with from/tofile() anyway. Each of your "rows" has to be in a separate array, as numpy arrays don't support "ragged" arrays, but each row can be written with tofile().
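For instance, something along these lines (an untested sketch; the open file object f and the per-row arrays are assumptions, not Mark's actual code):

    import numpy as N

    def write_record(f, row, nhs, weights):
        # header: row index and neighbor count, as two int32s
        N.array([row, len(nhs)], dtype=N.int32).tofile(f)
        # then the nn column indices and the nn weights
        N.asarray(nhs, dtype=N.int32).tofile(f)
        N.asarray(weights, dtype=N.float64).tofile(f)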
Furthermore, when reading it back in, I want to read only part of the info at a time to save memory. In this case, I only want to have one record in mem at once.
you can make multiple calls to fromfile(), though you'll have to know how long each record is.
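Something like this, say (an untested sketch of the record layout Mark described; n, y, and the filename are made-up stand-ins, and y is taken as one-dimensional since k is usually 1):

    import numpy as N

    def read_record(f):
        # header: row index and neighbor count
        row, nn = N.fromfile(f, dtype=N.int32, count=2)
        nhs = N.fromfile(f, dtype=N.int32, count=nn)
        weights = N.fromfile(f, dtype=N.float64, count=nn)
        return row, nhs, weights

    # spatial lag Wy with only one record in memory at a time:
    lag = N.zeros(n)
    f = open("weights.bin", "rb")
    for i in range(n):
        row, nhs, weights = read_record(f)
        lag[row] = N.dot(weights, y[nhs])
    f.close()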
Another issue has arisen from taking this routine cross-platform... namely, if I write the file on Windows I can't read it on Solaris. I assume big/little endianness is at play here.
yup.
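One way around it is to commit to a single byte order when writing, whatever platform you are on, e.g. little-endian (a sketch reusing the f, nhs, and weights names from the write sketch above):

    import numpy as N

    # force little-endian on disk regardless of the native order:
    N.asarray(nhs, dtype=N.dtype("<i4")).tofile(f)
    N.asarray(weights, dtype=N.dtype("<f8")).tofile(f)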
I know that with the struct module I can pack using either one.
so can numpy. See the "byteswap" method, and you can specify a particular endianness with a datatype when you read with fromfile():

    a = N.fromfile(DataFile, dtype=N.dtype("<d"), count=20)

reads 20 little-endian doubles from DataFile, regardless of the native endianness of the machine you're on.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
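And for the byteswap route, a minimal sketch for data that were already read with the native dtype (the wrong-order check is a made-up placeholder):

    import numpy as N

    a = N.fromfile(DataFile, dtype=N.float64, count=20)  # native-order read
    if file_is_other_order:      # hypothetical flag: file written on the other platform
        a = a.byteswap()         # returns a byte-swapped copy with the correct values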