Found a typo or two in my description: #2 and #3 are nn x 1 in shape.

-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Mark Janikas
Sent: Tuesday, February 13, 2007 4:31 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

This is all very good info, especially the byteswap. I'll be testing it momentarily.

As for a detailed explanation of the problem: in essence, I am applying sparse matrix multiplication. The matrix I am dealing with here is n x n, and generally only 1-20% of its entries are nonzero. I use it in spatial data analysis, where the matrix W represents the spatial association between n observations. The operations I perform on it are generally related to the spatial lag of a variable, i.e. Wy, where y is an n x k matrix (usually k = 1). As k is generally small, the y vector and the result vector are represented by numpy arrays. I can usually hold n x k x 2 pieces of info in memory; what I can't hold is n**2. So I store each row of W in a file as a record consisting of 3 parts:

1) row, nn (the number of neighbors)
2) nhs: an nn x 1 vector of integers giving the columns where row i is nonzero
3) weights: an nn x 1 vector of floats, one for each index in the previous part

The first two parts of the record are known as a GAL, or geographic algorithm library. Since a lot of my W matrices have distance metrics associated with them, I added the third; I think someone else might term this an enhanced GAL. At any rate, this allows me to perform the operation on large datasets without running out of memory.

-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Christopher Barker
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
I don't think I can do that because I have heterogeneous rows of data... i.e., each row has a different number of entries.
like I said, show us your whole problem... But you don't have to write/read all the data at once with from/tofile() anyway. Each of your "rows" has to be in a separate array, as numpy arrays don't support "ragged" arrays, but each row can be written with tofile().
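For instance, something along these lines (an untested sketch; the open file object f and the per-row arrays are assumptions, not Mark's actual code):

    import numpy as N

    def write_record(f, row, nhs, weights):
        # header: row index and neighbor count, as two int32s
        N.array([row, len(nhs)], dtype=N.int32).tofile(f)
        # then the nn column indices and the nn weights
        N.asarray(nhs, dtype=N.int32).tofile(f)
        N.asarray(weights, dtype=N.float64).tofile(f)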
Furthermore, when reading it back in, I want to read only part of the info at a time to save memory. In this case, I only want to have one record in mem at once.
you can make multiple calls to fromfile(), though you'll have to know how long each record is.
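Something like this, say (an untested sketch of the record layout Mark described; n, y, and the filename are made-up stand-ins, and y is taken as one-dimensional since k is usually 1):

    import numpy as N

    def read_record(f):
        # header: row index and neighbor count
        row, nn = N.fromfile(f, dtype=N.int32, count=2)
        nhs = N.fromfile(f, dtype=N.int32, count=nn)
        weights = N.fromfile(f, dtype=N.float64, count=nn)
        return row, nhs, weights

    # spatial lag Wy with only one record in memory at a time:
    lag = N.zeros(n)
    f = open("weights.bin", "rb")
    for i in range(n):
        row, nhs, weights = read_record(f)
        lag[row] = N.dot(weights, y[nhs])
    f.close()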
Another issue has arisen from taking this routine cross-platform... namely, if I write the file on Windows I can't read it on Solaris. I assume big/little endianness is at play here.
yup.
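One way around it is to commit to a single byte order when writing, whatever platform you are on, e.g. little-endian (a sketch reusing the f, nhs, and weights names from the write sketch above):

    import numpy as N

    # force little-endian on disk regardless of the native order:
    N.asarray(nhs, dtype=N.dtype("<i4")).tofile(f)
    N.asarray(weights, dtype=N.dtype("<f8")).tofile(f)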
I know that with the struct module I can pack using either one.
so can numpy. See the "byteswap" method, and you can specify a particular endianness with a datatype when you read with fromfile():

    a = N.fromfile(DataFile, dtype=N.dtype("<d"), count=20)

reads 20 little-endian doubles from DataFile, regardless of the native endianness of the machine you're on.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
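And for the byteswap route, a minimal sketch for data that were already read with the native dtype (the wrong-order check is a made-up placeholder):

    import numpy as N

    a = N.fromfile(DataFile, dtype=N.float64, count=20)  # native-order read
    if file_is_other_order:      # hypothetical flag: file written on the other platform
        a = a.byteswap()         # returns a byte-swapped copy with the correct values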