[Numpy-discussion] Graphs in numarray?

Thu Apr 18 08:22:06 EDT 2002

> Behalf Of Magnus Lie Hetland
> Perry Greenfield <perry at stsci.edu>:
> [snip]
> > First of all, it may make sense, but I should say a few words about
> > what scale sizes make sense.
> [snip]
> > So if you are working with much smaller arrays than 10K, you won't
> > see total execution time decrease much
> 
> In relation to what? Using dictionaries etc? Using the array module?

No, in relation to operations on a 10K array. Basically, if an operation
on a 10K array spends half its time on set up, operations on a
10 element array may only be twice as fast. I'm not making any claims
about speed in relation to any other data structure (other than Numeric)

> [snip]
> > Before I go further, I need to find out if the preceeding has made
> > you gasp in horror or if the timescale is too slow for you to
> > accept.
> 
> Hm. If you need 10000 elements before numarray pays off, I'm starting
> to wonder if I can use it for anything at all. :I
> 
I didn't make clear that this threshold may improve in the future
(decrease). The corresponding threshold for Numeric is probably 
around 1000 to 2000 elements. (Likewise, operations on 10 element
Numeric arrays are only about twice as fast as for 1K arrays)
We may be able to eventually improve numarray performance to something 
in that neighborhood (if we are luckly) but I would be surprised to
do much better (though if we use caching techniques, perhaps repeated
cases of arrays of identical shape, strides, type, etc. may run
much faster on subsequent operations). As usual, performance issues
can be complicated. You have to keep in mind that Numeric and numarray
provide much richer indexing and conversion handling feature than 
something like the array module, and that comes at some price in
performance for small arrays.

> > (This particular issue also makes me wonder if numarray would
> > ever be a suitable substitute for the existing array module).
> 
> Indeed.
> 
> > What size graphs are you most concerned about as far as speed goes?
> 
> I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> I'll get 10000 entries in the adjacency matrix, so perhaps it's
> worthwile anyway?
> 
That's right, a 100 nodes is where performance is being competitive,
and if you feel you are worried about cases larger than that, then
it isn't a problem. But if you are operating mostly on small graphs,
then it may not be appropriate. The corresponding threshold for numeric
would be on the order of 30 nodes.

> > On the other hand, since numarray has much better support for index
> > arrays, i.e., an array of indices that may be used to index another
> > array of values,  index array(s), value array pair may itself serve
> > as a storage model for sparse arrays.
> 
> That's an interesting idea, although I don't quite see how it would
> help in the case of adjacency matrices. (You'd still need at least one
> n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> Right?)
> 
Right.

>