[Numpy-discussion] Graphs in numarray?
Magnus Lie Hetland
magnus at hetland.org
Thu Apr 18 08:48:17 EDT 2002
Perry Greenfield <perry at stsci.edu>:
> > In relation to what? Using dictionaries etc? Using the array module?
> No, in relation to operations on a 10K array. Basically, if an operation
> on a 10K array spends half its time on set up, operations on a
> 10 element array may only be twice as fast. I'm not making any claims
> about speed in relation to any other data structure (other than Numeric)
Aaah! Sorry to be so dense :)
But the speedup in numeric between different sizes isn't as important
to me as the speedup compared to other solutions (such as a dict-based
one) of course... If a 10 element array is only twice as fast as a 10K
array that's no problem if it's still faster than an alternative
solution (though I'm sure it might not be...)
The same goes for 10K element graphs -- the interesting point has to
be whether it's faster than various alternatives (which I'm sure it
> > [snip]
> > > Before I go further, I need to find out if the preceeding has made
> > > you gasp in horror or if the timescale is too slow for you to
> > > accept.
> > Hm. If you need 10000 elements before numarray pays off, I'm starting
> > to wonder if I can use it for anything at all. :I
> I didn't make clear that this threshold may improve in the future
And -- on small graphs performance probably won't be much of a problem
> The corresponding threshold for Numeric is probably
> around 1000 to 2000 elements. (Likewise, operations on 10 element
> Numeric arrays are only about twice as fast as for 1K arrays)
> We may be able to eventually improve numarray performance to something
> in that neighborhood (if we are luckly) but I would be surprised to
> do much better (though if we use caching techniques, perhaps repeated
> cases of arrays of identical shape, strides, type, etc. may run
> much faster on subsequent operations). As usual, performance issues
> can be complicated. You have to keep in mind that Numeric and numarray
> provide much richer indexing and conversion handling feature than
> something like the array module, and that comes at some price in
> performance for small arrays.
I guess an alternative (for the graph situation) could be to wrap the
graphs with a common interface with various implementations, so that a
solution more optimised for small graphs could be used (in a factory
function) if the graph is small... (Not really an issue for me at the
moment, but should be easy to do, I guess.)
> > I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> > I'll get 10000 entries in the adjacency matrix, so perhaps it's
> > worthwile anyway?
> That's right, a 100 nodes is where performance is being competitive,
> and if you feel you are worried about cases larger than that, then
> it isn't a problem.
Seems probable. For smaller problems I wouldn't be thinking in terms
of numarray anyway, I think. (Just using plain Python dicts or
> > > On the other hand, since numarray has much better support for index
> > > arrays, i.e., an array of indices that may be used to index another
> > > array of values, index array(s), value array pair may itself serve
> > > as a storage model for sparse arrays.
> > That's an interesting idea, although I don't quite see how it would
> > help in the case of adjacency matrices. (You'd still need at least one
> > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> > Right?)
I might as well use a full adjacency matrix, then...
So, the conclusion for now is that numarray may well be suited for
working with relatively large (100+ nodes), relatively dense graphs.
Now, the next interesting question is how much of the standard graph
algorithms can be implemented with ufuncs and array operations (which
I guess is the key to performance) and not straight for-loops... After
all, some of them are quite sequential.
Magnus Lie Hetland The Anygui Project
More information about the NumPy-Discussion