
Perry Greenfield <perry@stsci.edu>:
[snip]
In relation to what? Using dictionaries etc.? Using the array module?
No, in relation to operations on a 10K array. Basically, if an operation on a 10K array spends half its time on setup, operations on a 10 element array may only be twice as fast. I'm not making any claims about speed in relation to any other data structure (other than Numeric).
Aaah! Sorry to be so dense :) But the speedup in Numeric between different sizes isn't as important to me as the speedup compared to other solutions (such as a dict-based one), of course... If a 10 element array is only twice as fast as a 10K array, that's no problem as long as it's still faster than an alternative solution (though I realize it might not be...) The same goes for 10K element graphs -- the interesting point has to be whether it's faster than various alternatives (which I'm sure it is).
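The honest way to settle that is to time the alternatives directly for the workload at hand. A minimal sketch, using numpy (numarray's modern successor) as a stand-in and a plain dict as the alternative; the sizes, the dict layout, and the repeat count are just assumptions for illustration:

    import timeit
    import numpy as np  # stand-in for numarray in this sketch

    def scale_dict(d, factor):
        # elementwise multiply on a dict-based "vector"
        return dict((k, v * factor) for k, v in d.items())

    for n in (10, 10000):
        arr = np.arange(n, dtype=float)
        dct = dict((i, float(i)) for i in range(n))
        t_arr = timeit.timeit(lambda: arr * 2.0, number=1000)
        t_dct = timeit.timeit(lambda: scale_dict(dct, 2.0), number=1000)
        print("n=%d: array %.4fs, dict %.4fs" % (n, t_arr, t_dct))

What matters for the graph case is where the two curves cross, not how the array solution scales against itself.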
[snip]
Before I go further, I need to find out if the preceding has made you gasp in horror or if the timescale is too slow for you to accept.
Hm. If you need 10000 elements before numarray pays off, I'm starting to wonder if I can use it for anything at all. :I
I didn't make it clear that this threshold may improve (i.e., decrease) in the future.
Right. Good. And -- on small graphs performance probably won't be much of a problem anyway. :)
The corresponding threshold for Numeric is probably around 1000 to 2000 elements. (Likewise, operations on 10 element Numeric arrays are only about twice as fast as for 1K arrays.) We may eventually be able to improve numarray performance to something in that neighborhood (if we are lucky), but I would be surprised to do much better (though if we use caching techniques, repeated operations on arrays of identical shape, strides, type, etc. may run much faster after the first one). As usual, performance issues can be complicated. You have to keep in mind that Numeric and numarray provide much richer indexing and conversion handling features than something like the array module, and that comes at some price in performance for small arrays.
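For instance, the richer indexing meant here is indexing one array with another array of indices, which the array module has no equivalent of. A small sketch of the idea, with numpy standing in for numarray:

    import numpy as np  # numarray's index-array semantics, shown via numpy

    values = np.array([10.0, 20.0, 30.0, 40.0])
    idx = np.array([3, 0, 0, 2])
    print(values[idx])   # gather: [40. 10. 10. 30.]
    values[idx] = 0.0    # scatter: assign through the index array
    print(values)        # [ 0. 20.  0.  0.]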
Of course. I guess an alternative (for the graph situation) could be to wrap the graphs in a common interface with various implementations, so that a solution more optimised for small graphs could be used (via a factory function) if the graph is small... (Not really an issue for me at the moment, but it should be easy to do, I guess; see the sketch below.) [snip]
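A sketch of that factory idea. Both implementations, the class names, and the cutoff value are made up for illustration, with numpy standing in for numarray:

    import numpy as np  # stand-in for numarray

    class DictGraph:
        # dict-of-sets adjacency: cheap for small or sparse graphs
        def __init__(self, n):
            self.adj = dict((i, set()) for i in range(n))
        def add_edge(self, u, v):
            self.adj[u].add(v)

    class MatrixGraph:
        # full boolean adjacency matrix: pays off for larger, denser graphs
        def __init__(self, n):
            self.adj = np.zeros((n, n), dtype=bool)
        def add_edge(self, u, v):
            self.adj[u, v] = True

    def make_graph(n, threshold=100):
        # the cutoff is a guess; measure before trusting it
        return MatrixGraph(n) if n >= threshold else DictGraph(n)

Callers only ever say make_graph(n) and add_edge(u, v), so the representation can be swapped without touching the algorithms.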
I'm not sure. A wide range, I should imagine. But with only 100 nodes, I'll get 10000 entries in the adjacency matrix, so perhaps it's worthwhile anyway?
That's right; 100 nodes is about where performance becomes competitive, so if the cases you're worried about are larger than that, it isn't a problem.
Seems probable. For smaller problems I wouldn't be thinking in terms of numarray anyway. (Just using plain Python dicts or something similar.) [snip]
On the other hand, since numarray has much better support for index arrays, i.e., an array of indices that may be used to index another array of values, an (index array, value array) pair may itself serve as a storage model for sparse arrays.
That's an interesting idea, although I don't quite see how it would help in the case of adjacency matrices. (You'd still need at least one n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... Right?)
Right.
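For reference, that (index arrays, value array) storage model looks roughly like this; the node count and edge list are made up, and numpy stands in for numarray. As just agreed, the index arrays grow with the number of stored entries, so this only wins when the matrix is mostly empty:

    import numpy as np  # stand-in for numarray's index-array support

    # four edges of a 5-node weighted graph, as parallel arrays
    rows = np.array([0, 0, 1, 3])        # source node of each edge
    cols = np.array([1, 2, 3, 4])        # target node of each edge
    weights = np.array([1.0, 2.0, 0.5, 4.0])

    # scatter into a dense matrix only when an algorithm needs one
    dense = np.zeros((5, 5))
    dense[rows, cols] = weights

    # gather without densifying: weights of all edges leaving node 0
    print(weights[rows == 0])            # [ 1.  2.]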
I might as well use a full adjacency matrix, then... So, the conclusion for now is that numarray may well be suited for working with relatively large (100+ nodes), relatively dense graphs. Now, the next interesting question is how many of the standard graph algorithms can be implemented with ufuncs and array operations (which I guess is the key to performance) rather than straight for-loops... After all, some of them are quite sequential. (One such vectorised step is sketched below.)

--
Magnus Lie Hetland          The Anygui Project
http://hetland.org          http://anygui.org
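As one data point for that question: a whole level of breadth-first search can be expanded with array operations alone. A minimal sketch, assuming a boolean adjacency matrix and using numpy as a stand-in for numarray:

    import numpy as np  # stand-in for numarray

    def bfs_levels(adj, start):
        # adj is an (n, n) boolean adjacency matrix
        n = adj.shape[0]
        level = np.full(n, -1)              # -1 means "not reached yet"
        frontier = np.zeros(n, dtype=bool)
        frontier[start] = True
        depth = 0
        while frontier.any():
            level[frontier] = depth
            # expand one whole BFS level per step, no inner for-loop
            frontier = adj[frontier].any(axis=0) & (level == -1)
            depth += 1
        return level

The while-loop still runs once per level, so the sequential part shrinks to the graph's diameter rather than its node or edge count.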