
Perry Greenfield <perry@stsci.edu>:
[snip]
In relation to what? Using dictionaries etc.? Using the array module?
No, in relation to operations on a 10K array. Basically, if an operation on a 10K array spends half its time on setup, operations on a 10 element array may only be twice as fast. I'm not making any claims about speed in relation to any other data structure (other than Numeric).
Aaah! Sorry to be so dense :) But the speedup in Numeric between different sizes isn't as important to me as the speedup compared to other solutions (such as a dict-based one), of course... If a 10 element array is only twice as fast as a 10K array, that's no problem as long as it's still faster than an alternative solution (though I realize it might not be...) The same goes for 10K element graphs -- the interesting point has to be whether it's faster than various alternatives (which I'm sure it is).
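The honest way to settle that is to time the alternatives directly for the workload at hand. A minimal sketch, using numpy (numarray's modern successor) as a stand-in and a plain dict as the alternative; the sizes, the dict layout, and the repeat count are just assumptions for illustration:

    import timeit
    import numpy as np  # stand-in for numarray in this sketch

    def scale_dict(d, factor):
        # elementwise multiply on a dict-based "vector"
        return dict((k, v * factor) for k, v in d.items())

    for n in (10, 10000):
        arr = np.arange(n, dtype=float)
        dct = dict((i, float(i)) for i in range(n))
        t_arr = timeit.timeit(lambda: arr * 2.0, number=1000)
        t_dct = timeit.timeit(lambda: scale_dict(dct, 2.0), number=1000)
        print("n=%d: array %.4fs, dict %.4fs" % (n, t_arr, t_dct))

What matters for the graph case is where the two curves cross, not how the array solution scales against itself.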
[snip]
Before I go further, I need to find out if the preceding has made you gasp in horror or if the timescale is too slow for you to accept.
Hm. If you need 10000 elements before numarray pays off, I'm starting to wonder if I can use it for anything at all. :I
I didn't make it clear that this threshold may improve (i.e., decrease) in the future.
Right. Good. And -- on small graphs performance probably won't be much of a problem anyway. :)
The corresponding threshold for Numeric is probably around 1000 to 2000 elements. (Likewise, operations on 10 element Numeric arrays are only about twice as fast as for 1K arrays.) We may eventually be able to improve numarray performance to something in that neighborhood (if we are lucky), but I would be surprised to do much better (though if we use caching techniques, repeated operations on arrays of identical shape, strides, type, etc. may run much faster after the first one). As usual, performance issues can be complicated. You have to keep in mind that Numeric and numarray provide much richer indexing and conversion handling features than something like the array module, and that comes at some price in performance for small arrays.
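For instance, the richer indexing meant here is indexing one array with another array of indices, which the array module has no equivalent of. A small sketch of the idea, with numpy standing in for numarray:

    import numpy as np  # numarray's index-array semantics, shown via numpy

    values = np.array([10.0, 20.0, 30.0, 40.0])
    idx = np.array([3, 0, 0, 2])
    print(values[idx])   # gather: [40. 10. 10. 30.]
    values[idx] = 0.0    # scatter: assign through the index array
    print(values)        # [ 0. 20.  0.  0.]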
Of course. I guess an alternative (for the graph situation) could be to wrap the graphs in a common interface with various implementations, so that a solution more optimised for small graphs could be used (via a factory function) if the graph is small... (Not really an issue for me at the moment, but it should be easy to do, I guess; see the sketch below.) [snip]
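A sketch of that factory idea. Both implementations, the class names, and the cutoff value are made up for illustration, with numpy standing in for numarray:

    import numpy as np  # stand-in for numarray

    class DictGraph:
        # dict-of-sets adjacency: cheap for small or sparse graphs
        def __init__(self, n):
            self.adj = dict((i, set()) for i in range(n))
        def add_edge(self, u, v):
            self.adj[u].add(v)

    class MatrixGraph:
        # full boolean adjacency matrix: pays off for larger, denser graphs
        def __init__(self, n):
            self.adj = np.zeros((n, n), dtype=bool)
        def add_edge(self, u, v):
            self.adj[u, v] = True

    def make_graph(n, threshold=100):
        # the cutoff is a guess; measure before trusting it
        return MatrixGraph(n) if n >= threshold else DictGraph(n)

Callers only ever say make_graph(n) and add_edge(u, v), so the representation can be swapped without touching the algorithms.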
I'm not sure. A wide range, I should imagine. But with only 100 nodes, I'll get 10000 entries in the adjacency matrix, so perhaps it's worthwhile anyway?
That's right; 100 nodes is about where performance becomes competitive, so if the cases you're worried about are larger than that, it isn't a problem.
Seems probable. For smaller problems I wouldn't be thinking in terms of numarray anyway. (Just using plain Python dicts or something similar.) [snip]
On the other hand, since numarray has much better support for index arrays, i.e., an array of indices that may be used to index another array of values, an (index array, value array) pair may itself serve as a storage model for sparse arrays.
That's an interesting idea, although I don't quite see how it would help in the case of adjacency matrices. (You'd still need at least one n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... Right?)
Right.
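For reference, that (index arrays, value array) storage model looks roughly like this; the node count and edge list are made up, and numpy stands in for numarray. As just agreed, the index arrays grow with the number of stored entries, so this only wins when the matrix is mostly empty:

    import numpy as np  # stand-in for numarray's index-array support

    # four edges of a 5-node weighted graph, as parallel arrays
    rows = np.array([0, 0, 1, 3])        # source node of each edge
    cols = np.array([1, 2, 3, 4])        # target node of each edge
    weights = np.array([1.0, 2.0, 0.5, 4.0])

    # scatter into a dense matrix only when an algorithm needs one
    dense = np.zeros((5, 5))
    dense[rows, cols] = weights

    # gather without densifying: weights of all edges leaving node 0
    print(weights[rows == 0])            # [ 1.  2.]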
I might as well use a full adjacency matrix, then... So, the conclusion for now is that numarray may well be suited for working with relatively large (100+ nodes), relatively dense graphs. Now, the next interesting question is how many of the standard graph algorithms can be implemented with ufuncs and array operations (which I guess is the key to performance) rather than straight for-loops... After all, some of them are quite sequential. (One such vectorised step is sketched below.)

--
Magnus Lie Hetland          The Anygui Project
http://hetland.org          http://anygui.org
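As one data point for that question: a whole level of breadth-first search can be expanded with array operations alone. A minimal sketch, assuming a boolean adjacency matrix and using numpy as a stand-in for numarray:

    import numpy as np  # stand-in for numarray

    def bfs_levels(adj, start):
        # adj is an (n, n) boolean adjacency matrix
        n = adj.shape[0]
        level = np.full(n, -1)              # -1 means "not reached yet"
        frontier = np.zeros(n, dtype=bool)
        frontier[start] = True
        depth = 0
        while frontier.any():
            level[frontier] = depth
            # expand one whole BFS level per step, no inner for-loop
            frontier = adj[frontier].any(axis=0) & (level == -1)
            depth += 1
        return level

The while-loop still runs once per level, so the sequential part shrinks to the graph's diameter rather than its node or edge count.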