
"Arnaud" == Arnaud Delobelle <arnodel@googlemail.com> writes: Arnaud> I've written a patch [1] that does that. Following the suggestion Arnaud> of Raymond Hettinger, I've implemented set.intersection by sorting Arnaud> all its sets/frozensets/dicts in increasing order of size first, Arnaud> then iterating over the smallest.
Hi Arnaud,

I don't know if you'll do any benchmarking on this, but I'd suggest:

- First find the set with the smallest size (this is O(n) work).
- If that size is sufficiently small and the number of sets is
  sufficiently large (numbers to be determined by testing), don't sort
  the sets by size - just go for it.
- Else do the sorting.

The point being that if the smallest set is already quite small, the
size of the intersection is already tightly bounded, and you may be
paying for an expensive sort that's really not needed. The O(n) work
to find the smallest is tiny compared to blindly doing O(n lg n) work
immediately. Most of the juice you get from moving from small to big
sets comes from starting with the smallest. A few benchmarks should
give an idea of when to sort.

BTW, having a quick look at your diff (not the patched source), it
looks like you're testing each element of the smallest set against all
the other hashtables. I haven't thought about it much, but that seems
to partly defeat the purpose of sorting. Speed will depend on the
inputs, but I'd have guessed that in general you should be testing
each member of the smallest set for presence in the next set,
short-circuiting if the result is empty, then testing each of the
survivors against the next set, and so on. That's more of a "vertical"
approach than the horizontal one you take across all the hashtables
(possibly also with a speed benefit due to locality of reference).
There's a rough sketch of what I mean in the P.S. below.

Also, why not test first against the iterables that are not
hashtables? Wouldn't that be faster in the (common?) case of many
sets being passed for intersection?

Sorry if this is all clueless - it's just my thinking as I looked at
your diff.

Regards,
Terry
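
P.S. In case it helps, here's a rough pure-Python sketch of the
"vertical" approach I have in mind (purely illustrative - the function
name, the threshold value and the pure-Python form are all mine; the
real patch is of course C code inside setobject.c and also accepts
frozensets and dicts):

    def intersect_vertical(*sets):
        # Hypothetical illustration of the idea, not the patch itself.
        if not sets:
            raise TypeError("need at least one input")

        # O(n) scan for the smallest input.
        smallest = min(sets, key=len)
        others = [s for s in sets if s is not smallest]

        # Only pay for the O(n lg n) sort when the smallest input isn't
        # already tiny; the cut-off is a placeholder to be tuned by
        # benchmarking.
        SORT_THRESHOLD = 10
        if len(smallest) > SORT_THRESHOLD:
            others.sort(key=len)

        # "Vertical" pass: winnow the survivors one container at a time,
        # short-circuiting as soon as nothing is left.
        survivors = set(smallest)
        for s in others:
            survivors = {x for x in survivors if x in s}
            if not survivors:
                break
        return survivors

E.g. intersect_vertical({1, 2, 3, 4}, {2, 3, 9}, {3, 5}) gives {3},
and the loop stops as soon as any intermediate result comes up empty.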