Richard Musil wrote:
On Sat, Sep 21, 2019 at 1:44 AM Richard Higginbotham higginbr@gmail.com wrote: ..... I wrote all this to show that without an insight it might be sometimes difficult or impossible to do it right (I was caught myself in several pitfalls on my original benchmarks I posted here) and also because it was actually a fun to learn quite a few things from this thread. I am however not convinced that step-compare lists can compete with set operations (even when done on lists) at the current state of things. Richard M. I really appreciate the time and thought you have put into it (and others here as well), and its been educational / fun for me too. One of the concerns I have about using timeit is that it puts a lot of focus on the exact statements that are being executed and all the internals underneath. Looking at "list(set(a).difference(b))"; I suspect what I am seeing is 3 C calls from python. Anything after that is a comparison between C (here) and Python (in list_relcomp). I use Python for prototyping a lot so I tend to look for those situations more.
When I first wrote the algorithms(2000 or so) I compared them to [x for x in a if x in b] and I found that the later was faster. I didn't make sense at first but it was because the 'x in b' is done in C (I had just graduated and started using Python). If the data set was large enough I could see the big O growth was greater for the C version versus the python version. Similar to what we see here in the difference between the set operations and listex. I implemented listex in C and it was much faster at all data sizes over maybe 10 elements. I don't expect the same speedup compared to sets because they are more efficient. I also can't guarantee that the trend of set operations having a higher time growth will hold true for all data sizes. I only point it out and say for some use cases set relation operations aren't optimal, but they may be good enough.