On Thu, Sep 19, 2019 at 10:18 PM Richard Higginbotham wrote:
I'm sorry, I haven't used the mailing list before and didn't send my comments to the list. Please see my response to Chris for some example times. I'm reluctant to use anything less than about a millisecond for timing purposes; there are too many interactions at the microsecond level that can skew results. I'd prefer to expand the sample size until the results are clearer. Here is an example:
For A not in B: trying A 10000, B 50000
  naive test    0:00:10.319032
  set test      0:00:00.009001
  listex test   0:00:00.005000

For A subset of B: trying A 100000, B 500000
  set test      0:00:00.180018
  listex test   0:00:00.054006
Richard, I took your listex.py, modified it to use the timeit module for benchmarking, and put it here (https://gist.github.com/risa2000/c44c1a06e89f348d82efcce5ec722774). The results I got from the modified code were:

For A not in B: trying A 1000, B 5000
  best naive_relcomp test time (msec): 38.125
  best relcomp_set test time (msec): 0.034
  best relcomp_set with set init test time (msec): 0.218
  best list_relcomp test time (msec): 0.211

For A not in B: trying A 10000, B 50000
  best relcomp_set test time (msec): 0.545
  best relcomp_set with set init test time (msec): 3.093
  best list_relcomp test time (msec): 2.101

For A subset of B: trying A 100000, B 500000
  best relcomp_set test time (msec): 8.508
  best relcomp_set with set init test time (msec): 55.001
  best list_relcomp test time (msec): 29.242

Now, what is interesting about this is the benchmark I added: calculating the set difference while including the set setup (basically the set(list) call). This is the line with "best relcomp_set with set init". For those not familiar with Richard's listex.py:

1) 'relcomp_set' calculates the set difference by using the subtraction operator (a - b) on two sets built from the two unsorted lists.
2) 'relcomp_set with set init' is the same function but with the set setup included, so it basically measures set(list_a) - set(list_b).
3) 'list_relcomp' is the function using Richard's implementation of the difference on the lists (again unsorted).

I did not verify the results (I hope they are correct), but the timings suggest that the set operation itself is always faster. When, however, we add the set setup to the mix, the build-up time, which is O(n + m), becomes more significant than actually sorting and comparing the lists, which is technically O(n log(n) + m log(m)).
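For anyone who wants to reproduce the shape of these three measurements without reading the gist, here is a minimal sketch using timeit. The list_relcomp below is my stand-in for a merge-style list difference (sort both lists, then walk them in tandem); it is an assumption about the approach, not a copy of listex.py:

```python
import random
import timeit

def list_relcomp(a, b):
    # Stand-in for a sorted-merge list difference (assumed,
    # not copied from listex.py): sort both lists, then walk
    # them in tandem, keeping elements of a absent from b.
    a, b = sorted(a), sorted(b)
    out, j = [], 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1
        if j >= len(b) or b[j] != x:
            out.append(x)
    return out

a = random.sample(range(1_000_000), 10_000)
b = random.sample(range(1_000_000), 50_000)
sa, sb = set(a), set(b)

# best-of-5 timings, reported in milliseconds
def best_ms(stmt):
    return min(timeit.repeat(stmt, number=10, repeat=5)) / 10 * 1000

print("relcomp_set               :", best_ms(lambda: sa - sb))
print("relcomp_set with set init :", best_ms(lambda: set(a) - set(b)))
print("list_relcomp              :", best_ms(lambda: list_relcomp(a, b)))
```

The essential point is that only the second measurement charges the set(list) construction to the operation being timed; the first assumes the sets already exist.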
It would also suggest that for some large lists the list implementation _can_ actually be faster if we factor in an eventual conversion of the lists to sets, and possibly even faster if the lists are already ordered. What I cannot explain is why the setup time becomes so significant only at A 10000, B 50000 and does not make much of a difference in the A 1000, B 5000 case. It would suggest that there is some other factor whose impact depends on the set size.

Richard M.
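One way to probe that unexplained jump would be to time just the set construction at several sizes: if the per-element cost were constant, set(list) would scale linearly, so a super-linear growth (for instance once the hash table outgrows the CPU caches) would show up directly. A hedged sketch, assuming random integer data:

```python
import random
import timeit

# Time bare set(list) construction at increasing sizes and
# report the per-element cost; a flat per-element figure would
# mean linear scaling, a rising one would point to some other
# size-dependent factor (e.g. cache behaviour).
for n in (5_000, 50_000, 500_000):
    data = random.sample(range(10 * n), n)
    t = min(timeit.repeat(lambda: set(data), number=5, repeat=5)) / 5
    print(f"n={n:>7}: set(list) {t * 1e3:8.3f} ms, "
          f"per element {t / n * 1e9:6.1f} ns")
```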