I think there is a problem here using timeit for this type of problem, look at the large amount of time used by the set init time. Noticing this I changed my sorts to in place and got better results. I assume there is some copying of locals or other overhead but I can't explain the results. https://github.com/ponderpanda/listex/blob/master/listex_timit.py For A not in B: trying A 1000, B 5000 best relcomp_set test time (msec): 0.039 best relcomp_set with set init test time (msec): 0.247 best list_relcomp test time (msec): 0.280 For A not in B: trying A 10000, B 50000 best relcomp_set test time (msec): 0.618 best relcomp_set with set init test time (msec): 4.156 best list_relcomp test time (msec): 2.890 For A subset of B: trying A 100000, B 500000 best relcomp_set test time (msec): 12.664 best relcomp_set with set init test time (msec): 73.243 best list_relcomp test time (msec): 146.397 For A not in B: trying A 100, B 500 best naive_relcomp test time (msec): 0.428 best relcomp_set test time (msec): 0.004 best relcomp_set with set init test time (msec): 0.020 best list_relcomp test time (msec): 0.027 I still don't trust it but it seems more reasonable. It also lines up with my previous results a little better.