[Python-ideas] Re: Set operations with Lists

20 Sep 2019

      ...
On Sep 19, 2019, at 12:07, Richard Higginbotham  wrote:
Its faster than the built in set type.
Not according to timeit. If you want to dispute that, you need a proper benchmark, not just repeating the claims from your inaccurate one.

I’m not sure what your relationtype is supposed to do (especially given than 1 and 2 produce the same values), and all versions seem like unusual use cases rather than anything approaching realistic, but I left that all alone and just replaced your benchmarks with timeit, repeated them enough times to actually be meaningful, and changed the test to the operation you claim it’s about rather than a different one. The output pretty clearly shows that a pre-existing set beats a pre-sorted list, and building a set on the fly beats sorting a list on the fly, and even comparing blatantly unfairly with building a set on the fly vs. using a pre-sorted list, the set still wins except in the smallest test.

I didn’t include tests against more realistic datasets where your algorithm does significantly worse; this shows that even your chosen input is slower with your algorithm if you measure it properly.

And, while we’re at it, I added a version of your algorithm that works with arbitrary iterables and also takes a key function (which you always want when you’re doing anything related to sorting in Python), and it’s about 10-20% faster than your list version, which shows that you aren’t getting any performance benefit from restricting yourself to lists only.

See https://pastebin.com/BC2zgtN6 for the code.
...
Uniqueness of values isn't a requirement.
Of course. But non-unique values make the set algorithm faster, and they make your algorithm slower. And nearly-unique values don’t affect the set time, but they make your algorithm slower. So you can’t just ignore it.

And, again, you have to decide what happens with non-unique values, how many times they appear in the output.
...
The values need to be comparable so that they can be sorted. This allows a linear time complexity.
Except that sorting is log-linear, not linear, so it doesn’t.

Meanwhile, set actually _does_ allow linear time complexity.

Again, none of this means your code is useless. I think a step-compare intersection is useful enough to belong in itertools, at least as a recipe. But it’s not faster than set, or simpler, or anything else you’re arguing.

[Python-ideas] Re: Set operations with Lists

Andrew Barnert