On Fri, Oct 15, 2021 at 12:16 AM Steven D'Aprano email@example.com wrote:
On Thu, Oct 14, 2021 at 11:15:52PM +1100, Chris Angelico wrote:
On Thu, Oct 14, 2021 at 11:03 PM Jeremiah Vivian firstname.lastname@example.org wrote:
Results are in (tested `next(iter(d))` / `next(iter(d.values()))` / `next(iter(d.items()))` and their `next(reversed(...))` counterparts):
- `*` / `/` implemented is 2x faster than `next(iter(d))` / `next(reversed(d))`
- `+` / `-` implemented is approximately 3x faster than `next(iter(d.values()))` / `next(reversed(d.values()))`
- `<<` / `>>` implemented is at least 4x faster than `next(iter(d.items()))` / `next(reversed(d.items()))`
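For readers skimming the thread, these are the plain-Python idioms being compared against the proposed operators (a minimal sketch; it relies on dicts preserving insertion order and on `reversed()` supporting dict views, both guaranteed since Python 3.8):

```python
d = {"a": 1, "b": 2, "c": 3}

# First/last key (what `*` / `/` would replace)
first_key = next(iter(d))                # 'a'
last_key = next(reversed(d))             # 'c'

# First/last value (what `+` / `-` would replace)
first_value = next(iter(d.values()))     # 1
last_value = next(reversed(d.values()))  # 3

# First/last (key, value) item (what `<<` / `>>` would replace)
first_item = next(iter(d.items()))       # ('a', 1)
last_item = next(reversed(d.items()))    # ('c', 3)
```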
So, negligible benefits. Thanks for checking.
Be fair Chris :-)
A 2x or 4x speed-up (even of a micro-benchmark) is not negligible. If someone managed a mere 20% or 30% speedup to next(), we would probably be more than happy to take it.
Okay, lemme rephrase. Relatively insignificant benefits, considering that (as I said in the preceding post, which was based on approximations rather than measurements) this involves hand-rolled C code instead of composing a concept out of pre-existing Python functions.
Any time you rewrite Python code in C, you can expect a measurable improvement. Having this be merely 2-4 times is rather underwhelming for a microbenchmark, although of course that same benefit for your whole project would be quite notable. But let's face it: if next(iter(d)) is the bottleneck in your code, something's wrong with your methodology or algorithm.
A percentage speedup to an existing function? Definitely, that's basically free performance gains for everything that uses it. An improvement to one very specific operation on one single data type? It has to be either an incredibly common one, or a spectacular improvement, to be more than "negligible".
For a fair comparison, I'd like to see a function that uses next(iter(d)) get cythonized. Or run in PyPy. Anything that can give drastic performance improvements for the original code. Then see whether the handrolled one is still better, and if so, how much. I suspect it will still benchmark measurably higher (since it doesn't involve two global lookups), but the difference would narrow even further. Of course, it's hard to get a fair comparison between handrolled C and PyPy, so this is probably academic, but perhaps it'll give some idea of why I consider anything less than an order of magnitude to be negligible here.
A better way to put this is that while the speed benefit to one uncommon task is non-negligible, the cost to readability and comprehensibility is horrendous. This is premature optimization: there's no evidence that getting the first element of a dict is a common operation, let alone a bottleneck that needs optimising.
Right. That's the main thing.
Ultimately, for every millisecond in program runtime saved by using obscure operators for uncommon operations on dicts, we would probably cost a dozen programmers five or ten minutes in confusion while they try to decipher what on earth `mydict << 1` means.
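By contrast, a small named helper keeps the intent obvious without any new syntax (the helper names here are purely illustrative, not something proposed in this thread):

```python
def first_item(d):
    """Return the first (key, value) pair of a dict, in insertion order."""
    return next(iter(d.items()))

def last_item(d):
    """Return the last (key, value) pair of a dict, in insertion order."""
    return next(reversed(d.items()))

config = {"host": "localhost", "port": 8080}
print(first_item(config))  # ('host', 'localhost')
print(last_item(config))   # ('port', 8080)
```

The call site reads as English, whereas `mydict << 1` forces the reader to recall an arbitrary operator-to-meaning mapping.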
There are programming languages designed to be terse and even deliberately obfuscated, especially code-golfing languages. I'm glad Python is not one of them :-)
Indeed :) And I can't even picture this as being particularly useful for golfing!