On Sep 23, 2019, at 15:32, Richard Higginbotham wrote:

>> Considering your use case, however, I wonder if you would not be better off going with the iterator approach (as Andrew has been hinting for some time in his posts). Richard M.
>
> They will most likely have good performance on data sets that are close, but very poor on those that aren't balanced.
The code you posted doesn't do any galloping; it just advances the index one at a time. The fact that you *could* do approximate binary searches to optimize it doesn't help if you never do those searches. So it will be exactly as poor on unbalanced inputs as my iterator code. After all, they're exactly the same algorithm (except for trivial differences in how they handle one input ending before the other); the only difference is that one implementation calls next() and remembers the value, while the other increments an index and indexes into a list.

Of course, if you always have the lists in memory already, the added overhead of using an iterator over a list is cost for no benefit. But that's a fixed multiplier; it doesn't differ between balanced and unbalanced inputs. And of course, if the iterator ever allows you to avoid building an otherwise-unnecessary list, that's a win, but that win is also a flat cost that depends only on the allocation time for N elements, which likewise doesn't vary with how balanced the inputs are.

Anyway, if your use case really is a gigantic directory on a filesystem that guarantees sorted order and supports iterdir, iterators should help quite a bit.
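For concreteness, here is a minimal sketch of the step-by-step merge intersection being discussed (the names are mine, not from either poster's code). It works on any sorted iterables, advancing whichever side is behind, and never searches:

```python
def sorted_intersection(a, b):
    """Intersect two sorted iterables by advancing whichever side is behind.

    This is the plain merge: no searching, just next() on one side or the
    other, remembering the last value seen from each.
    """
    it_a, it_b = iter(a), iter(b)
    try:
        x, y = next(it_a), next(it_b)
        while True:
            if x == y:
                yield x
                x, y = next(it_a), next(it_b)
            elif x < y:
                x = next(it_a)
            else:
                y = next(it_b)
    except StopIteration:
        return  # one input ended; no further matches are possible
```

An index-based version over two lists is the same algorithm with `i += 1` in place of `next()`, which is the point: both do O(len(a) + len(b)) comparisons regardless of how unbalanced the inputs are.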
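By contrast, a version that actually exploits binary search on the larger side might look like the sketch below (this uses plain `bisect_left` over the unscanned tail rather than true exponential galloping, but it illustrates why unbalanced inputs stop being the worst case):

```python
from bisect import bisect_left

def galloping_intersection(a, b):
    """Intersect two sorted lists, binary-searching the larger side.

    When len(a) << len(b), this does roughly O(len(a) * log(len(b)))
    comparisons instead of the O(len(a) + len(b)) of a plain merge.
    """
    if len(a) > len(b):
        a, b = b, a  # ensure a is the smaller list
    out = []
    lo = 0
    for x in a:
        lo = bisect_left(b, x, lo)  # search only the tail of b not yet passed
        if lo == len(b):
            break  # everything remaining in a is larger than all of b
        if b[lo] == x:
            out.append(x)
            lo += 1
    return out
```

The merge only wins when the two lists are of comparable size; once one side is tiny, searching pays for itself, which is exactly the balanced-vs-unbalanced distinction in the quoted text.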