steve+comp.lang.python at pearwood.info
Wed Mar 19 14:42:04 CET 2014
On Wed, 19 Mar 2014 10:49:33 +0200, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>> If you are in a position to randomize the data before storing it in the
>> tree, an unbalanced binary tree is a solid contender.
> Applications that can assume randomly distributed data are exceedingly
Please re-read what I wrote. I didn't say "if your data comes to you
fully randomized". I said "if you are in a position to randomize the data
before storing it". In other words, if you actively randomize the data
> and even then, you can't easily discount the possibility of
> worst-case ordering.
Of course you can. If you have a million items, then the odds that those
million items happen to be sorted (the worst-case order) are 1 in a
million factorial. That's a rather small number, small enough that we can
discount it from ever happening, not in a million lifetimes of the
Of course, you would be right to point out that one would also like to
avoid *nearly* worst-case ordering. Nevertheless, there are Vastly more
ways that the data will be sufficiently randomized as to avoid degenerate
performance than the worst-case poor performance.
> In fact, Dan himself said unbalanced trees performed so badly he
> couldn't test them satisfactorily.
You're misrepresenting what Dan said. What he actually said was that
unbalanced trees perform second only to dicts with random data, only with
sorted data do they perform badly. His exact words were:
"For a series of random keys, it would do quite well (probably second only
to dict), but for a series of sequential keys it would take longer than
anyone would reasonably want to wait"
More information about the Python-list