Balanced trees
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Mar 10 04:24:13 CET 2014
On Sun, 09 Mar 2014 18:04:46 -0600, Ian Kelly wrote:
> On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg <drsalists at gmail.com>
> wrote:
>> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko at pacujo.net>
>> wrote:
>>> Dan Stromberg <drsalists at gmail.com>:
>>>
>>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko at pacujo.net>
>>>> wrote:
>>>>> There is no O(1) hash table.
>>>>
>>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>>>
>>> Please elaborate.
>>
>> A hash table of fixed size is O(1) until you fill it so full that you
>> start getting enough collisions to make it O(n), as one bucket becomes
>> a linked list. This is because the hash function is O(1), and looking
>> up a value in a C array is O(1).
>>
>> A more flexible hash table will not have a fixed size; instead it will
>> grow itself as needed. This growth operation tends to be O(n), but
>> happens vanishingly infrequently as the table grows, so it's still
>> amortized O(1).
>
> A hash table can also only ever be O(1) in the average case.
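The chained, resizing table Dan describes can be sketched in a few lines. This is a hypothetical illustration (all names made up), not how CPython's dict actually works; CPython uses open addressing rather than chaining:

```python
# Minimal chained hash table with amortized-O(1) growth.
# Each bucket is a Python list of (key, value) pairs; a lookup walks
# one bucket, so it is O(1) as long as buckets stay short.

class ChainedTable:
    def __init__(self, size=8):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % self.size]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > 2 * self.size:  # load factor exceeded
            self._grow()                # O(n), but rare: amortized O(1)

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self):
        # Double the table and rehash every entry into the new buckets.
        old = self.buckets
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        for bucket in old:
            for key, value in bucket:
                self._bucket(key).append((key, value))

t = ChainedTable()
for i in range(100):
    t.put(i, i * i)
print(t.get(7))   # -> 49
```

With 100 insertions the table doubles only a handful of times (8 to 16 to 32 to 64 slots here), which is why the occasional O(n) rehash washes out to amortized O(1) per insertion.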
Since this discussion has apparently devolved into people trying to out-
pedant each other, I'm going to dispute that. That will depend on the
distribution of hash collisions. With a sufficiently big table,
sufficiently small set of possible data, a sufficiently good hash
function, or some combination of the above, a hash table may be O(1) even
in the worst case.
E.g. if you hash each number n to itself (hash(42) == 42), your data
consists of single-byte numbers 0 through 255, and your hash table has
256 slots, then *there are no collisions* and every lookup is O(1).
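That degenerate case is easy to demonstrate, since CPython happens to hash small integers to themselves (this sketch leans on that implementation detail):

```python
# Worst-case O(1) lookup: identity hash, 256 slots, keys 0-255.
# Every key lands in its own slot, so there are zero collisions.
table = [None] * 256

def put(key, value):
    table[hash(key) % 256] = value  # hash(n) == n for small ints in CPython

def get(key):
    return table[hash(key) % 256]

for n in range(256):
    put(n, n * 2)

slots_used = sum(1 for slot in table if slot is not None)
print(slots_used)   # -> 256: all 256 slots filled, no collisions
print(get(42))      # -> 84
```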
There are technical meanings for Big Oh notation, which can be quite
complicated. There's Big O, Little o, Big Ω (omega), Little ω (omega),
Big Θ (theta), although I don't think there's a Little θ. They can refer
to the best case, worst case, average (mean) case, median case, "typical"
case where you're making some guess of what occurs typically, and any
other situation you care about. The algorithmic complexity can apply
consistently to each and every run of the algorithm, or it can be
amortized over many runs. If you're doing a technical analysis of an
algorithm, these pedantic details are important.
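For reference, the usual formal definitions of those bounds can be sketched as (for functions f, g from the naturals to the non-negative reals):

```latex
\begin{align*}
f \in O(g)      &\iff \exists c > 0,\ n_0:\ f(n) \le c\,g(n) \text{ for all } n \ge n_0 \\
f \in \Omega(g) &\iff \exists c > 0,\ n_0:\ f(n) \ge c\,g(n) \text{ for all } n \ge n_0 \\
f \in \Theta(g) &\iff f \in O(g) \text{ and } f \in \Omega(g) \\
f \in o(g)      &\iff \lim_{n\to\infty} f(n)/g(n) = 0 \\
f \in \omega(g) &\iff \lim_{n\to\infty} f(n)/g(n) = \infty
\end{align*}
```

Note that these say nothing by themselves about best, worst, or average case; that choice is made separately, before you apply any of the bounds.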
But when people are just talking informally in a hand-wavy manner, they
usually say "O(foo)" when they actually mean "for typical data under
typical conditions, the algorithm runs with a complexity of the order of
foo". And that's perfectly okay, just like it's perfectly okay to
describe the Earth as a sphere even though a pedant will call it a
wrinkly asymmetrical oblate spheroid.
--
Steven D'Aprano
http://import-that.dreamwidth.org/