Performance: sets vs dicts.

Wed Sep 1 22:57:16 EDT 2010

On 09/01/2010 04:51 PM, Raymond Hettinger wrote:
> On Aug 30, 6:03 am, a... at pythoncraft.com (Aahz) wrote:
>> That reminds me: one co-worker (who really should have known better ;-)
>> had the impression that sets were O(N) rather than O(1).  Although
>> writing that off as a brain-fart seems appropriate, it's also the case
>> that the docs don't really make that clear, it's implied from requiring
>> elements to be hashable.  Do you agree that there should be a comment?
>
> There probably ought to be a HOWTO or FAQ entry on algorithmic
> complexity
> that covers classes and functions where the algorithms are
> interesting.
> That will concentrate the knowledge in one place where performance is
> a
> main theme and where the various alternatives can be compared and
> contrasted.

> I think most users of sets rarely read the docs for sets.  The few lines
> in the tutorial are enough so that most folks "just get it" and don't read
> more detail unless they attempting something exotic.

I think that attitude is very dangerous.  There is
a long history in this world of one group of people
presuming what another group of people does or does
not do or think.  This seems to be a characteristic
of human beings and is often used to promote one's
own ideology.  And even if you have hard evidence
for what you say, why should 60% of people who don't
read docs justify providing poor quality docs to
the 40% that do?

So while you may "think" most people rarely read
the docs for basic language features and objects
(I presume you don't mean to restrict your statement
to only sets), I and most people I know *do* read
them.  And when read them I expect them, as any good
reference documentation does, to completely and
accurately describe the behavior of the item I am
reading about.  If big-O performance is deemed an
intrinsic behavior of an (operation of) an object,
it should be described in the documentation for
that object.

Your use of the word "exotic" is also suspect.
I learned long ago to always click the "advanced
options" box on dialogs because most developers/-
designers really don't have a clue about what
users need access to.

> Our docs have gotten
> somewhat voluminous,

No they haven't (relative to what they attempt to
describe).  The biggest problem with the docs is
that they are too terse.  They often appear to have
been written by people playing a game of "who can
describe X in the minimum number of words that can
still be defended as correct."  While that may be
fun, good docs are produced by considering how to
describe something to the reader, completely and
accurately, as effectively as possible.  The test
is not how few words were used, but how quickly
the reader can understand the object or find the
information being sought about the object.

> so it's unlikely that adding that particular
> needle to the haystack would have cured your colleague's "brain-fart"
> unless he had been focused on a single document talking about the
> performance
> characteristics of various data structures.

I don't know the colleague any more that you so I
feel comfortable saying that having it very likely
*would* have cured that brain-fart.  That is, he
or she very likely would have needed to check some
behavior of sets at some point and would have either
noted the big-O characteristics in passing, or would
have noted that such information was available, and
would have returned to the documentation when the
need for that information arose.  The reference
description of sets is the *one* canonical place to
look for information about sets.

There are people who don't read documentation, but
one has to be very careful not use the existence
of such people as an excuse to justify sub-standard
documentation.

So I think relegating algorithmic complexity information
to some remote document far from the description of the
object it pertains to, is exactly the wrong approach.
This is not to say that a performance HOWTO or FAQ
in addition to the reference manual would not be good.