No trees in the stdlib?
backup95 at netcabo.pt
Sat Jun 27 07:03:11 CEST 2009
> João Valverde wrote:
>> What's lacking is an associative array that preserves ordering,
>> doesn't require a hash function and has fast insertions and deletions
>> in O(log(n)).
> Careful here -- you can't get away from the need for
> hashability just by using a tree. Even if you don't
> need to actually hash the values, it's still important
> that the criterion for ordering between objects doesn't
> change while they're in the tree, otherwise they'll
> be in the wrong place and won't be found by subsequent
I'm aware :) Technically it's necessary to define a total ordering on
the set of keys.
> > I'm genuinely surprised to know
>> there are no data structures that efficiently support such a common
>> need in Python.
> Is it really all that common? If it truly were common,
> there probably *would* be something for it in the
> stdlib by now.
Obviously my experience differs, but those were my expectations.
> What sort of things are you doing that you want such
> a structure for? Maybe we can suggest a way of using
> the existing data structures to achieve the same
To answer the question of what I need the BSTs for, without getting into
too many boring details it is to merge and sort IP blocklists, that is,
large datasets of ranges in the form of (IP address, IP address,
string). Originally I was also serializing them in a binary format (but
no more after a redesign). I kept the "merge and sort" part as a helper
script, but that is considerably simpler to implement.
Please note that I'm happy with my code, it works well. I intended to
implement it in C all along, even before trying Python. The python code
was a side thing for testing/curiosity/fun. It prompted my original
question but that was really about Python and the standard library
itself, and I don't wish to waste anyone's time.
As an anecdotal data point (honestly not trying to raise the "Python is
slow" strawman), I implemented the same algorithm in C and Python, using
pyavl. Round numbers were 4 mins vs 4 seconds, against Python (plus
pyavl). Even considering I'm a worse Python programmer than C
programmer, it's a lot. I know many will probably think I tried to do "C
in Python" but that's not the case, at least I don' t think so. Anyway
like I said, not really relevant to this discussion.
More information about the Python-list