No trees in the stdlib?

João Valverde backup95 at netcabo.pt
Sat Jun 27 01:03:11 EDT 2009


greg wrote:
> João Valverde wrote:
>
>> What's lacking is an associative array that preserves ordering, 
>> doesn't require a hash function and has fast insertions and deletions 
>> in O(log(n)).
>
> Careful here -- you can't get away from the need for
> hashability just by using a tree. Even if you don't
> need to actually hash the values, it's still important
> that the criterion for ordering between objects doesn't
> change while they're in the tree, otherwise they'll
> be in the wrong place and won't be found by subsequent
> lookups.

I'm aware :) Technically it's necessary to define a total ordering on 
the set of keys.
>
> > I'm genuinely surprised to know
>> there are no data structures that efficiently support such a common 
>> need in Python.
>
> Is it really all that common? If it truly were common,
> there probably *would* be something for it in the
> stdlib by now.
Obviously my experience differs, but those were my expectations.
>
> What sort of things are you doing that you want such
> a structure for? Maybe we can suggest a way of using
> the existing data structures to achieve the same
> goal.
>
To answer the question of what I need the BSTs for, without getting into 
too many boring details it is to merge and sort IP blocklists, that is, 
large datasets of ranges in the form of (IP address, IP address, 
string). Originally I was also serializing them in a binary format (but 
no more after a redesign). I kept the "merge and sort" part as a helper 
script, but that is considerably simpler to implement.

Please note that I'm happy with my code, it works well. I intended to 
implement it in C all along, even before trying Python. The python code 
was a side thing for testing/curiosity/fun. It prompted my original 
question but that was really about Python and the standard library 
itself, and I don't wish to waste anyone's time.

As an anecdotal data point (honestly not trying to raise the "Python is 
slow" strawman), I implemented the same algorithm in C and Python, using 
pyavl. Round numbers were 4 mins vs 4 seconds, against Python (plus 
pyavl). Even considering I'm a worse Python programmer than C 
programmer, it's a lot. I know many will probably think I tried to do "C 
in Python" but that's not the case, at least I don' t think so. Anyway 
like I said, not really relevant to this discussion.




More information about the Python-list mailing list