No trees in the stdlib?
João Valverde
backup95 at netcabo.pt
Sat Jun 27 01:06:53 EDT 2009
João Valverde wrote:
> greg wrote:
>> João Valverde wrote:
>>
>>> What's lacking is an associative array that preserves ordering,
>>> doesn't require a hash function and has fast insertions and
>>> deletions in O(log(n)).
>>
>> Careful here -- you can't get away from the need for
>> hashability just by using a tree. Even if you don't
>> need to actually hash the values, it's still important
>> that the criterion for ordering between objects doesn't
>> change while they're in the tree, otherwise they'll
>> be in the wrong place and won't be found by subsequent
>> lookups.
>
> I'm aware :) Technically it's necessary to define a total ordering on
> the set of keys.
>>
>> > I'm genuinely surprised to know
>>> there are no data structures that efficiently support such a common
>>> need in Python.
>>
>> Is it really all that common? If it truly were common,
>> there probably *would* be something for it in the
>> stdlib by now.
> Obviously my experience differs, but those were my expectations.
>>
>> What sort of things are you doing that you want such
>> a structure for? Maybe we can suggest a way of using
>> the existing data structures to achieve the same
>> goal.
>>
> To answer the question of what I need the BSTs for, without getting
> into too many boring details it is to merge and sort IP blocklists,
> that is, large datasets of ranges in the form of (IP address, IP
> address, string). Originally I was also serializing them in a binary
> format (but no more after a redesign). I kept the "merge and sort"
> part as a helper script, but that is considerably simpler to implement.
Crap, this sentence is totally confusing. I meant kept the merge code as
a helper script and moved the rest to C, see next paragraph.
> Please note that I'm happy with my code, it works well. I intended to
> implement it in C all along (for a system daemon), even before trying
> Python. The python code was a side thing for testing/curiosity/fun. It
> prompted my original question but that was really about Python and the
> standard library itself, and I don't wish to waste anyone's time.
>
> As an anecdotal data point (honestly not trying to raise the "Python
> is slow" strawman), I implemented the same algorithm in C and Python,
> using pyavl. Round numbers were 4 mins vs 4 seconds, against Python
> (plus pyavl). Even considering I'm a worse Python programmer than C
> programmer, it's a lot. I know many will probably think I tried to do
> "C in Python" but that's not the case, at least I don' t think so.
> Anyway like I said, not really relevant to this discussion.
>
More information about the Python-list
mailing list