No trees in the stdlib?

Miles Kaufmann milesck at umich.edu
Sat Jun 27 22:56:32 EDT 2009


João Valverde wrote:
> To answer the question of what I need the BSTs for, without getting  
> into too many boring details it is to merge and sort IP blocklists,  
> that is, large datasets of ranges in the form of (IP address, IP  
> address, string). Originally I was also serializing them in a binary  
> format (but no more after a redesign). I kept the "merge and sort"  
> part as a helper script, but that is considerably simpler to  
> implement.
>
> ...
>
> As an anecdotal data point (honestly not trying to raise the "Python  
> is slow" strawman), I implemented the same algorithm in C and  
> Python, using pyavl. Round numbers were 4 mins vs 4 seconds, against  
> Python (plus pyavl). Even considering I'm a worse Python programmer  
> than C programmer, it's a lot. I know many will probably think I  
> tried to do "C in Python" but that's not the case, at least I don' t  
> think so. Anyway like I said, not really relevant to this discussion.

What format were you using to represent the IP addresses?  (Is it a  
Python class?)  And why wouldn't you use a network address/subnet mask  
pair to represent block ranges?  (It seems like being able to  
represent ranges that don't fit into a subnet's 2^n block wouldn't be  
that common of an occurrence, and that it might be more useful to make  
those ranges easier to manipulate.)

One of the major disadvantages of using a tree container is that  
usually multiple comparisons must be done for every tree operation.   
When that comparison involves a call into Python bytecode (for custom  
cmp/lt methods) the cost can be substantial.  Compare that to Python's  
hash-based containers, which only need to call comparison methods in  
the event of hash collisions (and that's hash collisions, not hash  
table bucket collisions, since the containers cache each object's hash  
value).  I would imagine that tree-based containers would only be  
worth using with objects with comparison methods implemented in C.

Not that I'm trying to be an apologist, or reject your arguments; I  
can definitely see the use case for a well-implemented, fast tree- 
based container for Python.  And so much the better if, when you need  
one, there was a clear consensus about what package to use (like PIL  
for image manipulation--it won't meet every need, and there are others  
out there, but it's usually the first recommended), rather than having  
to search out and evaluate a dozen different ones.

-Miles




More information about the Python-list mailing list