No trees in the stdlib?
tim.wintle at teamrubber.com
Mon Jun 29 14:01:19 CEST 2009
On Sat, 2009-06-27 at 06:03 +0100, João Valverde wrote:
> To answer the question of what I need the BSTs for, without getting
> into too many boring details it is to merge and sort IP blocklists,
> that is, large datasets of ranges in the form of (IP address, IP
> address, string).
> As an anecdotal data point (honestly not trying to raise the "Python
> is slow" strawman), I implemented the same algorithm in C and Python,
> using pyavl. Round numbers were 4 mins vs 4 seconds, against Python
> (plus pyavl).
Out of interest, I recently wrote something similar that imported (a
class of) snort rules for blacklisting ip traffic. I could only use the
I ended up writing a simple tree using dict-like objects . Afraid I
haven't got a taught CS background to know the name of the structure.
(note insertion wasn't the focus, and I didn't bother writing it to
handle updates/merges - this is a quick script I run every now and then,
so I'm sure it could be done better - I just liked having the standard
dict interface for each node)
I only had three levels of branching, using the first octet to branch at
the root node, the second octet to branch as the second node, and the
final two to branch at the third node's depth (since even then that's
normally sparse relative to the first two nodes).
It works well enough for me - I'm IO bound reading in ip addresses from
logs to check against the blacklist, and there is a fair bit of other
processing going on for each line.
(Obviously I converted the ip addresses to integers before doing all
this to avoid hashing strings etc)
(As rules could be for any subnet I overloaded some of the dict methods
to check against rules on unusual subnets etc. before checking
individual ips in the final part)
> Even considering I'm a worse Python programmer than C
> programmer, it's a lot. I know many will probably think I tried to do
> "C in Python" but that's not the case, at least I don' t think so.
> Anyway like I said, not really relevant to this discussion.
More information about the Python-list