Binary search tree

Scott SA pydev at
Mon Nov 12 10:17:28 CET 2007

On 11/12/07, Michel Albert (exhuma at wrote:

>On Nov 9, 11:45 pm, Bruno Desthuilliers
><bdesth.quelquech... at> wrote:
>> at a Ècrit :
>> > Hi,
>> > I have to get list of URLs one by one and to find the URLs that I have
>> > more than one time(can't be more than twice).
>> > I thought to put them into binary search tree, this way they'll be
>> > sorted and I'll be able to check if the URL already exist.
>> What about a set ?
>> s = set()
>> for url in urls:
>>    if url in s:
>>      print "already have ", url
>>    else:
>>      set.add(url)
>Interesting. For this case I usually used dicts. As in:
>d = {}
>for url in urls:
>   if url in d:
>      print "already have ", url
>   else:
>      d[url] = 1
>Now, I can see that this method has some superfluous data (the `1`
>that is assigned to the dict). So I suppose this is less memory
>efficient. But is this slower then? As both implementations use hashes
>of the URL to access the data. Just asking out of curiosity ;)

I'm pretty new at python but found this recipie in a cookbook (thanks O'Reilly for a useful book). I also found it online later, when the cookbook wasn't handy:

    ... google for more

It uses the 'setdefault' function and works _really_ well.


Basically looks like this:

    unique_urls = {}
    for url in urls:

    ['', '', '']

... creates a dict of None's*
    {'': None, '': None}

and "harvesting" is trivial:
    url_keys    = unique_urls.keys()

If you would find an index and count useful, you could do something like this:
    for idx in range(len(urls)):
   {'': [0], '': [1,2]}
later, the url's, their index values and counts could be found by:

    url_keys    = unique_urls.keys()
    for url_key in url_keys:
        url_idxs    = unique_urls[url_key]
        url_hits    = len(url_idxs)

Depending upon the data you may have associated with your URLs, or whatever else you're working with, this is a very sweet 'shortcut'.

I hope this helps,


* To be trully Python-esque, 'habits' would be employed and the value would have been 'Nun', but I guess the namesake can only be carried so far (^8

More information about the Python-list mailing list