[Python-Dev] binary trees. Review obmalloc.c

Fri May 5 09:50:12 CEST 2006

Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <vys at renet.ru> wrote:
>
>   
>> Comparison of functions of sorting and binary trees not absolutely 
>> correctly. I think that function sort will lose considerably on
>> greater lists. Especially after an insert or removal of all one element.
>>     
>
> Generally speaking, people who understand at least some of Python's
> internals (list internals specifically), will not be *removing* entries
> from lists one at a time (at least not using list.pop(0) ), because that
> is slow.  If they need to remove items one at a time from the smallest
> to the largest, they will usually use list.reverse(), then repeatedly
> list.pop(), as this is quite fast (in general).
>   
Yes. I understood it when resulted a set example.
> However, as I just said, people usually don't remove items from
> just-sorted lists, they tend to iterate over them via 'for i in list:' .
>   
Such problem arises at creation of the list of timers. And this list is in
constant change: addition/removal of elements in the list.
collections.deque here does not approach, as if addition in the big list is
made or search of the nearest value on the average it is necessary to lead
quantity of checks N/2 is made. For a binary tree the quantity of necessary
checks on former is equal log2 (N).

Other variant of use of binary trees: search of concurrence to ranges. Such
ranges can be blocks IP of addresses. Also this kind of the dictionary can
be used for a fast finding, whether the given point enters into one of
pieces. These problems can be realized by means of binary search. For
binary search the same lacks, as for above resulted example are
characteristic: the insert and removal for lists are carried out slowly and
after an insert sorting of the list is required. Except for that function
of binary search is absent in standard package Python. It is possible to
write its analogue on pure Python, but it will be more than twice more
slowly.
>  - Josiah
>
> As an aside, I have personally implemented trees a few times for
> different reasons.  One of the issues I have with most tree
> implementations is that it is generally difficult to do things like
> "give me the kth smallest/largest item".  Of course the standard
> solution is what is generally called a "partial order" or "order
> statistic" tree, but then you end up with yet another field in your
> structure.  One nice thing about Red-Black trees combined with
> order-statistic trees, is that you can usually use the uppermost bit of
> the order statistics for red/black information.  Of course, this is
> really only interesting from an "implement this in C" perspective;
> because if one were implementing it in Python, one may as well be really
> lazy and not bother implementing guaranteed balanced trees and be
> satisfied with randomized-balancing via Treaps.
Here that I have found through Google on a line " order statistic binary 
tree ":

    http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/

I have understood, that similar I have already made something :). The 
result of
the test shows on the most bad result, but is certainly worse, than a usual
tree. Badly that up to the end I have not tested this realization. 
Percent on
90 I am assured that it is efficient.

There is a question. What API should be used? Therefore as if to address to
object, as to MAP to object __getitem__ will return one result. And if as to
the list - another.

Here the list of already existing attributes and methods:

    class xtree(__builtin__.object)
     |  X-Tree. Binary tree with AVL balance mechanism.
     |
     |  Methods defined here:
     |
     |  __contains__(...)
     |      x.__contains__(y) <==> y in x
     |
     |  __delitem__(...)
     |      x.__delitem__(y) <==> del x[y]
     |
     |  __getitem__(...)
     |      x.__getitem__(y) <==> x[y]
     |
     |  __iter__(...)
     |      x.__iter__() <==> iter(x)
     |
     |  __len__(...)
     |      x.__len__() <==> len(x)
     |
     |  __repr__(...)
     |      x.__repr__() <==> repr(x)
     |
     |  __setitem__(...)
     |      x.__setitem__(i, y) <==> x[i]=y
     |
     |  append(...)
     |      D.append((k: v)) -> None, append new pair element into tree 
with sorting.
     |      Return pair element if argument is exist.
     |
     |  clear(...)
     |      D.clear() -> None. Remove all items from D.
     |
     |  copy(...)
     |      D.copy() -> a shallow copy of D.
     |
     |  get(...)
     |      D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
     |
     |  has_key(...)
     |      D.has_key(k) -> True if D has a key k, else False.
     |
     |  items(...)
     |      D.items() -> list of D's (key, value) pairs.
     |
     |  iteritems(...)
     |      D.iteritems() -> an iterator over the (key, value) items of D.
     |
     |  iterkeys(...)
     |      D.iterkeys() -> an iterator over the keys of D.
     |
     |  itervalues(...)
     |      D.itervalues() -> an iterator over the values of D.
     |
     |  keys(...)
     |      D.keys() -> list of D's keys.
     |
     |  pop(...)
     |      D.pop(k[,d]) -> v, remove specified key and return the 
corresponding value.
     |      If key is not found, d is returned if given, otherwise 
KeyError is raised.
     |
     |  popitem(...)
     |      D.popmin() -> (k, v), remove and return minimal (key, value) 
pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |
     |  popmax(...)
     |      D.popmax() -> (k, v), remove and return maximal (key, value) 
pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |
     |  popmin(...)
     |      D.popmin() -> (k, v), remove and return minimal (key, value) 
pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |
     |  pushmax(...)
     |      D.pushmax(item) -> (k, v), remove and return maximal (key, 
value) pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |
     |  pushmin(...)
     |      D.pushmin(item) -> (k, v), remove and return minimal (key, 
value) pair as a
     |      2-tuple; but raise KeyError if D is empty.
     |
     |  rebuild(...)
     |      D.rebuild() -> None. Take compare function for rebuild 
dictionary.
     |      It works without use of additional memory on a place.
     |
     |  setdefault(...)
     |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not 
in D.
     |
     |  update(...)
     |      D.update(E) -> None.  Update D from E and F: for k in E: 
D[k] = E[k]
     |      (if E has keys else: for (k, v) in E: D[k] = v).
     |
     |  values(...)
     |      D.values() -> list of D's values.
     |
     |  
----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  cmp
     |      function of comparison of objects
     |
     |  freeze
     |      to block an opportunity change of object.
     |
     |  
----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |
     |  __new__ = <built-in method __new__ of type object at 0x282a4aa0>
     |      T.__new__(S, ...) -> a new object with type S, a subtype of T

> Of course all of this discussion about trees in Python is fine, though
> there is one major problem.  With minimal work, one can construct 3
> values such that ((a < b < c) and (c < a)) is true.
>   
The same problem is inherent in the standard dictionary - dict (), only
roots at it, it is others. For example:

class test:
        def __hash__(self):
                return random.randint(0, 1000000)

The binary tree is very easy for tearing down. Therefore when any reference
to it is made, blocking either on reading, or on record should be
established. If it is opened iterator on the given object also blocking on
reading should be established where iterator will not come to the end or
while it will not be closed or destroyed. Thus it is possible to guarantee
security of object. Functions of blocking are similar on
pthread_rdlock_trylock, and pthread_wrlock_trylock. If blocking is
impossible, exception RDLockError, WRLockError, FreezeLockError is
generated.

Thanks.

--
Vladimir Stepanov
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: avl-timeit.py
Url: http://mail.python.org/pipermail/python-dev/attachments/20060505/6dd159df/attachment.asc 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test-normal.txt
Url: http://mail.python.org/pipermail/python-dev/attachments/20060505/6dd159df/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test-order-statistic.txt
Url: http://mail.python.org/pipermail/python-dev/attachments/20060505/6dd159df/attachment-0001.txt