[Python-Dev] binary trees. Review obmalloc.c

Vladimir 'Yu' Stepanov vys at renet.ru
Thu Apr 27 11:01:26 CEST 2006


Josiah Carlson wrote:
> "Vladimir 'Yu' Stepanov" <vys at renet.ru> wrote:
>   
>> Josiah Carlson wrote:
>>     
>>> There exists various C and Python implementations of both AVL and
>>> Red-Black trees.  For users of Python who want to use AVL and/or
>>> Red-Black trees, I would urge them to use the Python implementations. 
>>> In the case of *needing* the speed of a C extension, there already
>>> exists a CPython extension module for AVL trees:
>>>     http://www.python.org/pypi/pyavl/1.1
>>>
>>> I would suggest you look through some suggested SoC projects in the
>>> wiki:
>>>     http://wiki.python.org/moin/SummerOfCode
>>>
>>>  - Josiah
>>>
>>>   
>>>       
>> Thanks for the answer!
>>
>> I already saw pyavl-1.1. But for this reason I would like to see the module
>> in a standard package python. Functionality for pyavl and dict to compare
>> difficultly. Functionality of my module will differ from functionality dict
>> in the best party. I have lead some tests on for work with different types
>> both for a package pyavl-1.1, and for the prototype of own module. The 
>> script
>> of check is resulted in attached a file avl-timeit.py In files
>> timeit-result-*-*.txt results of this test. The first in the name of a file
>> means quantity of the added elements, the second - argument of a method
>> timeit. There it is visible, that in spite of the fact that the module 
>> xtree
>> is more combined in comparison with pyavl the module (for everyone again
>> inserted pair [the key, value], is created two elements: python object - 
>> pair,
>> and an internal element of a tree), even in this case it works a little bit
>> more quickly. Besides the module pyavl is unstable for work in multithread
>> appendices (look the attached file module-avl-is-thread-unsafe.py).
>>     
>
> I'm not concerned about the speed of the external AVL module, and I'm
> not concerned about the speed of trees in Python; so far people have
> found that dictionaries are generally sufficient.  More specifically,
> while new lambda syntaxes are presented almost weekly, I haven't heard
> anyone complain about Python's lack of a tree module in months.  As a
> result, I don't find the possible addition of any tree structure to the
> collections module as being a generally compelling addition.
>   
The object dict is irreplaceable in most cases, it so. I do not assume, 
that binary
trees can sometime replace standard `dict' object. Sorting of the list 
also is and
will be irreplaceable function.

But there is a number of problems which can be rather easily solved by means
of binary trees. Most likely for them there will be an alternative. But, 
as a rule,
it is less obvious to the programmer. And most likely realization will 
be not so
fast.

Probably that nobody mentioned necessity of binary trees. In my opinion 
it is
not a reason in an estimation of necessity of the given type.

To take for example PHP. The array and the dictionary there is realized 
as the
list. There is no even a hint on hash object.  Those users to whom was 
necessary
hash already have chosen perl/python/ruby/etc. The similar situation can 
arise
and concerning binary trees.
> Again, I believe that the existance of 3rd party extension modules which
> implement AVL and red-black trees, regardless of their lack of thread
> safety, slower than your module by a constant, or lack of particular
> functionality to be basically immaterial to this discussion.  In my mind,
> there are only two relevant items to this discussion:
>
> 1. Is having a tree implementation (AVL or Red-Black) important, and if
> so, is it important enough to include in the collections module?
> 2. Is a tree implementation important enough for google to pay for its
> inclusion, given that there exists pyavl and a collectionsmodule.c patch
> for a red-black tree implementation?
>
> Then again, you already have your own implementation of a tree module,
> and it seems as though you would like to be paid for this already-done
> work.  I don't know how google feels about such things, but you should
> remember to include this information in your application.
>
>   
I have looked a patch for collectionsmodule.c. In my opinion it is fine
realization of binary trees. If there is this class that to me there is 
no need to
create one more copy.

My persistence speaks a demand of this type of data. As to my message on
the beginning of the project for SoC I ask to consider that this theme 
has not
been lifted :) For me it was a minor question.
>> I think, that creation of this type (together with type of pair), will make
>> programming more convenient since sorting by function sort will be required
>> less often.
>>     
>
> How often are you sorting data?  I've found that very few of my
> operations involve sorting data of any kind, and even if I did have need
> of sorting, Python's sorting algorithm is quite fast, likely faster by
> nearly a factor of 2 (in the number of comparisons) than the on-line
> construction of an AVL or Red-Black tree.
>   
Really it is not frequent. Only when there is one of problems which is 
difficult for
solving by means of already available means, whether means it that I 
should look
aside C, C ++? It not seems to me that presence of structures of 
self-balanced binary
trees are distinctive feature of any language.

Comparison of functions of sorting and binary trees not absolutely 
correctly. I think
that function sort will lose considerably on greater lists. Especially 
after an insert or
removal of all one element.
>> I can probably borrow in this module beyond the framework of the project
>> google. The demand of such type of data is interesting. Because of 
>> necessity
>> of processing `gcmodule.c' and `obmalloc.c' this module cannot be realized
>> as the external module.
>>     
>
> It seems to me (after checking collectionsmodule.c and a few others)
> that proper C modules only seem to import Python.h and maybe
> structmember.h . If you are manually including gcmodule.c and obmalloc.c,
> you may be doing something wrong; I'll leave it to others to further
> comment on this aspect.
>   
Thanks that at once has not shot me :) I was not going to work directly 
neither with that,
nor with another. There were difficulties with translation :)

When I discussed obmalloc.c, I meant its internal code. Ways of its 
improvement:
  * To adapt allocation of blocks of memory with other alignment. Now 
alignment
    is rigidly set on 8 bytes. As a variant, it is possible to use 
alignment on 4 bytes.
    And this value can be set at start of the interpreter through
    arguments/variable environments/etc. At testing with alignment on 4 
or 8 bytes
    difference in speed of work was not appreciable.
  * To expand the maximal size which can be allocated by means of the 
given module.
    Now the maximal size is 256 bytes.
  * At the size of the allocated memory close to maximal, the allocation 
of blocks
    becomes inefficient. For example, for the sizes of blocks 248 and 
256 (blocksize),
    values of quantity of elements (PAGESIZE/blocksize) it is equal 16. 
I.e. it would be
    possible to use for the sizes of the block 248 same page, as for the 
size of the block 256.




More information about the Python-Dev mailing list