[Python-Dev] Google Summer of Code proposal: New class for work with binary trees AVL and RB as with the standard dictionary.
Vladimir 'Yu' Stepanov
vys at renet.ru
Mon Apr 24 11:51:58 CEST 2006
I would like to participate in Google Summer of Code. The idea consists in
creation of a specialized class for work with binary trees AVL and RB. The
part of ideas is taken from Parrot (Perl6) where for pair values the
specialized type is stipulated. As advantages it is possible to note:
* High speed. On simple types at quantity of elements up to 10000 speed of
work concedes to the standard dictionary of all in 2.7 times.
* Safety of work in a multithread operating mode.
* The method for frosts of object is stipulated.
* The data storage is carried out in the sorted kind.
* A high overall performance `item' and `iteritem' methods as it is not
necessary to create new objects.
* At lots of objects in a tree memory is used much less intensively.
* Absence of collisions. As consequence, it is impossible to generate bad
data for creation DoS of the attacks influencing the dictionary of
transferred arguments.
* For objects existence of a method __ hash __ is not necessary.
* The same kind of the dictionary by means of the overloaded operations
probably change of its behaviour so that it supported the
multivariational
representation based on stratification given and subsequent recoils.
Additional requirements to key objects:
* The opportunity of comparison of key objects function cmp is necessary.
There are the reasons, braking development of this project:
* Compulsion of heading of a module `gc' for each compound object (this
structure is unessential to some objects, but updating `gc' the module
is required).
* Absence of standard object the pair, probably depends on the
previous item.
Lacks of a binary tree:
* Average time of search for hash - O (1), and for trees - O (log2 N).
* A lot of memory under a single element of a tree:
(3*sizeof (void *) + sizeof (int))*2,
one element is used rather - the pair, the second site of memory is
allocated under node of a tree.
In protection of object "pair":
* The logic of methods of the given object noticeably is easier than
tuple,
that as a result can affect speed of work of the program in the best
party.
* Alignment on 8 bytes has no big sense at present architecture where in
cache sample a minimum on 64 bytes is made. Use of type "pair" will give
an appreciable prize at use of alignment in 4 bytes. Otherwise on 64-bit
platforms it is much more favourable to use tuple object.
The given project can demand processing of the module `gcmodule.c' and
`tupleobject.c'. It is necessary to reduce the size of static objects,
for this
purpose the opportunity is necessary is transparent to pass objects not
having
direct support from the module `gcmodule.c'.
Also it will be partially necessary to process the module `obmalloc.c'
for more
effective distribution of memory.
I shall be glad to answer questions on this theme.
More information about the Python-Dev
mailing list