Hello, the base64 module documentation for the b64decode function says, "TypeError is raised if s were incorrectly padded or if there are non-alphabet characters present in the string." But this doesn't seem to be the case. Test case:

import base64
base64.b64decode('%')

Since % is a non-alphabet character, this should raise TypeError (btw, shouldn't this be ValueError instead?), but Python 2.4.3 silently ignores it. I found this while experimenting with IronPython. IronPython 1.0 Beta 5 gives:

Traceback (most recent call last):
  File base64, line unknown, in b64decode
SystemError: cannot decode byte: '%'

It's not a TypeError, but it doesn't silently ignore it either.

Seo Sanghyeon
Well, CPython at least still enforces the padding, even if it's ignoring the invalid characters. Here's Seo's repro 'simplified' to go straight to binascii (just to get to the root API):
>>> import binascii
>>> binascii.a2b_base64('%')
''
And then sending a valid character, invalid padding:
>>> binascii.a2b_base64('A')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
binascii.Error: Incorrect padding
And then throwing in random invalid characters; CPython ignores the invalid characters:
>>> binascii.a2b_base64('ABC=')
'\x00\x10'
>>> binascii.a2b_base64('%%ABC=')
'\x00\x10'
>>> binascii.a2b_base64('%%ABC=%!@##')
'\x00\x10'
>>> binascii.a2b_base64('%%ABC=%!@##*#*()')
'\x00\x10'
The documentation for binascii.a2b_base64 doesn't specify whether it throws for anything either. I would suspect that there's a reason why CPython is ignoring the invalid characters here. If this is the expected behavior then I'm happy to make IronPython match it. And at the very least we HAVE to fix the exception that gets thrown - I'm with Seo that it should be a ValueError, but the line between ValueError and TypeError is blurry at times anyway, and TypeError is what's documented.
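For anyone who wants the stricter behaviour the b64decode documentation currently describes, a pre-check is easy to write. This is only a sketch of mine (the name strict_b64decode and the choice of ValueError are my own, not anything the stdlib provides), but it shows the validation being discussed:

import binascii
import re

_B64_RE = re.compile(r'\A[A-Za-z0-9+/]*={0,2}\Z')

def strict_b64decode(s):
    # Reject non-alphabet characters and bad padding instead of
    # silently dropping them the way a2b_base64 does today.
    if not _B64_RE.match(s) or len(s) % 4:
        raise ValueError('invalid base64 input: %r' % (s,))
    return binascii.a2b_base64(s)

With this, strict_b64decode('ABC=') still returns '\x00\x10', while strict_b64decode('%') raises instead of quietly returning an empty string.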
I would like to participate in Google Summer of Code. The idea is to create a specialized class for working with binary trees, AVL and RB (red-black). Part of the idea is taken from Parrot (Perl 6), where a specialized type is provided for pair values. Among the advantages it is possible to note:

* High speed. For simple types, with up to 10000 elements, it is only about 2.7 times slower than the standard dictionary.
* Safe operation in multi-threaded mode.
* A method for freezing the object is provided.
* Data is stored in sorted order.
* High performance of the `items' and `iteritems' methods, since no new objects need to be created.
* With a large number of objects in the tree, memory is used much less intensively.
* Absence of collisions. As a consequence, it is impossible to craft bad data for DoS attacks aimed at the dictionary of passed arguments.
* Key objects do not need a __hash__ method.
* By overloading its operations, the same kind of dictionary could change its behaviour so that it supports a multi-version representation based on layering of the data and subsequent rollbacks.

Additional requirements on key objects:

* Key objects must be comparable with the cmp function.

There are factors slowing development of this project:

* The mandatory `gc' header for every compound object (this structure is unnecessary for some objects, but dropping it would require updating the `gc' module).
* The absence of a standard pair object, which probably depends on the previous item.

Drawbacks of a binary tree:

* Average search time is O(1) for a hash and O(log2 N) for a tree.
* A lot of memory per single tree element: (3*sizeof(void *) + sizeof(int))*2 -- one allocation is used for the pair itself, and a second piece of memory is allocated for the tree node.

In defence of the "pair" object:

* The logic of its methods is noticeably simpler than tuple's, which in the end can affect the speed of a program for the better.
* Alignment on 8 bytes makes little sense on current architectures, where a cache fetch reads at least 64 bytes. The "pair" type would give a noticeable win with 4-byte alignment; otherwise, on 64-bit platforms it is much more profitable to use the tuple object.

This project may require reworking the modules `gcmodule.c' and `tupleobject.c'. The size of static objects needs to be reduced, and for that it must be possible to handle, transparently, objects that have no direct support from `gcmodule.c'. It will also be necessary to partially rework `obmalloc.c' for more efficient memory allocation.

I will be glad to answer questions on this topic.
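To make the intended interface concrete, here is a rough sketch of my own (the name SortedDict is hypothetical; this is not the proposed xtree type) of the kind of sorted mapping described above, emulated with a plain sorted list instead of an AVL/RB tree:

import bisect

class SortedDict(object):
    # Toy sorted mapping: keys are kept in order and need only be
    # comparable, not hashable.  A real AVL/RB tree would make insertion
    # and deletion O(log N); here insert is O(N) because of list.insert.
    def __init__(self):
        self._keys = []
        self._values = []

    def __setitem__(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def items(self):
        # Already in sorted order; no separate sort step is needed.
        return zip(self._keys, self._values)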
There exist various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of *needing* the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1

I would suggest you look through some suggested SoC projects in the wiki: http://wiki.python.org/moin/SummerOfCode

 - Josiah

"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
I would like to participate in Google Summer of Code. The idea is to create a specialized class for working with binary trees, AVL and RB (red-black). Part of the idea is taken from Parrot (Perl 6), where a specialized type is provided for pair values. Among the advantages it is possible to note:

* High speed. For simple types, with up to 10000 elements, it is only about 2.7 times slower than the standard dictionary.
* Safe operation in multi-threaded mode.
* A method for freezing the object is provided.
* Data is stored in sorted order.
* High performance of the `items' and `iteritems' methods, since no new objects need to be created.
* With a large number of objects in the tree, memory is used much less intensively.
* Absence of collisions. As a consequence, it is impossible to craft bad data for DoS attacks aimed at the dictionary of passed arguments.
* Key objects do not need a __hash__ method.
* By overloading its operations, the same kind of dictionary could change its behaviour so that it supports a multi-version representation based on layering of the data and subsequent rollbacks.
Additional requirements on key objects:

* Key objects must be comparable with the cmp function.
There are factors slowing development of this project:

* The mandatory `gc' header for every compound object (this structure is unnecessary for some objects, but dropping it would require updating the `gc' module).
* The absence of a standard pair object, which probably depends on the previous item.
Drawbacks of a binary tree:

* Average search time is O(1) for a hash and O(log2 N) for a tree.
* A lot of memory per single tree element: (3*sizeof(void *) + sizeof(int))*2 -- one allocation is used for the pair itself, and a second piece of memory is allocated for the tree node.
In defence of the "pair" object:

* The logic of its methods is noticeably simpler than tuple's, which in the end can affect the speed of a program for the better.
* Alignment on 8 bytes makes little sense on current architectures, where a cache fetch reads at least 64 bytes. The "pair" type would give a noticeable win with 4-byte alignment; otherwise, on 64-bit platforms it is much more profitable to use the tuple object.
This project may require reworking the modules `gcmodule.c' and `tupleobject.c'. The size of static objects needs to be reduced, and for that it must be possible to handle, transparently, objects that have no direct support from `gcmodule.c'.
It will also be necessary to partially rework `obmalloc.c' for more efficient memory allocation.
I will be glad to answer questions on this topic.
On 4/25/06, Josiah Carlson <jcarlson@uci.edu> wrote:
There exist various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of *needing* the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1
And a C implementation of a red-black tree is here: http://python.org/sf/1324770 :)

Hye-Shik
Josiah Carlson wrote:
There exist various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of *needing* the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1
I would suggest you look through some suggested SoC projects in the wiki: http://wiki.python.org/moin/SummerOfCode
- Josiah
Thanks for the answer! I have already seen pyavl-1.1, but that is exactly why I would like to see such a module in the standard Python distribution. The functionality of pyavl and dict is hard to compare; the functionality of my module will differ from dict's for the better. I have run some tests with different types, both for the pyavl-1.1 package and for the prototype of my own module. The test script is in the attached file avl-timeit.py, and the results are in the files timeit-result-*-*.txt. The first number in a file name is the quantity of added elements, the second is the argument of the timeit method. They show that even though the xtree module does more work than the pyavl module (for each newly inserted [key, value] pair two elements are created: a Python pair object and an internal tree node), even so it works a little faster. Besides, the pyavl module is not safe for use in multi-threaded applications (see the attached file module-avl-is-thread-unsafe.py).

I think that creating this type (together with a pair type) will make programming more convenient, since sorting with the sort function will be needed less often. I may work on this module outside the framework of the Google project; the demand for this kind of data type is what interests me. Because of the need to rework `gcmodule.c' and `obmalloc.c', this module cannot be implemented as an external module.

#!/usr/local/bin/python
# Initialize modules
import avl
import xtree
import types
import sys
import timeit

if len(sys.argv) != 3:
    iterations = 1000000
    count = 10
else:
    iterations = int(sys.argv[1])
    count = int(sys.argv[2])

print "Iterations", iterations
print "Repeats", count
print

result_avl = timeit.Timer("""
import avl
a = avl.new()
for i in xrange(%d):
    a.insert(i)
""" % iterations).timeit(count)
print "object avl.new()", result_avl

result_xtree = timeit.Timer("""
import xtree
a = xtree.xtree()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object xtree.xtree()", result_xtree

result_dict = timeit.Timer("""
import types
a = dict()
for i in xrange(%d):
    a[i] = True
""" % iterations).timeit(count)
print "object dict()", result_dict

print "      Dict   Xtree  AVL"
print "Dict  %.2f   %.2f   %.2f" % (float(result_dict)/result_dict,
                                    float(result_dict)/result_xtree,
                                    float(result_dict)/result_avl)
print "Xtree %.2f   %.2f   %.2f" % (float(result_xtree)/result_dict,
                                    float(result_xtree)/result_xtree,
                                    float(result_xtree)/result_avl)
print "AVL   %.2f   %.2f   %.2f" % (float(result_avl)/result_dict,
                                    float(result_avl)/result_xtree,
                                    float(result_avl)/result_avl)

fox:root!~/Downloads/Python/PYTHON/pyavl-1.1 > python setup.py install
running install
running build
running build_ext
building 'avl' extension
creating build
creating build/temp.freebsd-5.4-RELEASE-i386-2.4
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x20000 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avl.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o -O2 -Wno-parentheses -Wno-uninitialized
cc -fno-strict-aliasing -DNDEBUG -O -pipe -D__wchar_t=wchar_t -DTHREAD_STACK_SIZE=0x20000 -O2 -fPIC -DHAVE_AVL_VERIFY -DAVL_FOR_PYTHON -I/usr/local/include/python2.4 -c avlmodule.c -o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -O2 -Wno-parentheses -Wno-uninitialized
creating build/lib.freebsd-5.4-RELEASE-i386-2.4
cc -shared -pthread -O2 build/temp.freebsd-5.4-RELEASE-i386-2.4/avl.o build/temp.freebsd-5.4-RELEASE-i386-2.4/avlmodule.o -o build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -Wl,-x
running install_lib
copying build/lib.freebsd-5.4-RELEASE-i386-2.4/avl.so -> /usr/local/lib/python2.4/site-packages

#!/usr/local/bin/python
import avl
import time

a = avl.new()
crash = 61215   # may be any in range 0 .. 99998
a.insert(crash)
a.insert(crash + 1)
b = iter(a)
b.next()
for i in xrange(100000):
    a.insert(i)
a.remove(crash)
k = []
for i in xrange(100):
    # fill memory with padding for 32-bit machine
    k.append(pow(65536, 2))
    # fill memory with padding for 64-bit machine
    k.append(pow(65536, 10))
for i in b:
    print i

Iterations 30
Repeats 1000000

object avl.new() 44.3226678371
object xtree.xtree() 30.8406031132
object dict() 12.9507939816
      Dict   Xtree  AVL
Dict  1.00   0.42   0.29
Xtree 2.38   1.00   0.70
AVL   3.42   1.44   1.00

Iterations 100
Repeats 100000

object avl.new() 13.9669251442
object xtree.xtree() 10.1008188725
object dict() 3.32260894775
      Dict   Xtree  AVL
Dict  1.00   0.33   0.24
Xtree 3.04   1.00   0.72
AVL   4.20   1.38   1.00

Iterations 1000
Repeats 10000

object avl.new() 16.0562131405
object xtree.xtree() 11.6249010563
object dict() 2.73937296867
      Dict   Xtree  AVL
Dict  1.00   0.24   0.17
Xtree 4.24   1.00   0.72
AVL   5.86   1.38   1.00

Iterations 10000
Repeats 1000

object avl.new() 19.2930951118
object xtree.xtree() 14.0921308994
object dict() 4.18660902977
      Dict   Xtree  AVL
Dict  1.00   0.30   0.22
Xtree 3.37   1.00   0.73
AVL   4.61   1.37   1.00

Iterations 100000
Repeats 100

object avl.new() 22.347192049
object xtree.xtree() 15.4935331345
object dict() 4.46709012985
      Dict   Xtree  AVL
Dict  1.00   0.29   0.20
Xtree 3.47   1.00   0.69
AVL   5.00   1.44   1.00

object avl.new() 32.5346679688
object xtree.xtree() 17.4350678921
object dict() 4.39839410782
      Dict   Xtree  AVL
Dict  1.00   0.25   0.14
Xtree 3.96   1.00   0.54
AVL   7.40   1.87   1.00
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
There exist various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of *needing* the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1
I would suggest you look through some suggested SoC projects in the wiki: http://wiki.python.org/moin/SummerOfCode
- Josiah
Thanks for the answer!
I have already seen pyavl-1.1, but that is exactly why I would like to see such a module in the standard Python distribution. The functionality of pyavl and dict is hard to compare; the functionality of my module will differ from dict's for the better. I have run some tests with different types, both for the pyavl-1.1 package and for the prototype of my own module. The test script is in the attached file avl-timeit.py, and the results are in the files timeit-result-*-*.txt. The first number in a file name is the quantity of added elements, the second is the argument of the timeit method. They show that even though the xtree module does more work than the pyavl module (for each newly inserted [key, value] pair two elements are created: a Python pair object and an internal tree node), even so it works a little faster. Besides, the pyavl module is not safe for use in multi-threaded applications (see the attached file module-avl-is-thread-unsafe.py).
I'm not concerned about the speed of the external AVL module, and I'm not concerned about the speed of trees in Python; so far people have found that dictionaries are generally sufficient. More specifically, while new lambda syntaxes are presented almost weekly, I haven't heard anyone complain about Python's lack of a tree module in months. As a result, I don't find the possible addition of any tree structure to the collections module as being a generally compelling addition.

Again, I believe that the existence of 3rd party extension modules which implement AVL and red-black trees -- regardless of their lack of thread safety, their being slower than your module by a constant factor, or their lack of particular functionality -- is basically immaterial to this discussion. In my mind, there are only two relevant items to this discussion:

1. Is having a tree implementation (AVL or Red-Black) important, and if so, is it important enough to include in the collections module?
2. Is a tree implementation important enough for google to pay for its inclusion, given that there exist pyavl and a collectionsmodule.c patch for a red-black tree implementation?

Then again, you already have your own implementation of a tree module, and it seems as though you would like to be paid for this already-done work. I don't know how google feels about such things, but you should remember to include this information in your application.
I think that creating this type (together with a pair type) will make programming more convenient, since sorting with the sort function will be needed less often.
How often are you sorting data? I've found that very few of my operations involve sorting data of any kind, and even if I did have need of sorting, Python's sorting algorithm is quite fast, likely faster by nearly a factor of 2 (in the number of comparisons) than the on-line construction of an AVL or Red-Black tree.
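As a rough way to put numbers on the comparison-count point, here is a small sketch of my own (the Counted class and the use of bisect.insort as a stand-in for on-line tree insertion are my assumptions, not anything from the benchmarks in this thread):

import bisect
import random

class Counted(object):
    # Wraps a number and counts every comparison made against it.
    comparisons = 0
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def comparisons_for_sort(n):
    data = [Counted(random.random()) for i in xrange(n)]
    Counted.comparisons = 0
    data.sort()
    return Counted.comparisons

def comparisons_for_online(n):
    # bisect.insort does roughly log2(i) comparisons per insert, which is a
    # lower bound on what a balanced-tree insert needs; the list-shifting
    # cost is deliberately ignored, since only comparisons are counted.
    data = [Counted(random.random()) for i in xrange(n)]
    Counted.comparisons = 0
    acc = []
    for item in data:
        bisect.insort(acc, item)
    return Counted.comparisons

if __name__ == '__main__':
    for n in (1000, 10000):
        print n, comparisons_for_sort(n), comparisons_for_online(n)

Binary insertion is essentially optimal in comparisons, while an AVL or Red-Black tree's search paths can run roughly 1.4x to 2x longer than optimal, so a real tree would compare even less favourably with sort() than this sketch does.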
I may work on this module outside the framework of the Google project; the demand for this kind of data type is what interests me. Because of the need to rework `gcmodule.c' and `obmalloc.c', this module cannot be implemented as an external module.
It seems to me (after checking collectionsmodule.c and a few others) that proper C modules only seem to import Python.h and maybe structmember.h . If you are manually including gcmodule.c and obmalloc.c, you may be doing something wrong; I'll leave it to others to further comment on this aspect. - Josiah
"Josiah Carlson" <jcarlson@uci.edu> wrote in message news:20060426094148.66EE.JCARLSON@uci.edu...
Then again, you already have your own implementation of a tree module, and it seems as though you would like to be paid for this already-done work. I don't know how google feels about such things,
They are explicitly against that in the student FAQ. They are paying for new code written after the project is approved and before the endpoint.
but you should remember to include this information in your application.
Existing code can legitimately be used to show feasibility of the project and competency of the student.

Terry Jan Reedy
Josiah Carlson wrote:
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
There exist various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of *needing* the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1
I would suggest you look through some suggested SoC projects in the wiki: http://wiki.python.org/moin/SummerOfCode
- Josiah
Thanks for the answer!
I have already seen pyavl-1.1, but that is exactly why I would like to see such a module in the standard Python distribution. The functionality of pyavl and dict is hard to compare; the functionality of my module will differ from dict's for the better. I have run some tests with different types, both for the pyavl-1.1 package and for the prototype of my own module. The test script is in the attached file avl-timeit.py, and the results are in the files timeit-result-*-*.txt. The first number in a file name is the quantity of added elements, the second is the argument of the timeit method. They show that even though the xtree module does more work than the pyavl module (for each newly inserted [key, value] pair two elements are created: a Python pair object and an internal tree node), even so it works a little faster. Besides, the pyavl module is not safe for use in multi-threaded applications (see the attached file module-avl-is-thread-unsafe.py).
I'm not concerned about the speed of the external AVL module, and I'm not concerned about the speed of trees in Python; so far people have found that dictionaries are generally sufficient. More specifically, while new lambda syntaxes are presented almost weekly, I haven't heard anyone complain about Python's lack of a tree module in months. As a result, I don't find the possible addition of any tree structure to the collections module as being a generally compelling addition.
The dict object is irreplaceable in most cases, that is so. I do not suggest that binary trees could ever replace the standard `dict' object. List sorting also is, and will remain, an irreplaceable function. But there are a number of problems that can be solved rather easily with binary trees. Most likely there is an alternative for each of them, but as a rule it is less obvious to the programmer, and most likely the implementation will not be as fast. Perhaps nobody has mentioned a need for binary trees; in my opinion that is not, by itself, a measure of whether the type is needed. Take PHP for example: the array and the dictionary there are implemented as a list, and there is not even a hint of a hash object. Those users who needed a hash have already chosen perl/python/ruby/etc. A similar situation could arise with binary trees.
Again, I believe that the existence of 3rd party extension modules which implement AVL and red-black trees -- regardless of their lack of thread safety, their being slower than your module by a constant factor, or their lack of particular functionality -- is basically immaterial to this discussion. In my mind, there are only two relevant items to this discussion:
1. Is having a tree implementation (AVL or Red-Black) important, and if so, is it important enough to include in the collections module?
2. Is a tree implementation important enough for google to pay for its inclusion, given that there exist pyavl and a collectionsmodule.c patch for a red-black tree implementation?
Then again, you already have your own implementation of a tree module, and it seems as though you would like to be paid for this already-done work. I don't know how google feels about such things, but you should remember to include this information in your application.
I have looked at the patch for collectionsmodule.c. In my opinion it is a fine implementation of binary trees. If that class exists, then there is no need for me to create one more copy. My persistence just reflects the demand for this type of data. As for my message about starting this as an SoC project, please consider that topic as never having been raised :) For me it was a minor question.
I think that creating this type (together with a pair type) will make programming more convenient, since sorting with the sort function will be needed less often.
How often are you sorting data? I've found that very few of my operations involve sorting data of any kind, and even if I did have need of sorting, Python's sorting algorithm is quite fast, likely faster by nearly a factor of 2 (in the number of comparisons) than the on-line construction of an AVL or Red-Black tree.
Really, it is not frequent. Only when one of those problems turns up that is hard to solve with the means already available -- but does that mean I should look towards C or C++? It does not seem to me that the presence of self-balancing binary tree structures is a distinctive feature of any particular language. Comparing the sort function and binary trees is not entirely correct. I think that the sort function will lose considerably on larger lists, especially after an insertion or removal of just one element.
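One crude way to test that (a sketch of my own, not one of the attached benchmarks; timings will vary) is to compare re-sorting a large, already-sorted list after a single append against keeping it sorted with bisect.insort:

import timeit

setup = """
import bisect, random
data = range(1000000)         # already sorted
new = random.random() * 1000000
"""

resort = timeit.Timer("data.append(new); data.sort()", setup).timeit(10)
insort = timeit.Timer("bisect.insort(data, new)", setup).timeit(10)
print "append + sort :", resort
print "bisect.insort :", insort

list.sort is quite good on almost-sorted input, but it still has to scan the whole list, while insort only does a binary search plus one block move, so insort typically wins here.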
I may work on this module outside the framework of the Google project; the demand for this kind of data type is what interests me. Because of the need to rework `gcmodule.c' and `obmalloc.c', this module cannot be implemented as an external module.
It seems to me (after checking collectionsmodule.c and a few others) that proper C modules only seem to import Python.h and maybe structmember.h . If you are manually including gcmodule.c and obmalloc.c, you may be doing something wrong; I'll leave it to others to further comment on this aspect.
Thanks for not shooting me on the spot :) I was not going to work directly with either of those; there were difficulties with translation :) When I discussed obmalloc.c, I meant its internal code. Ways to improve it:

* Adapt the allocation of memory blocks to other alignments. Right now the alignment is hard-wired to 8 bytes. As a variant, 4-byte alignment could be used, and this value could be set at interpreter start-up through arguments/environment variables/etc. In my testing the difference in speed between 4- and 8-byte alignment was not appreciable.
* Expand the maximal size that can be allocated by means of this module. Right now the maximal size is 256 bytes.
* For allocation sizes close to the maximum, block allocation becomes inefficient. For example, for block sizes 248 and 256 (blocksize), the number of elements per pool (PAGESIZE/blocksize) is 16 in both cases; i.e. the same page could be used for blocks of size 248 as for blocks of size 256.
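A few lines of arithmetic (mine; assuming a 4096-byte pool and ignoring obmalloc's per-pool header, so the numbers are only illustrative) show what that last bullet means:

PAGESIZE = 4096    # assumption for illustration; the real pool also holds a header

for blocksize in (248, 256):
    nblocks = PAGESIZE // blocksize
    wasted = PAGESIZE - nblocks * blocksize
    print "blocksize %3d: %2d blocks per page, %3d bytes left over" % (
        blocksize, nblocks, wasted)

Both sizes fit 16 blocks, so a page carved into 256-byte blocks could serve 248-byte requests just as well.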
Vladimir 'Yu' Stepanov wrote:
* Adapt the allocation of memory blocks to other alignments. Right now the alignment is hard-wired to 8 bytes. As a variant, 4-byte alignment could be used, and this value could be set at interpreter start-up through arguments/environment variables/etc. In my testing the difference in speed between 4- and 8-byte alignment was not appreciable.
That depends on the hardware you use, of course. Some architectures absolutely cannot stand mis-aligned accesses, and programs will just crash if they try to perform such accesses. So Python should err on the safe side, and only use a smaller alignment when it is known not to hurt. OTOH, I don't see the *advantage* in reducing the alignment.
* Expand the maximal size that can be allocated by means of this module. Right now the maximal size is 256 bytes.
Right. This is somewhat deliberate, though; the expectation is that fragmentation will increase dramatically if even larger size classes are supported.
* For allocation sizes close to the maximum, block allocation becomes inefficient. For example, for block sizes 248 and 256 (blocksize), the number of elements per pool (PAGESIZE/blocksize) is 16 in both cases; i.e. the same page could be used for blocks of size 248 as for blocks of size 256.
Good idea; that shouldn't be too difficult to implement: for sizes above 215, pools need to be spaced apart only at 16 bytes. Regards, Martin
[Vladimir 'Yu' Stepanov]
* Adapt the allocation of memory blocks to other alignments. Right now the alignment is hard-wired to 8 bytes. As a variant, 4-byte alignment could be used, and this value could be set at interpreter start-up through arguments/environment variables/etc. In my testing the difference in speed between 4- and 8-byte alignment was not appreciable.
[Martin v. Löwis]
That depends on the hardware you use, of course. Some architectures absolutely cannot stand mis-aligned accesses, and programs will just crash if they try to perform such accesses.
Note that we _had_ to add a goofy "long double" to the PyGC_Head union because some Python platform required 8-byte alignment for some types ... see rev 25454. Any spelling of malloc() also needs to return memory aligned for any legitimate purpose, and 8-byte alignment is the strictest requirement we know of across current Python platforms.
So Python should err on the safe side, and only use a smaller alignment when it is known not to hurt.
OTOH, I don't see the *advantage* in reducing the alignment.
It could cut wasted bytes. There is no per-object memory overhead in a release-build obmalloc: the allocatable part of a pool is entirely used for user-visible object memory, except when the alignment requirement ends up forcing unused (by both the user and by obmalloc) pad bytes. For example, Python ints consume 12 bytes on 32-bit boxes, but if space were allocated for them by obmalloc (it's not), obmalloc would have to use 16 bytes per int to preserve 8-byte alignment. OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment, because that's what worked best for Vladimir Marangozov during his extensive timing tests. It's not an isolated decision -- e.g., it's also influenced by, and influences, "the best" pool size, and (of course) cutting alignment to 4 would double the number of "size classes" obmalloc has to juggle.
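To put the 12-byte example in code (a sketch of mine, nothing taken from obmalloc.c itself):

def rounded_size(nbytes, alignment):
    # Round a request up to the allocator's alignment.
    return (nbytes + alignment - 1) & ~(alignment - 1)

for obj_size in (12, 16, 20, 28):
    print "%2d-byte object -> %2d bytes with 8-byte alignment, %2d with 4-byte" % (
        obj_size, rounded_size(obj_size, 8), rounded_size(obj_size, 4))

For the 12-byte int case that is 16 versus 12 bytes, i.e. 4 pad bytes out of every 16 under 8-byte alignment and none under 4-byte alignment.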
* Expand the maximal size that can be allocated by means of this module. Right now the maximal size is 256 bytes.
Right. This is somewhat deliberate, though; the expectation is that fragmentation will increase dramatically if even larger size classes are supported.
It's entirely deliberate ;-) obmalloc is no way trying to be a general-purpose allocator. It was designed specifically for the common memory patterns in Python, aiming at large numbers of small objects of the same size. That's extremely common in Python, and allocations larger than 256 bytes are not. The maximum size was actually 64 bytes at first. After that, dicts grew an embedded 8-element table, which vastly increased the size of the dict struct. obmalloc's limit was boosted to 256 then, although it could have stopped at the size of a dict (in the rough ballpark of 150 bytes). There was no reason (then or now) to go beyond 256.
* For allocation sizes close to the maximum, block allocation becomes inefficient. For example, for block sizes 248 and 256 (blocksize), the number of elements per pool (PAGESIZE/blocksize) is 16 in both cases; i.e. the same page could be used for blocks of size 248 as for blocks of size 256.
Good idea; that shouldn't be too difficult to implement: for sizes above 215, pools need to be spaced apart only at 16 bytes.
I'd rather drop the limit to 215 then <0.3 wink>. Allocations that large probably still aren't important to obmalloc, but it is important that finding a requested allocation's "size index" be as cheap as possible. Uniformity helps that. On modern boxes, it may be desirable to boost POOL_SIZE -- nobody has run timing experiments as comprehensive as Vladimir ran way back when, and there's no particular reason to believe that the same set of tradeoffs would look best on current architectures.
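For concreteness, a sketch (mine, modeled on obmalloc's approach; the constants are assumptions rather than quotes from obmalloc.c) of why the size-index lookup stays cheap and why halving the alignment doubles the class count:

ALIGNMENT_SHIFT_8 = 3    # 8-byte alignment
ALIGNMENT_SHIFT_4 = 2    # 4-byte alignment

def size_index(nbytes, shift):
    # One subtraction and one shift: requests that round up to the same
    # multiple of the alignment share a size class.
    return (nbytes - 1) >> shift

print size_index(256, ALIGNMENT_SHIFT_8) + 1   # 32 classes up to 256 bytes
print size_index(256, ALIGNMENT_SHIFT_4) + 1   # 64 classes for the same limit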
Tim Peters wrote:
[Vladimir 'Yu' Stepanov]
* Adapt the allocation of memory blocks to other alignments. Right now the alignment is hard-wired to 8 bytes. As a variant, 4-byte alignment could be used, and this value could be set at interpreter start-up through arguments/environment variables/etc. In my testing the difference in speed between 4- and 8-byte alignment was not appreciable.
[Martin v. Löwis]
That depends on the hardware you use, of course. Some architectures absolutely cannot stand mis-aligned accesses, and programs will just crash if they try to perform such accesses.
Note that we _had_ to add a goofy "long double" to the PyGC_Head union because some Python platform required 8-byte alignment for some types ... see rev 25454. Any spelling of malloc() also needs to return memory aligned for any legitimate purpose, and 8-byte alignment is the strictest requirement we know of across current Python platforms.

Is it possible to get a reference to those tests? I have run some tests which show a very small difference in speed between alignment on 4 and on 8 bytes (a maximum of 8%, and mostly less than one percent).
The tests used sequential lookups of elements within one piece of memory. I understand that a further test with randomly chosen addresses is needed, to minimize the help the processor cache gives, and certainly not all platforms will show the same result (I will try not to forget to attach the source code of the test and its output :) ). On the other hand, reducing the memory consumed by each individual object instance could win back the same percentage of speed that is lost by moving the alignment from 8 bytes down to 4 :) Besides, some objects, int to begin with, could then probably do without extra layers on top of memory allocation.
So Python should err on the safe side, and only use a smaller alignment when it is known not to hurt.
OTOH, I don't see the *advantage* in reducing the alignment.
It could cut wasted bytes. There is no per-object memory overhead in a release-build obmalloc: the allocatable part of a pool is entirely used for user-visible object memory, except when the alignment requirement ends up forcing unused (by both the user and by obmalloc) pad bytes. For example, Python ints consume 12 bytes on 32-bit boxes, but if space were allocated for them by obmalloc (it's not), obmalloc would have to use 16 bytes per int to preserve 8-byte alignment.

Good point.
OTOH, obmalloc (unlike PyGC_Head) has always used 8-byte alignment, because that's what worked best for Vladimir Marangozov during his extensive timing tests. It's not an isolated decision -- e.g., it's also influenced by, and influences, "the best" pool size, and (of course) cutting alignment to 4 would double the number of "size classes" obmalloc has to juggle.

Yes, the maximum number of size classes really would be doubled. But that should not affect either system performance or the total amount of memory consumed. The change in memory fragmentation cannot be estimated in advance, since it will depend on the concrete application.
* Expand the maximal size that can be allocated by means of this module. Right now the maximal size is 256 bytes.
Right. This is somewhat deliberate, though; the expectation is that fragmentation will increase dramatically if even larger size classes are supported.
It's entirely deliberate ;-) obmalloc is no way trying to be a general-purpose allocator. It was designed specifically for the common memory patterns in Python, aiming at large numbers of small objects of the same size. That's extremely common in Python, and allocations larger than 256 bytes are not. The maximum size was actually 64 bytes at first. After that, dicts grew an embedded 8-element table, which vastly increased the size of the dict struct. obmalloc's limit was boosted to 256 then, although it could have stopped at the size of a dict (in the rough ballpark of 150 bytes). There was no reason (then or now) to go beyond 256. See below.
* For allocation sizes close to the maximum, block allocation becomes inefficient. For example, for block sizes 248 and 256 (blocksize), the number of elements per pool (PAGESIZE/blocksize) is 16 in both cases; i.e. the same page could be used for blocks of size 248 as for blocks of size 256.
Good idea; that shouldn't be too difficult to implement: for sizes above 215, pools need to be spaced apart only at 16 bytes.
I'd rather drop the limit to 215 then <0.3 wink>. Allocations that large probably still aren't important to obmalloc, but it is important that finding a requested allocation's "size index" be as cheap as possible. Uniformity helps that.

The existing allocation module really is not suitable as a general-purpose allocator. And speed can "blink" with the existing non-uniform implementation of allocation: in some cases memory allocation in the `obmalloc.c' module works as a wrapper around the standard malloc/free, which does not improve speed :) .
On modern boxes, it may be desirable to boost POOL_SIZE -- nobody has run timing experiments as comprehensive as Vladimir ran way back when, and there's no particular reason to believe that the same set of tradeoffs would look best on current architectures.

As for speed, it would be no worse than in the existing memory allocation systems. Naturally, there are no pluses without some minuses; in my opinion the existing minuses can be made insignificant. The Python language has a clearly expressed pattern of memory allocation that can be exploited by designing a memory allocation system directly for it.
For example, assume that two pieces of memory are allocated, one by `obmalloc.c' and the other by the standard malloc, both small enough to be handled by similar methods. On a number of OSes malloc will lose speed in multi-threaded programs because of the global lock taken on allocation; certainly this does not apply to all operating systems.

The properties inherent in the Python language:

* In multi-threaded mode there is already the GIL lock, which can also be used for memory allocation.
* For frequently used objects it is better to fit the allocation to the real size rather than to log2(size(ob)), because realloc is performed extremely rarely, or even less often than that :) .
* Objects whose allocated size is a multiple of 512, 1024 or 2048 are extremely rare, since the Python object header and the parameters describing the object's properties are placed in front of the data.

The first two items also apply to the present `obmalloc.c', but only for a narrow range of allocated block sizes. Some interpreters use a memory allocation system of their own. Tuning the memory management system is not the least important step towards speeding Python up.

Thanks to everyone who has answered. The topic is as interesting as before.

#include <stdio.h>
#include <stdlib.h>     /* for malloc() and exit() */
#include <strings.h>    /* for bzero() */
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

typedef union __genlink {
	struct {
		union __genlink *qcl_prev;
		union __genlink *qcl_next;
	} qcl;				/* Queue of Cycle List */
	void *point[2];
	union __genlink *link[2];
} genlink;

void
qcl_init(genlink *root)
{
	root->qcl.qcl_next = root;
	root->qcl.qcl_prev = root;
}

void
qcl_insert_after(genlink *root, genlink *p)
{
	genlink *next;

	p->qcl.qcl_prev = root;
	p->qcl.qcl_next = (next = root->qcl.qcl_next);
	root->qcl.qcl_next = p;
	next->qcl.qcl_prev = p;
}

#define COUNT 10000000
#define REPEAT 1000

long long
rusage_print(const char *s, struct rusage *oru)
{
	struct rusage ru;
	int sec;
	int usec;
	long long ret;

	getrusage(RUSAGE_SELF, &ru);
	sec = ru.ru_utime.tv_sec - oru->ru_utime.tv_sec;
	if ((usec = ru.ru_utime.tv_usec - oru->ru_utime.tv_usec) < 0) {
		sec--;
		usec = 1000000 + usec;
	}
	printf("%s: user time %d.%6.6d\n", s, sec, usec);
	fflush(stdout);
	ret = sec*1000000;
	ret += usec;
	return ret;
}

main()
{
	genlink root;
	genlink *p;
	genlink *base;
	int i;
	int k;
	struct rusage ru;
	long long tm_normal;
	long long tm_align4;
	long long tm_align2;

	if ((base = malloc((COUNT + 1)*sizeof(genlink))) == NULL) {
		fprintf(stderr, "malloc failed\n");
		exit(1);
	}
	bzero(base, (COUNT + 1)*sizeof(genlink));

	getrusage(RUSAGE_SELF, &ru);
	bzero(base, (COUNT + 1)*sizeof(genlink));
	qcl_init(&root);
	for (i = 0; i < COUNT; i++) {
		p = (genlink *)(((char *)(&base[i])) + sizeof(genlink)); /* add compensation */
		qcl_insert_after(&root, p);
	}
	for (k = 0, p = &root; k < REPEAT; k++)
		for (i = 0; i < COUNT; i++)
			p = p->qcl.qcl_next;
	tm_normal = rusage_print("align for two pointers", &ru);

	bzero(base, (COUNT + 1)*sizeof(genlink));
	getrusage(RUSAGE_SELF, &ru);
	bzero(base, (COUNT + 1)*sizeof(genlink));
	qcl_init(&root);
	for (i = 0; i < COUNT; i++) {
		p = (genlink *)(((char *)(&base[i])) + 4);
		qcl_insert_after(&root, p);
	}
	for (k = 0, p = &root; k < REPEAT; k++)
		for (i = 0; i < COUNT; i++)
			p = p->qcl.qcl_next;
	tm_align4 = rusage_print("align for 4 bytes", &ru);

	bzero(base, (COUNT + 1)*sizeof(genlink));
	getrusage(RUSAGE_SELF, &ru);
	bzero(base, (COUNT + 1)*sizeof(genlink));
	qcl_init(&root);
	for (i = 0; i < COUNT; i++) {
		p = (genlink *)(((char *)(&base[i])) + 2);
		qcl_insert_after(&root, p);
	}
	for (k = 0, p = &root; k < REPEAT; k++)
		for (i = 0; i < COUNT; i++)
			p = p->qcl.qcl_next;
	tm_align2 = rusage_print("align for 2 bytes", &ru);

	printf("        Normal   Align4   Align2\n");
	printf("Normal %8.2f %8.2f %8.2f\n",
	    ((double)tm_normal*100)/tm_normal,
	    ((double)tm_normal*100)/tm_align4,
	    ((double)tm_normal*100)/tm_align2);
	printf("Align4 %8.2f %8.2f %8.2f\n",
	    ((double)tm_align4*100)/tm_normal,
	    ((double)tm_align4*100)/tm_align4,
	    ((double)tm_align4*100)/tm_align2);
	printf("Align2 %8.2f %8.2f %8.2f\n",
	    ((double)tm_align2*100)/tm_normal,
	    ((double)tm_align2*100)/tm_align4,
	    ((double)tm_align2*100)/tm_align2);
}

First:

fox:root!~/test > cc aligntest.c; a.out
-----------------------------------------
align for two pointers: user time 42.819181
align for 4 bytes: user time 42.725775
align for 2 bytes: user time 59.198894
        Normal   Align4   Align2
Normal  100.00   100.22    72.33
Align4   99.78   100.00    72.17
Align2  138.25   138.56   100.00
------------------------------------------

Second:

fox:root!~/test > cc -O2 aligntest.c; a.out
------------------------------------------
align for two pointers: user time 10.152062
align for 4 bytes: user time 10.161598
align for 2 bytes: user time 10.145807
        Normal   Align4   Align2
Normal  100.00    99.91   100.06
Align4  100.09   100.00   100.16
Align2   99.94    99.84   100.00
------------------------------------------

XXX:vys!~ > a.out
align for two pointers: user time 0.228938
align for 4 bytes: user time 0.237644
align for 2 bytes: user time 0.235519
        Normal   Align4   Align2
Normal  100.00    96.34    97.21
Align4  103.80   100.00   100.90
Align2  102.87    99.11   100.00

XXX:vys!~ > a.out
align for two pointers: user time 0.254978
align for 4 bytes: user time 0.236627
align for 2 bytes: user time 0.242168
        Normal   Align4   Align2
Normal  100.00   107.76   105.29
Align4   92.80   100.00    97.71
Align2   94.98   102.34   100.00

XXX:vys!~ >
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Comparing the sort function and binary trees is not entirely correct. I think that the sort function will lose considerably on larger lists, especially after an insertion or removal of just one element.
Generally speaking, people who understand at least some of Python's internals (list internals specifically), will not be *removing* entries from lists one at a time (at least not using list.pop(0) ), because that is slow. If they need to remove items one at a time from the smallest to the largest, they will usually use list.reverse(), then repeatedly list.pop(), as this is quite fast (in general). However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' . - Josiah As an aside, I have personally implemented trees a few times for different reasons. One of the issues I have with most tree implementations is that it is generally difficult to do things like "give me the kth smallest/largest item". Of course the standard solution is what is generally called a "partial order" or "order statistic" tree, but then you end up with yet another field in your structure. One nice thing about Red-Black trees combined with order-statistic trees, is that you can usually use the uppermost bit of the order statistics for red/black information. Of course, this is really only interesting from an "implement this in C" perspective; because if one were implementing it in Python, one may as well be really lazy and not bother implementing guaranteed balanced trees and be satisfied with randomized-balancing via Treaps. Of course all of this discussion about trees in Python is fine, though there is one major problem. With minimal work, one can construct 3 values such that ((a < b < c) and (c < a)) is true.
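Since Treaps came up: here is a minimal, illustrative sketch of mine (toy code, not pyavl and not the collectionsmodule.c patch) showing how little the randomized-balancing version needs:

import random

class Node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.priority = random.random()   # random priority keeps the tree balanced in expectation
        self.left = None
        self.right = None

def _rotate_right(node):
    l = node.left
    node.left, l.right = l.right, node
    return l

def _rotate_left(node):
    r = node.right
    node.right, r.left = r.left, node
    return r

def insert(node, key, value):
    # BST order on keys, min-heap order on priorities.
    if node is None:
        return Node(key, value)
    if key == node.key:
        node.value = value
    elif key < node.key:
        node.left = insert(node.left, key, value)
        if node.left.priority < node.priority:
            node = _rotate_right(node)
    else:
        node.right = insert(node.right, key, value)
        if node.right.priority < node.priority:
            node = _rotate_left(node)
    return node

def search(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        if key < node.key:
            node = node.left
        else:
            node = node.right
    raise KeyError(key)

Lookups walk the tree exactly as in an AVL or Red-Black tree; only the balancing rule differs. And of course the total-ordering problem just mentioned bites this sketch just as hard.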
Josiah Carlson wrote:
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Comparing the sort function and binary trees is not entirely correct. I think that the sort function will lose considerably on larger lists, especially after an insertion or removal of just one element.
Generally speaking, people who understand at least some of Python's internals (list internals specifically), will not be *removing* entries from lists one at a time (at least not using list.pop(0) ), because that is slow. If they need to remove items one at a time from the smallest to the largest, they will usually use list.reverse(), then repeatedly list.pop(), as this is quite fast (in general).
Yes. I understood that when I gave the example.
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
- Josiah
As an aside, I have personally implemented trees a few times for different reasons. One of the issues I have with most tree implementations is that it is generally difficult to do things like "give me the kth smallest/largest item". Of course the standard solution is what is generally called a "partial order" or "order statistic" tree, but then you end up with yet another field in your structure. One nice thing about Red-Black trees combined with order-statistic trees, is that you can usually use the uppermost bit of the order statistics for red/black information. Of course, this is really only interesting from an "implement this in C" perspective; because if one were implementing it in Python, one may as well be really lazy and not bother implementing guaranteed balanced trees and be satisfied with randomized-balancing via Treaps.
Such a problem arises when building a list of timers, and that list changes constantly: elements are added to and removed from it all the time. collections.deque does not fit here, because an insertion into a large list, or a search for the nearest value, takes on average N/2 checks, whereas for a binary tree the number of checks needed is still log2(N). Another use of binary trees is matching against ranges. Such ranges could be blocks of IP addresses, and this kind of dictionary could be used to find quickly whether a given point falls inside one of the segments. These problems can be solved with binary search, but binary search has the same drawbacks as the example above: insertion into and removal from a list are slow, and after an insert the list has to be sorted again. Besides, a binary search function is absent from the standard Python package. It is possible to write an analogue in pure Python, but it will be more than twice as slow.

Here is what I found through Google for "order statistic binary tree": http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic20/

I understood that I have already made something similar :). The test shows its result for the worst case, and it is certainly worse than for an ordinary tree. The bad part is that I have not tested this implementation all the way to the end; I am 90 percent sure that it is workable. There is a question: what API should be used? If the object is accessed as a MAP object, __getitem__ should return one result, and if as a list, another. Here is the list of already existing attributes and methods:

class xtree(__builtin__.object)
 |  X-Tree. Binary tree with AVL balance mechanism.
 |
 |  Methods defined here:
 |
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
 |
 |  append(...)
 |      D.append((k: v)) -> None, append new pair element into tree with sorting.
 |      Return pair element if argument is exist.
 |
 |  clear(...)
 |      D.clear() -> None. Remove all items from D.
 |
 |  copy(...)
 |      D.copy() -> a shallow copy of D.
 |
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
 |
 |  has_key(...)
 |      D.has_key(k) -> True if D has a key k, else False.
 |
 |  items(...)
 |      D.items() -> list of D's (key, value) pairs.
 |
 |  iteritems(...)
 |      D.iteritems() -> an iterator over the (key, value) items of D.
 |
 |  iterkeys(...)
 |      D.iterkeys() -> an iterator over the keys of D.
 |
 |  itervalues(...)
 |      D.itervalues() -> an iterator over the values of D.
 |
 |  keys(...)
 |      D.keys() -> list of D's keys.
 |
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised.
 |
 |  popitem(...)
 |      D.popmin() -> (k, v), remove and return minimal (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |
 |  popmax(...)
 |      D.popmax() -> (k, v), remove and return maximal (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |
 |  popmin(...)
 |      D.popmin() -> (k, v), remove and return minimal (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |
 |  pushmax(...)
 |      D.pushmax(item) -> (k, v), remove and return maximal (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |
 |  pushmin(...)
 |      D.pushmin(item) -> (k, v), remove and return minimal (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |
 |  rebuild(...)
 |      D.rebuild() -> None. Take compare function for rebuild dictionary.
 |      It works without use of additional memory on a place.
 |
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D.
 |
 |  update(...)
 |      D.update(E) -> None. Update D from E and F: for k in E: D[k] = E[k]
 |      (if E has keys else: for (k, v) in E: D[k] = v).
 |
 |  values(...)
 |      D.values() -> list of D's values.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  cmp
 |      function of comparison of objects
 |
 |  freeze
 |      to block an opportunity change of object.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __new__ = <built-in method __new__ of type object at 0x282a4aa0>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T
Of course all of this discussion about trees in Python is fine, though there is one major problem. With minimal work, one can construct 3 values such that ((a < b < c) and (c < a)) is true.
The same problem is inherent in the standard dictionary, dict(); only its roots are different. For example:

import random

class test:
    def __hash__(self):
        return random.randint(0, 1000000)

A binary tree is very easy to tear apart. Therefore, whenever the object is accessed, either a read lock or a write lock should be taken. If an iterator is open on the object, a read lock should also be held until the iterator reaches the end or is closed or destroyed. In this way the safety of the object can be guaranteed. The locking functions are similar to pthread_rdlock_trylock and pthread_wrlock_trylock. If locking is impossible, a RDLockError, WRLockError or FreezeLockError exception is raised.

Thanks.

-- Vladimir Stepanov

#!/usr/local/bin/python
# Initialize modules
import avl
import xtree
import collections
import types
import sys
import timeit

if len(sys.argv) != 3:
    iterations = 1000000
    count = 10
else:
    iterations = int(sys.argv[1])
    count = int(sys.argv[2])

print "Iterations", iterations
print "Repeats", count
print

result_avl = timeit.Timer("""
import avl
a = avl.new()
for i in xrange(%d):
    a.insert(i)
""" % iterations).timeit(count)
print "object avl.new()", result_avl

result_xtree = timeit.Timer("""
import xtree
a = xtree.xtree()
for i in xrange(%d):
    a[i] = i
""" % iterations).timeit(count)
print "object xtree.xtree()", result_xtree

result_rbtree = timeit.Timer("""
import collections
a = collections.rbtree()
for i in xrange(%d):
    a[i] = i
""" % iterations).timeit(count)
print "object collections.rbtree()", result_rbtree

result_dict = timeit.Timer("""
import types
a = dict()
for i in xrange(%d):
    a[i] = i
""" % iterations).timeit(count)
print "object dict()", result_dict

print "       Dict   Rbtree Xtree  AVL"
print "Dict   %.2f   %.2f   %.2f   %.2f" % (
    float(result_dict)/result_dict, float(result_dict)/result_rbtree,
    float(result_dict)/result_xtree, float(result_dict)/result_avl)
print "Rbtree %.2f   %.2f   %.2f   %.2f" % (
    float(result_rbtree)/result_dict, float(result_rbtree)/result_rbtree,
    float(result_rbtree)/result_xtree, float(result_rbtree)/result_avl)
print "Xtree  %.2f   %.2f   %.2f   %.2f" % (
    float(result_xtree)/result_dict, float(result_xtree)/result_rbtree,
    float(result_xtree)/result_xtree, float(result_xtree)/result_avl)
print "AVL    %.2f   %.2f   %.2f   %.2f" % (
    float(result_avl)/result_dict, float(result_avl)/result_rbtree,
    float(result_avl)/result_xtree, float(result_avl)/result_avl)

Iterations 1000000
Repeats 10

object avl.new() 34.5930659771
object xtree.xtree() 16.2341389656
object collections.rbtree() 79.2434329987
object dict() 4.01597499847
       Dict   Rbtree Xtree  AVL
Dict   1.00   0.05   0.25   0.12
Rbtree 19.73  1.00   4.88   2.29
Xtree  4.04   0.20   1.00   0.47
AVL    8.61   0.44   2.13   1.00

Iterations 1000000
Repeats 10

object avl.new() 35.8164830208
object xtree.xtree() 18.5456049442
object collections.rbtree() 74.5128099918
object dict() 4.09129810333
       Dict   Rbtree Xtree  AVL
Dict   1.00   0.05   0.22   0.11
Rbtree 18.21  1.00   4.02   2.08
Xtree  4.53   0.25   1.00   0.52
AVL    8.75   0.48   1.93   1.00
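Going back to the __hash__ example above, here is a short sketch of my own (the class is repeated so the snippet runs on its own) showing how the standard dict "loses" such keys, which is the analogue of a tree being torn apart by an inconsistent cmp:

import random

class test:
    def __hash__(self):
        return random.randint(0, 1000000)

d = {}
keys = [test() for i in xrange(100)]
for k in keys:
    d[k] = True
print len(d)                       # all 100 entries are stored

found = 0
for k in keys:
    if k in d:                     # the key usually hashes somewhere else now
        found += 1
print "found", found, "of 100"     # typically almost none are found again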
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
Such a problem arises when building a list of timers.
I've never seen that particular use-case. Please understand something. I believe that trees can be useful in some cases. However, I don't believe that they are generally useful enough in Python, given that we already have key,value dictionaries and sortable lists. They become even less worthwhile in Python, given the total ordering problem that I describe later in this message.
Except that a binary search function is absent from the standard Python library.
Binary search exists in the standard library as the bisect module.
Here is what I found through Google for the phrase "order statistic binary tree":
Yes, that link does describe what an 'order statistic tree' is. One thing to note is that you can add order statistics to any tree with one more field on each tree node. That is, you can have your AVL or Red-Black tree, and add order statistics to it. This allows tree['x'] to return the associated value for the key 'x' (the same as a dictionary), as well as offering the ability to do tree.select(10) to get the 10th (or 11th if one counts from 0) smallest key (or key,value), or get the 'rank' of a key with tree.rank('x').
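To make the idea concrete, here is a small illustrative sketch (mine, not from the thread) of an order statistic tree: a plain, unbalanced binary search tree whose nodes carry a subtree-size field, which is the one extra field that select() and rank() need. A real AVL or Red-Black tree would additionally have to keep `size` correct during rotations.

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.size = 1                 # number of nodes in this subtree

def size(node):
    if node is None:
        return 0
    return node.size

def insert(node, key, value):
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    node.size = 1 + size(node.left) + size(node.right)
    return node

def select(node, k):
    # k-th smallest key, counting from 0
    if node is None:
        raise IndexError(k)
    r = size(node.left)
    if k < r:
        return select(node.left, k)
    if k > r:
        return select(node.right, k - r - 1)
    return node

def rank(node, key):
    # number of keys in the tree that are smaller than `key`
    if node is None:
        raise KeyError(key)
    if key < node.key:
        return rank(node.left, key)
    if key > node.key:
        return size(node.left) + 1 + rank(node.right, key)
    return size(node.left)

root = None
for c in "dbfaceg":
    root = insert(root, c, None)
print select(root, 0).key    # 'a', the smallest key
print rank(root, 'e')        # 4 keys are smaller than 'e'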
Of course all of this discussion about trees in Python is fine, though there is one major problem. With minimal work, one can construct 3 values such that ((a < b < c) and (c < a)) is true.
The same problem exists in the standard dictionary, dict(); only its root causes are different. For example:
This problem has nothing to do with dictionaries and hashing, it has to do with the fact that there may not be a total ordering on the elements of a sequence.
'b' < (0,) < u'a' < 'b'
True
x = ['b', (0,), u'a']
random.shuffle(x)
x.sort()
x
[u'a', 'b', (0,)]
random.shuffle(x)
x.sort()
x
['b', (0,), u'a']
What effect does this have on trees? Imagine, for a moment, that your tree looked like:

        (0,)
       /    \
     'b'    u'a'

Then you added sufficient data so that your tree was rebalanced to be something like:

          u'a'
         /    \
      (0,)   (stuff)
      /
    'b'

If you were searching for 'b', you would never find it, because in your search, you would compare 'b' against u'a', and take the right branch, where 'b' isn't.

 - Josiah
Josiah Carlson wrote:
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
Such a problem arises when building a list of timers.
I've never seen that particular use-case.
Please understand something. I believe that trees can be useful in some cases. However, I don't believe that they are generally useful enough in Python, given that we already have key,value dictionaries and sortable lists. They become even less worthwhile in Python, given the total ordering problem that I describe later in this message.
I am being tiresome, it is so :) Sorry for that.
Here is what I found through Google for the phrase "order statistic binary tree":
Yes, that link does describe what an 'order statistic tree' is. One thing to note is that you can add order statistics to any tree with one more field on each tree node. That is, you can have your AVL or Red-Black tree, and add order statistics to it. This allows tree['x'] to return the associated value for the key 'x' (the same as a dictionary), as well as offering the ability to do tree.select(10) to get the 10th (or 11th if one counts from 0) smallest key (or key,value), or get the 'rank' of a key with tree.rank('x').
Thanks.
This problem has nothing to do with dictionaries and hashing, it has to do with the fact that there may not be a total ordering on the elements of a sequence.
That is sad. I did not know that, which is why I did not understand. I searched Google for "python dev total ordering" but found nothing about any intention to fix this problem. Has it been discussed before? -- Vladimir Stepanov
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
This problem has nothing to do with dictionaries and hashing, it has to do with the fact that there may not be a total ordering on the elements of a sequence.
That is sad. I did not know that, which is why I did not understand.
I searched Google for "python dev total ordering" but found nothing about any intention to fix this problem. Has it been discussed before?
Not really. It has to do with how non-comparable instances compare to each other. Generally speaking, when instances aren't comparable, Python compares their type, which is actually comparing the names of types. This wouldn't be a big deal, except that:
str < tuple < unicode
True
And you can actually compare str and unicode, so, if you have a str that is greater than the unicode, you run into this issue. With unicode becoming str in Py3k, we may not run into this issue much then, unless bytes are comparable to str, in which case we end up with the same problems.

Actually, according to my google of "python dev total ordering", it gives me...

http://mail.python.org/pipermail/python-dev/2003-March/034169.html
http://mail.python.org/pipermail/python-list/2003-March/154142.html

Which were earlier discussions on this topic, which are quite topical. The ultimate solution is to choose a total ordering on types and consider the problem solved. Then list.sort() and binary trees will work properly.

 - Josiah
On Sat, May 06, 2006 at 03:12:11AM -0700, Josiah Carlson wrote:
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Josiah Carlson wrote:
This problem has nothing to do with dictionaries and hashing, it has to do with the fact that there may not be a total ordering on the elements of a sequence.
That is sad. I did not know that, which is why I did not understand.
I searched Google for "python dev total ordering" but found nothing about any intention to fix this problem. Has it been discussed before?
Not really. It has to do with how non-comparable instances compare to each other. Generally speaking, when instances aren't comparable, Python compares their type, which is actually comparing the names of types. This wouldn't be a big deal, except that:
str < tuple < unicode
True
And you can actually compare str and unicode, so, if you have a str that is greater than the unicode, you run into this issue. With unicode becoming str in Py3k, we may not run into this issue much then, unless bytes are comparable to str, in which case we end up with the same problems.
Actually, according to my google of "python dev total ordering", it gives me...
http://mail.python.org/pipermail/python-dev/2003-March/034169.html http://mail.python.org/pipermail/python-list/2003-March/154142.html
Which were earlier discussions on this topic, which are quite topical. The ultimate solution is to choose a total ordering on types and consider the problem solved. Then list.sort() and binary trees will work properly.
Thanks for such a detailed answer. Under those references the discussion is presented in the form of "this is how it is" rather than "how to fix it", so I would like to raise the question of "how to fix it". In the case of incomparable types it may make sense to compare identifiers assigned to each concrete type, or rather to compute from them a weight that plays the deciding role when the types themselves cannot be compared. Below is one possible variant of a solution to this problem.

Add three fields to each data type in Python:

    ...
    int tp_id;
    int tp_qualifier;
    int tp_weight;
    ...

Identifiers (tp_id) would be assigned rigidly to some built-in and external types; the rest receive free identifiers at initialization time. In addition, types should be classified by their basic properties (tp_qualifier). A type could be classified as:

    0   = None
    1   = integer (bool, int, long, etc.)
    2   = float (decimal, float, etc.)
    3   = math (complex, maybe matrix?)
    4   = string (str, unicode, splice, etc.)
    5   = sequence (iterators, deque, xrange, enumerate, etc.)
    6   = array (tuple, list, bytes, array, etc.)
    7   = dictionary (dict, defdict, etc.)
    8   = set (set, etc.)
    9   = file (file, socket.socket, cStringIO.StringIO)
    ...
    127 = no qualifier (sre.*, datetime.*, ClassType, InstanceType, etc.)

This is an approximate list by which it would be convenient to classify types (in my opinion). The weight can be set explicitly in the structure, or, if it equals -1, it should be calculated by the formula:

    ob_type->tp_weight = (ob_type->tp_qualifier << 24) + (ob_type->tp_id << 8)

Thus objects with similar properties will follow one after another. External types can be a problem, but if the classification is developed further, there should be no problems.

-- SY. Vladimir Stepanov
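To illustrate the weight formula, here is a tiny sketch (mine, for illustration only); the qualifier numbers are the ones proposed above, while the tp_id values are invented for the example:

def tp_weight(tp_qualifier, tp_id):
    return (tp_qualifier << 24) + (tp_id << 8)

# (name, qualifier, hypothetical id)
examples = [
    ('str',     4, 15),
    ('int',     1, 3),
    ('unicode', 4, 16),
    ('bool',    1, 2),
    ('float',   2, 5),
]
examples.sort(key=lambda e: tp_weight(e[1], e[2]))
for name, qualifier, type_id in examples:
    print name, hex(tp_weight(qualifier, type_id))

# All the integer-like types sort before float, which sorts before the
# string types, regardless of how the type names compare alphabetically.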
On 5/6/06, Vladimir Yu. Stepanov <root@renet.ru> wrote: [proposing a total ordering between types] It Ain't Gonna Happen. (From now on, I'll write this as IAGH.) In Python 3000, we'll actually *remove* ordering between arbitrary types as a feature; only types that explicitly care to be ordered with respect to one another will be ordered. Equality tests are unaffected, x==y will simply return False if x and y are of incomparable types; but x<y (etc.) will raise an exception. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
On 5/6/06, Vladimir Yu. Stepanov <root@renet.ru> wrote: [proposing a total ordering between types]
It Ain't Gonna Happen. (From now on, I'll write this as IAGH.)
In Python 3000, we'll actually *remove* ordering between arbitrary types as a feature; only types that explicitly care to be ordered with respect to one another will be ordered. Equality tests are unaffected, x==y will simply return False if x and y are of incomparable types; but x<y (etc.) will raise an exception.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
When a comparison such as ("something" < 123) is made, raising an exception is necessary. For sorting or for binary trees it is not so obvious: the behaviour of the comparison function in that case depends on the needs of the user, and at present redefining it is complicated. I shall give an example. Calling the sort method on a list can raise completely different exceptions, depending on the order of the values being sorted (thanks to David Mertz and Josiah Carlson): -----------------------------------------------------------------
[chr(255),1j,1,u't'].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

[chr(255),1j,u't',1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic. It would be quite good to create one more class of exceptions that would unify the errors generated when comparing non-comparable types or data.
On 5/11/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic.
Not really. We'll just document that sort() should only be used on a list of objects that implement a total ordering. The behavior otherwise will simply be undefined; it will raise whatever exception is first raised by an unsupported comparison (most likely TypeError). In practice this won't be a problem. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
On 5/11/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic.
Not really. We'll just document that sort() should only be used on a list of objects that implement a total ordering. [...]
It might be useful in some cases to have a keyword argument to sort/sorted that says to ignore exceptions arising from comparing elements, and leaves the ordering of non-comparable values undefined. -Edward
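One way to sketch the effect Edward describes with today's machinery (this is an illustration of mine, not a proposed API) is a cmp wrapper that swallows comparison failures; the Strict class below refuses foreign types much as Py3k comparisons are expected to:

class Strict:
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        if not isinstance(other, Strict):
            raise TypeError("Strict is only comparable with Strict")
        return cmp(self.val, other.val)
    def __repr__(self):
        return "Strict(%r)" % (self.val,)

def lenient_cmp(a, b):
    try:
        return cmp(a, b)
    except TypeError:
        return 0    # incomparable pair: leave their relative order undefined

items = [Strict(3), 7, Strict(1), 2]
items.sort(lenient_cmp)
print items

With a sort()/sorted() keyword argument this bookkeeping would disappear, but the result would have the same property: comparable values end up ordered, while incomparable ones land wherever the sort happens to put them.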
Guido van Rossum wrote:
On 5/11/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic.
Not really. We'll just document that sort() should only be used on a list of objects that implement a total ordering. The behavior otherwise will simply be undefined; it will raise whatever exception is first raised by an unsupported comparison (most likely TypeError). In practice this won't be a problem.
In my opinion there is no dedicated kind of exception for comparison functions. When the operation being performed is not a type coercion, a TypeError most often means a programming mistake. If an exception specific to comparison were created, the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' methods could be invoked without the fear of hiding a TypeError programming mistake (sorry for the tautology). It would be convenient to use it as a control-flow exception, to indicate that the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' method should be tried. Such a class could play a role similar to StopIteration for `.next()'. When a comparison fails, the user would then have a simple way, in the `cmp' function of the .sort() method, to define whatever behaviour he needs, including the ability to sort heterogeneous elements. At present a similar function is performed by the NotImplemented exception, which is produced in a number of mathematical operations. For this reason I ask that the creation of a new class be considered.
On 5/11/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic.
I don't see what you mean by "redefinition of the behavior of the sorting function". Is this something a Python programmer might want to do? Can you give an example? On 5/16/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
It would be convenient to use it as a control-flow exception, to indicate that the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' method should be tried. Such a class could play a role similar to StopIteration for `.next()'.
There are no .__r(eq|ne|le|lt|ge|gt|cmp)__() methods, for a logical reason which you might enjoy deducing yourself...
At present a similar function is performed by the NotImplemented exception, which is produced in a number of mathematical operations. For this reason I ask that the creation of a new class be considered.
Can you explain this? NotImplemented isn't an exception. (NotImplementedError is, but that's something quite different.) NotImplemented has exactly one purpose in Python, as far as I can tell. What mathematical operations do you mean? -j
Jason Orendorff wrote:
On 5/11/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
If Python-3000 behaves similarly with respect to types such as str(), int(), complex() and so on, and the type of exception varies this much, it will make redefinition of the behavior of the sorting function problematic.
I don't see what you mean by "redefinition of the behavior of the sorting function". Is this something a Python programmer might want to do? Can you give an example?
It can arise with external types, when each of them can be compared with some built-in type but not with each other. This concerns functions and methods implemented through the CPython API: for example, the big-integer types of different mathematical libraries, or the use of several different representations of character arrays. Another, more realistic example is working with a database. In Python-3000 it will only be possible to compare compatible types, so there can be a problem sorting data whose types are incompatible for comparison.

------------------------------------------------------------
conn = MySQLdb.connect(...)
cu = conn.cursor()
cu.execute("""
    CREATE TABLE bigtable (
        id INT NOT NULL AUTO_INCREMENT,
        group_id INT NULL,
        value VARCHAR(255) NOT NULL,
        PRIMARY KEY (id)
    )""")
for query in (
        "INSERT INTO bigtable VALUES (NULL, 50, 'i am lazy')",
        "INSERT INTO bigtable VALUES (NULL, 20, 'i am very lazy')",
        "INSERT INTO bigtable VALUES (NULL, 19, 'i am very very lazy')",
        "INSERT INTO bigtable VALUES (NULL, 40, 'i am very very very lazy :)')",
        "INSERT INTO bigtable VALUES (NULL, NULL, 'Something different')"):
    cu.execute(query)
cu.execute("SELECT * FROM bigtable ORDER BY id")
lst = [ r for r in cu ]
... do something in given order ...
lst.sort(cmp=lambda a, b: cmp(a[1], b[1]))
------------------------------------------------------------

If the records need to be re-sorted on the group_id column, that will hardly be possible for python-3000, because the numbers in the group_id column alternate with None values. Of course the list could be filtered first, but that would not be efficient, especially if the values being sorted can have several different incompatible types (which is already beyond the scope of this example).

The essence of my proposal is to let the fast CPython comparison machinery check the types first, and to fall back to a spare model of comparison only when the types turn out to be incompatible, and only if it is necessary. Creating a single, uniform exception would make it easy to build the try-except construct. The TypeError exception class is used very widely; for comparison functions and methods a specialized class is needed. After all, nobody uses StopIteration to report errors :) Another example is when classes from different modules are used and need to be sorted (see attached "example1.py").
On 5/16/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
It would be convenient to use it as a control-flow exception, to indicate that the `.__r(eq|ne|le|lt|ge|gt|cmp)__()' method should be tried. Such a class could play a role similar to StopIteration for `.next()'.
There are no .__r(eq|ne|le|lt|ge|gt|cmp)__() methods, for a logical reason which you might enjoy deducing yourself...
Oops :) I was simply too hasty. I had read about __rcmp__ once, and it only surfaced in my memory now, after I had already forgotten all about it. How inopportune.
At present a similar function is performed by the NotImplemented exception, which is produced in a number of mathematical operations. For this reason I ask that the creation of a new class be considered.
Can you explain this? NotImplemented isn't an exception. (NotImplementedError is, but that's something quite different.)
I was mistaken. Sorry. Indeed, it is not an exception class.
NotImplemented has exactly one purpose in Python, as far as I can tell. What mathematical operations do you mean?
In multiplication/division/addition/subtraction/binary operations/etc., when the types are incompatible (binary_op1 in abstract.c). It is a very universal element. The CPython API turns a returned NotImplemented into a TypeError exception. Is that good? In my opinion it makes it possible to hide a programming mistake. See the attached file example2.py for an example. Thanks.

example1.py:

#!/usr/local/bin/python

import types

class CompareError:
    pass

# class in module1
class A:
    uniform_global_sort_criterion = 1

    def __init__(self, val):
        self.val = val

    def __cmp__(self, other):
        if type(self) is type(other) and self.__class__ == other.__class__:
            return cmp(self.val, other.val)
        raise CompareError

    def __repr__(self):
        return "%s(%d)" % (self.__class__.__name__, self.val)

# class in module2
class B:
    uniform_global_sort_criterion = 2

    def __init__(self, value):
        self.value = value

    def __cmp__(self, other):
        if type(self) is type(other) and self.__class__ == other.__class__:
            return cmp(self.value, other.value)
        raise CompareError

    def __repr__(self):
        return "%s(%d)" % (self.__class__.__name__, self.value)

lst = [ A(50), B(25), A(75), B(100), A(1) ]

def compare_not_compatable(a, b):
    try:
        return cmp(a, b)
    except CompareError:
        return cmp(a.uniform_global_sort_criterion,
                   b.uniform_global_sort_criterion)

lst.sort(compare_not_compatable)
print lst

example2.py:

#!/usr/local/bin/python

import types
import operator
import sys

_total_ordering_of_types = TOOT = {
    types.NoneType:     0,
    types.BooleanType:  100,
    types.IntType:      101,
    types.LongType:     102,
    types.FloatType:    103,
    types.ComplexType:  104,
    types.InstanceType: 105,
    types.StringType:   200,
    types.UnicodeType:  201,
}

def compare_realpart(a, b):
    try:
        # BEGIN: Emulation python3000
        if type(a) is not type(b) and (
                not operator.isNumberType(a) or
                not operator.isNumberType(b)):
            raise TypeError("python3000: not-comparable types", (a, b))
        # END: Emulation python3000
        return cmp(a, b)
    except TypeError:
        # for complex and None
        try:
            # BEGIN: Emulation python3000
            a2 = getattr(a, 'real', a)
            b2 = getattr(b, 'real', b)
            if type(a2) is not type(b2) and (
                    not operator.isNumberType(a2) or
                    not operator.isNumberType(b2)):
                raise TypeError("python3000: not-comparable types", (a2, b2))
            # END: Emulation python3000
            return cmp(getattr(a, 'real', a), getattr(b, 'real', b))
        except TypeError:
            # for None value
            return cmp(TOOT[type(a)], TOOT[type(b)])

class proxy2number:
    def __init__(self, val):
        self.val = val

    def __cmp__(self, other):
        if type(other) is types.InstanceType and \
           issubclass(other.__class__, self.__class__):
            return compare_realpart(self.val, other.val)
        self._check_type(other)
        return compare_realpart(self.val, other)

    def __str__(self):
        return "%s(%s)" % (self.__class__.__name__, self.val)

    __repr__ = __str__

    # If @staticmethod is forgotten here, a TypeError is generated
    #@staticmethod
    def _check_type(val):
        if not operator.isNumberType(val):
            raise TypeError("compare error")

# Okay
lst = [ proxy2number(10), 1, 18, proxy2number(11+1j), 10+1.2j,
        13, -91, -12+100j, None, 17 ]
lst.sort(compare_realpart)
print lst
Vladimir, Your examples seem to indicate that you've misunderstood the change that's proposed for Python 3000. Especially this: On 5/17/06, Vladimir 'Yu' Stepanov <vys@renet.ru> wrote:
# BEGIN: Emulation python3000
if type(a) is not type(b) and (
        not operator.isNumberType(a) or
        not operator.isNumberType(b)):
    raise TypeError("python3000: not-comparable types", (a, b))
# END: Emulation python3000
Python 3000 will not do anything like this. It'll try a.__cmp__(b), and failing that b.__cmp__(a) (but imagine this using tp_ slots instead of actual Python method calls), and if both return NotImplemented, it'll throw a TypeError (rather than guess, which is what it does now). There's a lot of legacy oddness in PyObject_RichCompare() and its many helper functions; presumably they'll delete some of that, but it won't be anything you care about. Comparison with None should also continue to work as it does now, unless I missed something. -j
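A rough sketch of the dispatch Jason describes, written in Python with __cmp__ for brevity (the function name, the Celsius/Fahrenheit classes and the use of __cmp__ are mine; the real interpreter works through tp_ slots and the rich-comparison methods):

def py3k_style_cmp(a, b):
    # Try a.__cmp__(b); failing that, b.__cmp__(a) with the sign flipped;
    # if both return NotImplemented, give up with a TypeError.
    for x, y, sign in ((a, b, 1), (b, a, -1)):
        method = getattr(x, '__cmp__', None)
        if method is not None:
            result = method(y)
            if result is not NotImplemented:
                return sign * result
    raise TypeError("unorderable types: %r and %r" % (type(a), type(b)))

class Celsius:
    def __init__(self, deg):
        self.deg = deg
    def __cmp__(self, other):
        if isinstance(other, Celsius):
            return cmp(self.deg, other.deg)
        return NotImplemented

class Fahrenheit:
    def __init__(self, deg):
        self.deg = deg

print py3k_style_cmp(Celsius(10), Celsius(20))   # -1
py3k_style_cmp(Celsius(10), Fahrenheit(50))      # raises TypeError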
[Josiah Carlson]
...
str < tuple < unicode
True
And you can actually compare str and unicode, so, if you have a str that is greater than the unicode, you run into this issue.
Oh dear -- I didn't realize we still had holes like that:
'b' < () < u'a' < 'b'
True
We used to have a bunch of those with numeric types (like int < list < long), and then hacked comparison to artificially lump all "the numeric types" together (by pretending that their type names are the empty string -- see default_3way_compare() in object.c: /* different type: compare type names; numbers are smaller */ if (PyNumber_Check(v)) vname = ""; ). That ensures, e.g., that int < list and long < list, despite that "int" < "list" < "long".
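In Python, the fallback Tim quotes looks roughly like this (a sketch of just the type-name step, not of everything default_3way_compare() actually does; the None handling and same-type cases are omitted):

import operator

def fallback_3way(v, w):
    # mismatched types: compare type names, but let every number pretend
    # its type name is "" so that numbers sort before everything else
    vname = type(v).__name__
    wname = type(w).__name__
    if operator.isNumberType(v):
        vname = ""
    if operator.isNumberType(w):
        wname = ""
    return cmp(vname, wname)

print fallback_3way(1, [])    # -1: "" < "list", so int < list
print fallback_3way(1L, [])   # -1 too; without the hack "long" > "list"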
With unicode becoming str in Py3k, we may not run into this issue much then, unless bytes are comparable to str, in which case we end up with the same problems.
If Guido thinks about it in time :-), I expect it's more likely that P3K will never fall back to comparing by type name; that non-equality comparisons that currently fall back to looking at the type name will raise exceptions instead. That's already the case for the types most recently added to Python, like
{} < set()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can only compare to a set

import datetime
{} < datetime.datetime.now()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't compare datetime.datetime to dict

{} == set()
False
{} == datetime.datetime.now()
False
Josiah Carlson wrote:
And you can actually compare str and unicode, so, if you have a str that is greater than the unicode, you run into this issue. With unicode becoming str in Py3k, we may not run into this issue much then, unless bytes are comparable to str, in which case we end up with the same problems.
Actually, according to my google of "python dev total ordering", it gives me...
http://mail.python.org/pipermail/python-dev/2003-March/034169.html http://mail.python.org/pipermail/python-list/2003-March/154142.html
Which were earlier discussions on this topic, which are quite topical. The ultimate solution is to choose a total ordering on types and consider the problem solved. Then list.sort() and binary trees will work properly.
Under the second reference there was a question: can complex be partially comparable with int, long, and float? After all, 1 == 1+0j?
Vladimir 'Yu' Stepanov wrote:
Yes. I understood it when the set example was presented.
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
Such a problem arises when building a list of timers.
For a list of timers/timeout events, a priority queue is often the best data structure, which is already available through the heapq module. Regards, Martin
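A minimal illustration of Martin's suggestion (the event names and delays are invented): push (time, event) pairs onto a heapq list, and the next timer to fire is always at index 0.

import heapq
import time

timers = []
now = time.time()
heapq.heappush(timers, (now + 30.0, "send keepalive"))
heapq.heappush(timers, (now + 5.0, "retry connect"))
heapq.heappush(timers, (now + 60.0, "expire session"))

when, event = heapq.heappop(timers)           # earliest deadline first
print event, "due in %.1f seconds" % (when - now)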
Martin v. Löwis wrote:
Vladimir 'Yu' Stepanov wrote:
Yes. I understood it when the set example was presented.
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
Such a problem arises when building a list of timers.
For a list of timers/timeout events, a priority queue is often the best data structure, which is already available through the heapq module.
In most cases that is indeed so; heapq suits these purposes very well. The only problem is with removing an arbitrary element from the list. -- Vladimir Stepanov
"Vladimir 'Yu' Stepanov" <vys@renet.ru> wrote:
Martin v. Löwis wrote:
Vladimir 'Yu' Stepanov wrote:
Yes. I understood it when the set example was presented.
However, as I just said, people usually don't remove items from just-sorted lists, they tend to iterate over them via 'for i in list:' .
Such a problem arises when building a list of timers.
For a list of timers/timeout events, a priority queue is often the best data structure, which is already available through the heapq module.
In most cases that is indeed so; heapq suits these purposes very well. The only problem is with removing an arbitrary element from the list.
For this particular problem, usually you don't want to say, "remove time 4.34252 from the heap", you want to say "remove the (time,event) pair given the event X, from the heap", where you have in your heap (time, event) pairs. For hashable events, you can create a class which implements enough list-like semantics to allow it to work like a heap for event timing (__getitem__, __setitem__, __len__, .pop(-1) is about it, if I remember correctly), but also allows you to remove arbitrary events (as items are shifted around in the heap, you update pointers from a dictionary). The details are a bit annoying to get right, but it is possible. - Josiah
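Josiah's pointer-updating variant takes some care to get right; a simpler sketch that reaches the same end result (remove an event by name rather than by time) is lazy deletion on top of heapq. This is my own illustration of one possible layout, not the implementation Josiah describes:

import heapq

class TimerHeap:
    _REMOVED = object()               # placeholder for a cancelled event

    def __init__(self):
        self._heap = []               # entries are [time, event] lists
        self._entries = {}            # event -> its entry in the heap

    def add(self, when, event):
        entry = [when, event]
        self._entries[event] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, event):
        # Mark the entry as removed; it stays in the heap and is skipped
        # when it eventually reaches the top.
        entry = self._entries.pop(event)      # KeyError if unknown
        entry[1] = self._REMOVED

    def pop(self):
        # Return (time, event) for the earliest live entry.
        while self._heap:
            when, event = heapq.heappop(self._heap)
            if event is not self._REMOVED:
                del self._entries[event]
                return when, event
        raise KeyError("pop from an empty timer heap")

th = TimerHeap()
th.add(5.0, "retry")
th.add(1.0, "ping")
th.remove("retry")
print th.pop()     # (1.0, 'ping')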
On Thu, Apr 06, 2006, Sanghyeon Seo wrote:
base64 module documentation for b64decode function says, "TypeError is raised if s were incorrectly padded or if there are non-alphabet characters present in the string." But this doesn't seem to be the case. Testcase:
import base64 base64.b64decode('%')
Since % is a non-alphabet character, this should raise TypeError (btw, shouldn't this be ValueError instead?), but Python 2.4.3 silently ignores.
Please submit a bug report on SourceForge and report back the ID. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
2006/4/7, Aahz <aahz@pythoncraft.com>:
Please submit a bug report on SourceForge and report back the ID.
http://python.org/sf/1466065 Seo Sanghyeon
participants (13)
- "Martin v. Löwis"
- Aahz
- Dino Viehland
- Edward Loper
- Guido van Rossum
- Hye-Shik Chang
- Jason Orendorff
- Josiah Carlson
- Sanghyeon Seo
- Terry Reedy
- Tim Peters
- Vladimir 'Yu' Stepanov
- Vladimir Yu. Stepanov