python 3's adoption
Alf P. Steinbach
alfps at start.no
Fri Jan 29 10:35:50 EST 2010
* Steven D'Aprano:
> On Fri, 29 Jan 2010 07:10:01 +0100, Alf P. Steinbach wrote:
>
>> >>> L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian L
>
> [...]
>
>> >>> L.sort( key = locale.strxfrm )
>> >>> L
>> ['å', 'æ', 'ø']
>> >>> locale.strcoll( "å", "æ" )
>> 1
>> >>> locale.strcoll( "æ", "ø" )
>> -1
>>
>> Note that strcoll correctly orders the strings as ["æ", "ø", "å"], that
>> is, it would have if it could have been used as cmp function to sort (or
>> better, to a separate routine named e.g. custom_sort).
>
> This is in Python2.5, so I'm explicitly specifying unicode strings:
>
>>>> L = [u"æ", u"ø", u"å"]
>>>> assert sorted(L) == [u'å', u'æ', u'ø']
>
> The default C-locale sorting of L does not equal to L. Now let's change
> to Norwegian locale:
>
>>>> import locale
>>>> locale.setlocale(locale.LC_ALL, 'nb_NO')
> 'nb_NO'
>>>> print u''.join(sorted(L, cmp=locale.strcoll))
> æøå
>
> So far so good, we've confirmed that in Python 2.5 we can sort in a
> locale-aware form. Now, can we do this with a key function? Thanks to
> Raymond Hettinger's recipe here:
>
> http://code.activestate.com/recipes/576653/
>
>
>>>> print u''.join(sorted(L, key=CmpToKey(locale.strcoll)))
> æøå
>
>
> Success!
>
> Let's try it in Python 3.1:
>
>>>> L = ["æ", "ø", "å"]
>>>> assert sorted(L) == ['å', 'æ', 'ø']
>>>>
>>>> import locale
>>>> locale.setlocale(locale.LC_ALL, 'nb_NO')
> 'nb_NO'
Hm. A bit off-topic, but...
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'nb_NO')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\cpython\python31\lib\locale.py", line 527, in
return _setlocale(category, locale)
locale.Error: unsupported locale setting
>>> _
This on a machine where the Python default locale is Norwegian.
>>>> ''.join(sorted(L, key=CmpToKey(locale.strcoll)))
> 'æøå'
>
>
> The definition of CmpToKey can be found at the URL above. It's not very
> exciting, but here it is:
>
>
> def CmpToKey(mycmp):
> 'Convert a cmp= function into a key= function'
> class K(object):
> def __init__(self, obj, *args):
> self.obj = obj
> def __lt__(self, other):
> return mycmp(self.obj, other.obj) == -1
> def __gt__(self, other):
> return mycmp(self.obj, other.obj) == 1
> def __eq__(self, other):
> return mycmp(self.obj, other.obj) == 0
> def __le__(self, other):
> return mycmp(self.obj, other.obj) != 1
> def __ge__(self, other):
> return mycmp(self.obj, other.obj) != -1
> def __ne__(self, other):
> return mycmp(self.obj, other.obj) != 0
> return K
This is pretty smart as a generic solution.
Thanks! :-)
I was thinking more of sticking those comparisions in some custom string class
or such, which would be rather ugly...
The above does have quite high overhead though!
That is, it's /inefficient/ compared to using a 'cmp' function directly.
> If that's too verbose for you, stick this as a helper function in your
> application:
>
>
> def CmpToKey(mycmp):
> 'Convert a cmp= function into a key= function'
> class K(object):
> def __init__(self, obj, *args):
> self.obj = obj
> __lt__ = lambda s, o: mycmp(s.obj, o.obj) == -1
> __gt__ = lambda s, o: mycmp(s.obj, o.obj) == 1
> __eq__ = lambda s, o: mycmp(s.obj, o.obj) == 0
> __le__ = lambda s, o: mycmp(s.obj, o.obj) != 1
> __ge__ = lambda s, o: mycmp(s.obj, o.obj) != -1
> __ne__ = lambda s, o: mycmp(s.obj, o.obj) != 0
> return K
>
>
> [...]
>> The above may just be a bug in the 3.x stxfrm. But it illustrates that
>> sometimes you have your sort order defined by a comparision function.
>> Transforming that into a key can be practically impossible (it can also
>> be quite inefficient).
>
> This might be true, but you haven't demonstrated it.
The "can be ... practically impossible" was hogwash. I posted late, sorry. The
"quite inefficient" holds.
> With one little
> helper function, you should be able to convert any comparison function
> into a key function, with relatively little overhead.
I wouldn't call an extra Python method call per comparision "relatively little
overhead".
Note that the same number of comparisions as with a 'cmp' based sort is performed.
But with the wrapper every comparision is indirected through a Python method
call (I think, but not sure, that the creation of those comparision objects does
not introduce significant overhead, but the calls must).
<code>
# -*- coding: utf-8 -*-
from __future__ import print_function
from __future__ import unicode_literals
try:
range = xrange
except:
pass
import locale
import sys
import random
import timeit
def CmpToKey( mycmp ):
"Convert a cmp= function into a key= function"
class K( object ):
def __init__( self, obj, *args ):
self.obj = obj
def __lt__( self, other ):
return mycmp( self.obj, other.obj ) == -1
def __gt__( self, other ):
return mycmp( self.obj, other.obj ) == 1
def __eq__( self, other ):
return mycmp( self.obj, other.obj ) == 0
def __le__( self, other ):
return mycmp( self.obj, other.obj ) != 1
def __ge__( self, other ):
return mycmp( self.obj, other.obj ) != -1
def __ne__( self, other ):
return mycmp( self.obj, other.obj ) != 0
return K
def elapsed_time_for( f ):
n_calls = 1
return timeit.timeit( f, number = n_calls )
def major_pyversion():
return sys.version_info[0]
def sorting_of( string_list ):
def sort2x():
string_list.sort( cmp = locale.strcoll )
def sort3x():
string_list.sort( key = CmpToKey( locale.strcoll ) )
return sort2x if major_pyversion() <= 2 else sort3x
alphabet = "abcdefghijklmnopqrstuvwxyzæøå"
n_strings = int( 1e5 )
nbno_locale = "" # "nb_NO" raises exception on my machine.
def rnd_string():
len = random.randint( 1, 80 )
s = ""
for i in range( len ):
s = s + random.choice( alphabet )
return s
locale.setlocale( locale.LC_ALL, nbno_locale )
random.seed( 666 )
assert( rnd_string() == "æmoxqudxlszåaåcteærmuocøjrdæpbvyabwhn" )
print( "Creating array of pseudo-random strings..." )
L = [rnd_string() for i in range( n_strings )]
print( "Sorting..." )
time = elapsed_time_for( sorting_of( L ) )
print( "Last string is '{0}'".format( L[-1] ) )
print( "{0}".format( time ) )
assert( L[-1][0] == "å" )
</code>
<example>
C:\Documents and Settings\Alf\test> py3 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
8.45987772189
C:\Documents and Settings\Alf\test> py2 sort.py
Creating array of pseudo-random strings...
Sorting...
Last string is
'åååywxuybrxgvnstoyvcmdåjdlnxwwfsbnzsinxncmtrxuxgtqtduzvåixqovtjxiflfesuvfa'
3.41336790011
C:\Documents and Settings\Alf\test> _
</example>
In the run above Python 3.x was only roughly 2.5 times slower. On my machine it
ranges from 2.5 times slower to 3 times slower. Most often around 2.5 x slower.
Of course this does not measure how much faster a Python 3.x cmp/"<" based sort
/could be/, only how slow, sluggish and snail-like it is compared to 2.x. :-)
In other words, that's the speed price, 2.5 to 3 times slower sorting, for
ditching cmp instead of e.g. replacing it with something better.
Cheers & hth.,
- Alf
More information about the Python-list
mailing list