python 3's adoption
Alf P. Steinbach
alfps at start.no
Fri Jan 29 01:10:01 EST 2010
* Steve Holden:
>>
> While I am fully aware that premature optimization, etc., but I cannot
> resist an appeal to efficiency if it finally kills off this idea that
> "they took 'cmp()' away" is a bad thing.
>
> Passing a cmp= argument to sort provides the interpreter with a function
> that will be called each time any pair of items have to be compared. The
> key= argument, however, specifies a transformation from [x0, x1, ...,
> xN] to [(key(x0), x0), (key(x1), x1), ..., (key(xN), xN)] (which calls
> the key function precisely once per sortable item).
>
> From a C routine like sort() [in CPython, anyway] calling out from C to
> a Python function to make a low-level decision like "is A less than B?"
> turns out to be disastrous for execution efficiency (unlike the built-in
> default comparison, which can be called directly from C in CPython).
>
> If your data structures have a few hundred items in them it isn't going
> to make a huge difference. If they have a few million then it is already
> starting to affect performance ;-)
It's not either/or; the question is whether programmers still need the cmp functionality.
Consider that *correctness* is a bit more important than efficiency, and that
sorting strings is quite common...
Possibly you can show me the way forward towards sorting these strings (shown
below) correctly for a Norwegian locale. Possibly you can't. But one thing is
for sure: if there were cmp functionality, it would not be a problem.
<example>
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)]
on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> L = ["æ", "ø", "å"] # This is in SORTED ORDER in Norwegian
>>> L
['æ', 'ø', 'å']
>>> L.sort()
>>> L
['å', 'æ', 'ø']
>>>
>>> import locale
>>> locale.getdefaultlocale()
('nb_NO', 'cp1252')
>>> locale.setlocale( locale.LC_ALL ) # Just checking...
'C'
>>> locale.setlocale( locale.LC_ALL, "" ) # Setting default locale, Norwegian.
'Norwegian (Bokmål)_Norway.1252'
>>> locale.strxfrm( "æøå" )
'æøå'
>>> L.sort( key = locale.strxfrm )
>>> L
['å', 'æ', 'ø']
>>> locale.strcoll( "å", "æ" )
1
>>> locale.strcoll( "æ", "ø" )
-1
>>>
</example>
Note that strcoll orders the strings correctly as ["æ", "ø", "å"]; that is, it
would have if it could have been passed as a cmp function to sort (or better, to a
separate routine named e.g. custom_sort).
And strcoll can be so used in 2.x:
<example>
C:\Documents and Settings\Alf\test> py2
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def show( list ):
...     print "[" + ", ".join( list ) + "]"
...
>>> L = [u"æ", u"ø", u"å"]
>>> show( L )
[æ, ø, å]
>>> L.sort()
>>> show( L )
[å, æ, ø]
>>> import locale
>>> locale.setlocale( locale.LC_ALL, "" )
'Norwegian (Bokm\xe5l)_Norway.1252'
>>> L.sort( cmp = locale.strcoll )
>>> show( L )
[æ, ø, å]
>>> L
[u'\xe6', u'\xf8', u'\xe5']
>>> _
</example>
The above may just be a bug in the 3.x strxfrm. But it illustrates that sometimes
you have your sort order defined by a comparison function. Transforming that
into a key can be practically impossible (it can also be quite inefficient).
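For what it's worth, here is a minimal sketch of the custom_sort idea mentioned above. It assumes Python 3.2 or later, where functools.cmp_to_key is available (it is not in the 3.1.1 session shown earlier). The names norwegian_cmp, _rank and NORWEGIAN_TAIL are hypothetical, and norwegian_cmp is only a stand-in for locale.strcoll so that the example does not depend on the machine's locale settings.

```python
import functools

# Hypothetical stand-in for locale.strcoll: the last three letters of the
# Norwegian alphabet, listed in their alphabet order.
NORWEGIAN_TAIL = "æøå"

def _rank(ch):
    # Letters in NORWEGIAN_TAIL sort after everything else, in the listed order.
    if ch in NORWEGIAN_TAIL:
        return (1, NORWEGIAN_TAIL.index(ch))
    return (0, ord(ch))

def norwegian_cmp(a, b):
    """Return a negative, zero or positive number, like a cmp function."""
    ra = [_rank(ch) for ch in a]
    rb = [_rank(ch) for ch in b]
    return (ra > rb) - (ra < rb)

def custom_sort(items, cmp):
    """Sort items in place, ordered by the two-argument function cmp."""
    # cmp_to_key wraps each item in an object whose comparisons call cmp,
    # so cmp is still invoked once per pairwise comparison.
    items.sort(key=functools.cmp_to_key(cmp))

L = ["å", "ø", "æ", "a"]
custom_sort(L, norwegian_cmp)
print(L)  # ['a', 'æ', 'ø', 'å']
```

With the locale set as in the sessions above, locale.strcoll could presumably be passed as cmp in the same way; whether that yields the Norwegian order depends on the platform's collation. Note that this wrapper does not remove the per-comparison call overhead discussed in the quoted text.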
Cheers & hth.,
- Alf