[Python-Dev] Python3 regret about deleting list.sort(cmp=...)

Glenn Linderman v+python at g.nevcal.com
Sun Mar 13 04:52:24 CET 2011


On 3/12/2011 7:21 PM, Terry Reedy wrote:
> (Ok, I assumed that the 'word' field does not include any of 
> !"#$%&'()*+. If that is not true, replace comma with space or even a 
> control char such as '\a' which even precedes \t and \n.)

OK, I agree the above was your worst assumption, although you need to 
add "," to your list as well, because a "," in the data is what allows 
the data puns.

You also rewrote Guido's text from "shortstring" to "word" and assumed 
it had certain content semantics, but since only an integer follows the 
last ",", rsplit would still separate the fields even if shortstring 
contains ",".
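
To illustrate the rsplit point, here is a minimal sketch. The helper name 
split_record and the sample records are made up for this example; the 
only assumption about the data is the one stated above, that the last 
field is an integer.

```python
def split_record(record):
    """Split 'shortstring,integer' on the LAST comma only.

    Hypothetical helper: because the final field is an integer and so
    cannot contain a comma, splitting once from the right is unambiguous
    even when the string field itself contains commas.
    """
    text, number = record.rsplit(",", 1)
    return text, int(number)


print(split_record("spam,42"))
print(split_record("spam,eggs,7"))
```

Splitting from the left with split(",", 1) would instead break the second 
record inside the string field.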

And the choice of delimiter really determines whether data puns can 
exist.  Only if you know that some character sorts lower than every 
character that can appear in the sort strings can you "cheat" and put a 
variable-length string into a sort key field by terminating it with 
that character.  The safest such character is \0; if you are coding in 
C, where \0 already terminates strings, use \a as you now suggest, but 
only if you can be 100% sure it is not found in the data.  If you 
cannot guarantee that the data doesn't contain the terminator, data 
puns among variable-length strings remain possible, and the algorithms 
will sort wrong in pathological cases.
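
A small sketch of such a pun, under assumptions of my own: the two 
records, the helper name joined_key, and the zero-padded width of 10 are 
all invented for illustration.  It compares sorting on a joined string 
key against sorting on the (string, integer) tuples directly, which is 
taken here as the intended order.

```python
# Two hypothetical records: (variable-length string, integer).
records = [("ab", 1), ("ab!", 2)]


def joined_key(rec, delimiter):
    """Build a single sort string: text, delimiter, zero-padded integer."""
    text, number = rec
    return f"{text}{delimiter}{number:010d}"


# Reference order: compare field by field, so "ab" precedes "ab!".
by_tuple = sorted(records)

# "," (0x2C) sorts AFTER "!" (0x21), so "ab!,..." lands before "ab,...".
by_comma = sorted(records, key=lambda r: joined_key(r, ","))

# "\0" sorts before every other character, so the shorter string wins.
by_nul = sorted(records, key=lambda r: joined_key(r, "\0"))

print(by_tuple)  # [('ab', 1), ('ab!', 2)]
print(by_comma)  # [('ab!', 2), ('ab', 1)]  wrong order
print(by_nul)    # [('ab', 1), ('ab!', 2)]
```

The "\0" key is only correct here because neither string contains "\0"; 
the same failure reappears as soon as the data can include the 
terminator.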

I wouldn't have called you on this, except that it really is important 
not to give people the idea that you can blithely use a variable-length 
string anywhere except at the tail of a multi-field sort string.  In 
general, you can't.  I've long since lost track of the number of times 
I've helped people understand the fix for programs that tried that.