Hi, Guido, and people. It recurrently happens that newcomers on the Python mailing list are surprised that list.sort() does not return the sorted list as value. I quite understand and agree that this is a good thing, because sorting is done in place, and Python programmers should stay aware and alert of this fact. Yet, I often see myself writing things like: keys = messages.keys() keys.sort() for key in keys: DO_SOMETHING This is not difficult to write, only slightly annoying. Writing: def sorted(list): list = list[:] list.sort() return list with the goal of simplifying the first excerpt into: for key in sorted(message.keys()): DO_SOMETHING it is not really worth for small programs. But in larger programs, where one often loops over the sorted element of a list, it might become reasonable to write this extra definition. My feeling is that the idiom is common enough to be worth a list method, so the above could be written instead: for key in message.keys().sorted(): DO_SOMETHING I immediately see an advantage and an inconvenient. The inconvenient is that users might confuse `.sort()' with `.sorted()', however we decide to spell `sorted', so the existence of both may be some kind of trap. The advantage is that the `.sorted()' method fits well within how Python has evolved recently, offering more concise and legible writings for frequent idioms. Tim invested a lot of courageous efforts so Python `sort' becomes speedier. A `.sorted()' method requires separate space to hold the result, using the same size as the original, and that guaranteed extra-space may eventually be put to good use for speeding up the sorting even more. The constraint of a sort being in-place has indeed a cost, and deep down, we agree that this constraint is artificial in contexts where `.sorted()' is really what the user needs. -- François Pinard http://www.iro.umontreal.ca/~pinard
pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard):
The advantage is that the `.sorted()' method fits well within how Python has evolved recently, offering more concise and legible writings for frequent idioms.
I prefer the idea of making sorted() a separate function, because it can then be made to work on any sequence that can be copied and has a sort() method. To support specialised non-in-place sorting algorithms, it could check whether its argument has a sorted() method, and if not, fall back on the general implementation. This seems more Pythonic to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Since François is probably waiting for a pronouncement for me, let me say that I think this is a problem that should not be addressed by changes to the language, builtins or library. A sorted() method for lists would require a copy. François argues that the extra space could be used by the sorting algorithm. But if the requirement is that the original array must not be shuffled at all, I expect that there's no way you can make use of the extra space: you have to make a copy of the whole list first, which then gets shuffled in various ways. I suppose it would be possible to write a sorting algorithm that made some use of the availability of an output array, but rewriting the sort code once again so that you can avoid writing a three line function doesn't seem a good trade-off. More generalized solutions seem overkill: I've not seen demand for sorting other container types (except for list subclasses). The argument against making sort() return self (while sorting in-place) still holds, and this argument also means that having a sorted() that sorts in-place is a bad idea. You could consider adding a "sort" option to keys(), values() and items(), but that doesn't solve other similar cases. I think you'll just have to live with it. Or you can create a dict subclass that sorts its keys. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
... A sorted() method for lists would require a copy. François argues that the extra space could be used by the sorting algorithm. But if the requirement is that the original array must not be shuffled at all, I expect that there's no way you can make use of the extra space: you have to make a copy of the whole list first, which then gets shuffled in various ways.
I suppose it would be possible to write a sorting algorithm that made some use of the availability of an output array, but rewriting the sort code once again so that you can avoid writing a three line function doesn't seem a good trade-off.
There's no efficiency argument to be made here unless someone can write a sort function this way and demonstrate an improvement. I expect that would be hard. Back when I wrote the samplesort hybrid, I tried several ways of coding mergesorts too, and they all lost on random data. They all used a temp array of the same size as the original array. The current mergesort does not: it uses a temp array at most half the size. This effectively doubled the amount of code needed, but cut the size of the working set. I first tried the current mergesort again with a temp array the same size as the original, but it again lost (a little on random data, a lot on many kinds of partially ordered data -- for example, take a sorted array, and move its last element to the front; no matter how large the array, the current mergesort only needs a few dozen temp words to get it sorted again, and caches are much happier with that).
participants (4)
-
Greg Ewing
-
Guido van Rossum
-
pinard@iro.umontreal.ca
-
Tim Peters