RE: [Python-Dev] inline sort option

#- > > like type checkers (and IMO are also easily overseen by human #- > > readers). And, it's easier to write l.sorted() rather than #- > > l.sort(inline=True). #- #- [Aahz] #- > Let's make explicit: l.copysort() #- > #- > I'm not a big fan of grammatical suffixes for #- distinguishing between #- > similar meanings. #- #- +1 +2, considering that the difference in behaviour with sort and sorted it's no so clear to a non-english speaker. (my first post to the development list, :D ) . Facundo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ADVERTENCIA La información contenida en este mensaje y cualquier archivo anexo al mismo, son para uso exclusivo del destinatario y pueden contener información confidencial o propietaria, cuya divulgación es sancionada por la ley. Si Ud. No es uno de los destinatarios consignados o la persona responsable de hacer llegar este mensaje a los destinatarios consignados, no está autorizado a divulgar, copiar, distribuir o retener información (o parte de ella) contenida en este mensaje. Por favor notifíquenos respondiendo al remitente, borre el mensaje original y borre las copias (impresas o grabadas en cualquier medio magnético) que pueda haber realizado del mismo. Todas las opiniones contenidas en este mail son propias del autor del mensaje y no necesariamente coinciden con las de Telefónica Comunicaciones Personales S.A. o alguna empresa asociada. Los mensajes electrónicos pueden ser alterados, motivo por el cual Telefónica Comunicaciones Personales S.A. no aceptará ninguna obligación cualquiera sea el resultante de este mensaje. Muchas Gracias.

[GvR]
[Aahz]
[Facundo]
+2, considering that the difference in behaviour with sort and sorted it's no so clear to a non-english speaker.
FWIW, I've posted a patch to implement list.copysort() that includes a news announcement, docs, and unittests: www.python.org/sf/825814 Raymond Hettinger

Despite my suggesting a better name, I'm not in favor of this (let's say -0). For one, this will surely make lots of people write for key in D.keys().copysort(): ... which makes an unnecessary copy of the keys. I'd rather continue to write keys = D.keys() keys.sort() for key in keys: ... --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, Oct 17, 2003, Guido van Rossum wrote:
I'm actually -1, particularly with your clear argument; I just didn't like your suggestion of l.sorted(). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

On Sat, 2003-10-18 at 07:07, Brett C. wrote:
I'm -1 as well. Lists do not need to grow a method for something that only replaces two lines of code that are not tricky in any form of the word.
And don't forget that the trivial function will sort any iterable, not just lists. I think for member in copysort(someset): is better than for member in list(someset).copysort(): I'm against list.copysort(), and for either leaving things unchanged or adding copysort() as a builtin (especially if it can use the reference count trick to avoid unnecessary copies). Mark Russell

On Saturday 18 October 2003 11:44 am, Mark Russell wrote:
The trick would need to check that the argument is a list, of course, as well as checking that the reference to it on the stack is the only one around. But given this, yes, I guess a built-in would be "better" by occasionally saving the need to type a few extra characters (though maybe "worse" by enlarging the built-in module rather than remaining inside the smaller namespace of the list type...?). The built-in, or method, 'copysort', would have to accept the same optional arguments as the sort method of lists has just grown, of course. Alex

Wondering about the trick of copysort not copying a singly-referenced list I decided to try it out in a tiny extension module, and, yes, it is just as trivial as one might wish (haven't dealt with optional args to sort, just wanting to check performance &c): static PyObject* copysort(PyObject* self, PyObject* args) { PyObject *sequence, *listresult; if(!PyArg_ParseTuple(args, "O", &sequence)) return 0; if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } if(listresult) { if(PyList_Sort(listresult) == -1) { Py_DECREF(listresult); listresult = 0; } } return listresult; } and performance on an equally trivial testcase: x = dict.fromkeys(range(99999)) def looponsorted1(x): keys = x.keys() keys.sort() for k in keys: pass def looponsorted2(x, c=copysort.copysort): for k in c(x.keys()): pass turns out to be identical between the two _with_ The Trick (4.4e+04 usec with timeit.py -c on my box) while without The Trick copysort would slow down to about 5.5e+04 usec. But, this reminds me -- function filter, in bltinmodule.c, uses just about the same trick too (to modify in-place when possible rather than making a new list -- even though when it does make a new list it's an empty one, not a copy, so the gain is less). There must be other cases of applicability which just haven't been considered. So... Shouldn't The Trick be embodied in PySequence_List itself...? So, the whole small tricky part above: if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } would collapse to a single PySequence_List call -- *AND* potential calls from Python code such as "x=list(somedict.keys())" might also be speeded up analogously... [Such a call looks silly when shown like this, but in some cases one might not know, in polymorphic use, whether a method returns a new or potentially shared list, or other sequence, and a call to list() on the method's result then may be needed to ensure the right semantics in all cases]. Is there any hidden trap in The Trick that makes it unadvisable to insert it in PySequence_List? Can't think of any, but I'm sure y'all will let me know ASAP what if anything I have overlooked...;-). One might even be tempted to reach down all the way to PyList_GetSlice, placing THERE The Trick in cases of all-list slicing of a singly-referenced list (PyList_GetSlice is what PySequence_List uses, so it would also get the benefit), but that might be overdoing it -- and encouraging list(xxx) instead of xxx[:], by making the former a bit faster in some cases, would be no bad thing IMHO (already I'm teaching newbies to prefer using list(...) rather than ...[:] strictly for legibility and clarity, being able to mention possible future performance benefits might well reinforce the habit...;-). Alex

On Saturday 18 October 2003 02:26 pm, Alex Martelli wrote: ...oops...
x = dict.fromkeys(range(99999))
here, x.keys() IS already sorted, so the importance of The Trick is emphasized because the sort itself has little work to do:
I've changed the initialization of x to
x = dict.fromkeys(map(str,range(99999)))
so that x.keys() is not already sorted (still has several runs that the sort will exploit -- perhaps representative of some real-world sorts...;-) and the numbers change to about 240 milliseconds with The Trick (or with separate statements to get and sort the keys), 265 without -- so, more like 10% advantage, NOT 20%-ish (a list.copysort method, from Raymond's patch, has 240 milliseconds too -- indeed it's just about the same code I was using in the standalone function I posted, give or take some level of indirectness in C calls that clearly don't matter much here). Of course, the % advantage will vary with the nature of the list (how many runs that sort can exploit) and be bigger for smaller lists (given we're comparing O(N) copy efforts vs O(N log N) sorting efforts). Alex

I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython. I think the application area is too narrow to warrant a built-in, *and* lists shouldn't grow two similar methods. Let's keep the language small! (I know, by that argument several built-ins shouldn't exist. Well, they might be withdrawn in 3.0; let's not add more.) --Guido van Rossum (home page: http://www.python.org/~guido/)

I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
As Alex demonstrated, the time savings for an O(n) operation inside an O(n log n) function is irrelevant anyway.
Not to be hard headed here, but if dropped now, it will never be considered again. Did you have a chance to look at the rationale for change in my previous note and added in the comments for the patch? I think they offer some examples and reasons stronger than "saving a little typing": www.python.org/sf/825814 Raymond

I'm taking that into account. It still smells funny. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, 18 Oct 2003, Guido van Rossum wrote:
I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
There is also a problem with the strategy if if gets called by a C extension. It is perfectly feasible for a C extension to hold the only reference to an object, call the copying sort (directly or indirectly), and then be very surprised that the copy did not take place. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (440) 871-6725 x 19 E-mail: jacobs@theopalgroup.com Fax: (440) 871-6722 WWW: http://www.theopalgroup.com/

On Saturday 18 October 2003 08:00 pm, Kevin Jacobs wrote:
Alas, I fear you're right. Darn -- so much for a possible little but cheap optimization (which might have been neat in PySequence_List even if copysort never happens and the optimization is only for CPython -- I don't see why an optimization being impossible in Jython should stop CPython from making it, as long as semantics remain compatible). It's certainly possible for C code to call PySequence_List or whatever while holding the only reference, and count on the returned and argument objects being distinct:-(. Alex

On Saturday 18 October 2003 06:33 pm, Guido van Rossum wrote:
I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
No, but if it's only a small optimization, who cares? Anyway, the objection that these functions might be called by _C_ code who's holding the only reference to a PyObject* probably kills The Trick (particularly my hope of moving it into PySequence_List whether copysort survived or not).
Aye aye, captain. Can we dream of a standard library module of "neat hacks that don't really warrant a built-in" in which to stash some of these general-purpose, no-specific-appropriate-module, useful functions and classes? Pluses: would save some people reimplementing them over and over and sometimes incorrectly; would remove any pressure to add not-perfectly-appropriate builtins. Minuses: one more library module (the, what, 211th? doesn't seem like a biggie). Language unchanged -- just library. Pretty please?
(I know, by that argument several built-ins shouldn't exist. Well, they might be withdrawn in 3.0; let's not add more.)
"Amen and Hallelujah" to the hope of slimming language and built-ins in 3.0 (presumably the removed built-ins will go into a "legacy curiosa" module, allowing a "from legacy import *" to ease making old code run in 3.0? seems cheap & sensible). Alex

At 10:11 PM 10/18/03 +0200, Alex Martelli wrote:
Hmmm. import tricky.hacks from dont_try_this_at_home_kids import * I suppose 'shortcuts' would probably be a less contentious name. :) The downside to having such a module would be that it would entertain ongoing pressure to add more things to it. I suppose it'd be better to have a huge shortcuts module (or maybe shortcuts package, divided by subject matter) than to keep adding builtins.
I like it. Or, for symmetry, maybe 'from __past__ import lambda'. ;-) Say, in 3.0, will there be perhaps *no* builtins? After all, you don't need builtins to import things. Nah, that'd be too much like Java, and not enough like pseudocode. Ah well, time for me to stop making suggestions on what color to paint the bicycle shed, and start doing some real work today. :)

Modules should be about specific applications, or algorithms, or data types, or some other unifying principle. I think "handy" doesn't qualify. :-)
Let's not speculate yet about how to get old code to run in 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, 2003-10-18 at 12:31, Alex Martelli wrote:
The built-in, or method, 'copysort', would have to accept the same optional arguments as the sort method of lists has just grown, of course.
Yes. In fact one point it its favour is that there aren't any choices to be made - the interface should track that of list.sort(), so it avoids the usual objection to trivial functions that there are many possible variants. Mark Russell

[Raymond]
[Guido]
Interesting that you saw this at the same time I was fretting about it over dinner. The solution is to bypass the copy step for the common case of: for elem in somelistmaker().copysort(): . . . The revised patch is at: www.python.org/sf/825814 The technique is to re-use the existing list whenever the refcount is one. This keeps the mutation invisible. Advantages of a copysort() method: * Avoids creating an unnecessary, stateful variable that remains visible after the sort is needed. In the above example, the definition of the "keys" variable changes from unsorted to sorted. Also, the lifetime of the variable extends past the loop where it was intended to be used. In longer code fragments, this unnecessarily increases code complexity, code length, the number of variables, and increases the risk of using a variable in the wrong state which is a common source of programming errors. * By avoiding control flow (the assignments in the current approach), an inline sort becomes usable anywhere an expression is allowed. This includes important places like function call arguments and list comprehensions: todo = [t for t in tasks.copysort() if due_today(t)] genhistory(date, events.copysort(key=incidenttime)) Spreading these out over multiple lines is an unnecessary distractor from the problem domain, resulting is code that is harder to read, write, visually verify, grok, or debug. Raymond Hettinger P.S. There are probably better names than copysort, but the idea still holds.

Guido> For one, this will surely make lots of people write Guido> for key in D.keys().copysort(): Guido> ... Guido> which makes an unnecessary copy of the keys. It might be viewed as unnecessary if you intend to change D's keys within the loop. Guido> keys = D.keys() Guido> keys.sort() Guido> for key in keys: Guido> ... Current standard practice is also fine. Skip

On Saturday 18 October 2003 05:16 pm, Skip Montanaro wrote:
D.keys() makes a _snapshot_ of the keys of D -- it doesn't matter what you do to D in the loop's body. Admittedly, that's anything but immediately obvious (quite apart from copysorting or whatever) -- I've seen people change perfectly good code of the form: for k in D.keys(): vastly_alter_a_dictionary(D, k) into broken code of the form: for k in D: vastly_alter_a_dictionary(D, k) because of having missed this crucial difference -- snapshot in the first case, but NOT in the second one. And viceversa, I've seen people carefully copy.copy(D.keys()) or the equivalent to make sure they did not suffer from modifying D in the loop's body -- the latter is in a sense even worse, because the bad effects of the former tend to show up pretty fast as weird bugs and broken unit-tests, while the latter is "just" temporarily wasting some memory and copying time. Anyway, copysort with The Trick, either as a method or function, has no performance problems - exactly the same performance as:
Nolo contendere. It DOES feel a bit like boilerplate, that's all. Alex

Alex Martelli strung bits together to say:
Hi, While I'm not an active Python contributor (yet), I've been lurking on python-dev since March. Something was bugging me about the whole l.copysort() ('sortedcopy'?) idea. For whatever reason, the above comment crystalised it - if there's going to be a special 'sortedcopy' to allow minimalist chaining, then what about 'reversedcopy' or 'sortedreversedcopy', or any of the other list methods that may be considered worth chaining? Particularly since the following trick seems to work: ==============
(Tested with Python 2.3rc2, which is what is currently installed on my home machine) Not exactly the easiest to read, but it does do the job of "sorted copy as an expression", as well as letting you chain arbitrary methods of any object. Regards, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."

On Sun, Oct 19, 2003 at 03:18:07AM +1000, Nick Coghlan wrote:
[...]
[...]
(I'm new on this list) Actually, there is way to do this out-of-the-box without the chain() function:
And there is also one for copysort():
But that's probably not more readable. Architect's Sketch... -- Dan Aloni da-x@gmx.net

On Saturday 18 October 2003 09:13 pm, Dan Aloni wrote: ...
This cannot be applied to D.keys() for some directory D.
(lambda x:(x, x.sort())[0])(list(a))
This one can, because the lambda lets you give a temporary name x to the otherwise-unnamed list returned by D.keys(). It can be made a _little_ better, too, I think:
and if you want it reversed after sorting,
(lambda x: x.sort() or x.reverse() or x)(D.keys()) ['o', 'i', 'c', 'a']
But that's probably not more readable.
You have a gift for understatement. Still, probably more readable than the classic list comprehension hack:
[x for x in [D.keys()] for y in [x.sort(), x] if y][0] ['a', 'c', 'i', 'o']
also, the lambda hack doesn't leak names into the surrounding scope, while the list comprehension hack, alas, does. BTW, welcome to python-dev! Alex

On Sat, Oct 18, 2003 at 10:01:52PM +0200, Alex Martelli wrote:
Good, so this way the difference between copied and not copied is minimized:
(lambda x: x.sort() or x)(a)
And:
(lambda x: x.sort() or x)(list(a))
Nice, this lambda hack is a cleaner, more specific, and simple deviation of the chain() function. Perhaps it could be made more understandable like:
And:
sorted(a) ['a', 'c', 'i', 'o']
The only problem is that you assume .sort() always returns a non True value. If some time in the future .sort() would return self, your code would break and then the rightful usage would be:
And:
I didn't see the begining of this discussion, but it looks to me that sort() returning self is much better than adding a .copysort().
BTW, welcome to python-dev!
Thanks! -- Dan Aloni da-x@gmx.net

On Saturday 18 October 2003 10:54 pm, Dan Aloni wrote: ...
No fair -- that's not a single expression any more!-)
Why do you think it would break? It would do a _tiny_ amount of avoidable work, but still return the perfectly correct result. Sure you don't think I'd post an unreadable inline hack that would break in the unlikely case the BDFL ever made a change he's specifically Pronounced again, right?-)
I didn't see the begining of this discussion, but it looks to me that sort() returning self is much better than adding a .copysort().
The BDFL has Pronounced against it: he doesn't LIKE chaining. Alex

On Saturday 18 October 2003 07:18 pm, Nick Coghlan wrote:
Hi Nick!
sort has just (in CVS right now) sprouted an optional reverse=True parameter that makes sort-reverse chaining a non-issue (thanks to the usual indefatigable Raymond, too). But, for the general case: the BDFL has recently Pronounced that he does not LIKE chaining and doesn't want to encourage it in the least. Yes, your trick does allow chaining, but the repeated chain(...) calls are cumbersome enough to not count as an encouragement IMHO;-). A generalized wrapping system that wraps all methods so that any "return None" is transformed into a "return self" WOULD constitute encouragement, and thus a case of Lese BDFLity, which would easily risk the wrath of the PSU (of COURSE there ain't no such thing!)... Alex

Alex Martelli strung bits together to say:
Well, yes, that was sort of the point. For those who _really_ like chaining (I'm not one of them - I agree with Guido that it is less readable and harder to maintain), the 'chain' function provides a way to do it with what's already in the language. Cheers, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."

[Alex Martelli]
[Nick Coghlan]
Remember, list.copysort() isn't about chaining or even "saving a line or two". It is about using an expression instead of a series of statements. That makes it possible to use it wherever expressions are allowed, including function call arguments and list comprehensions. Here are some examples taken from the patch comments: genhistory(date, events.copysort(key=incidenttime)) todo = [t for t in tasks.copysort() if due_today(t)] To break these back into multiple statements is to cloud their intent and take away their expressiveness. Using multiple statements requires introducing auxiliary, state-changing variables that remain visible longer than necessary. State changing variables are a classic source of programming errors. In contrast, the examples above are clean and show their correctness without having to mentally decrypt them. Scanning through the sort examples in the standard library, I see that the multi-line, statement form is sometimes further clouded by having a number of statements in-between. In SimpleHTTPServer.py, for example list = os.listdir(path) . . . (yada, yada) list.sort(key=lambda a: a.lower()) . . . (yada, yada, yada) for name in list: . . . You see other examples using os.environ and such. The forces working against introducing an in-line sort are: * the time to copy the list (which Alex later showed to be irrelevant), * having two list methods with a similar purpose, and * the proposed method names are less than sublime If someone could come-up with a name more elegant than "copysort", I the idea would be much more appetizing. Raymond Hettinger

On Sunday 19 October 2003 07:23 pm, Raymond Hettinger wrote: ...
Good summary (including the parts I snipped).
If someone could come-up with a name more elegant than "copysort", I the idea would be much more appetizing.
I still think that having it in some module is a bit better than having it as a method of lists. The BDFL has already Pronounced that it's too narrow in applicability for a builtin (and he's right -- as usual), and that we won't have "grab-bag" module of shortcuts that don't fit well anywhere else (ditto), and seems very doubtful despite your urgings to reconsider his stance against adding it as a list method (the two-methods-similar- purpose issue seems quite relevant). So, back to what I see as a key issue: a module needs to be "about" something reasonably specific, such as a data type. Built-in data types have no associated module except the builtin one, which is crowded and needs VERY high threshold for any addition. So, if I came up with an otherwise wonderful function that works on sets, arrays, ..., I'd be in clover -- there's an obvious module to house it)... but if the function worked on lists, dicts, files, ..., I'd be hosed. Note that module string STILL exists, and still is the ONLY way to call maketrans, an important function that was deemed inappropriate as a string method; a function just as important, and just as inappropriate as a method, that worked on lists, or dicts, or files, or slices, or ... would be "homeless" and might end up just not entering the standard library. In a way this risks making built-in types "second-class citizens" when compared to types offered by other modules in the standard library! I think we SHOULD have modules corresponding to built-in types, if there are important functions connected with those types but not appropriate as methods to populate them. Perhaps we could use the User*.py modules for the purpose, but making new ones seems better. Rather than being kept together just by naming conventions, as the User*.py are, they might be grouped in a package. Good names are always a problem, but, say, "tools.lists" might be the modules with the auxiliary tools dealing with lists, if "tools" was the package name -- "tools.dicts", "tools.files", etc, if needed -- "tools.sequences" for tools equally well suited to all sequences (not just lists) -- of course, that would retroactively suggest "tools.iters" for your itertools, oh well, pity, we sure can't rename it breaking backwards compatibility:-). If we had module tools.lists (or utils.lists, whatever) then I think copysort (by whatever name) would live well there. copyreverse and copyextend might perhaps also go there and make Barry happy?-) Alternatively - we could take a different tack. copysort is NOT so much a tool that works on an existing list -- as shown in the code I posted, thanks to PySequence_List, it's just as easy to make it work on any sequence (finite iterator or iterable). So what does it do? It BUILDS a new list object (a sorted one) from any sequence. So -- it's a FACTORY FUNCTION of the list type. Just like, say, dict.fromkeys is a factory function of the dict type. Now, factory functions are "by nature" classmethods of their type object, no? So, we should package THIS factory function just like others -- as a classmethod on list, list.somename, just like dict.fromkeys is a classmethod on dict. In this light, we surely don't want "copy" as a part of the name -- a factory method should be thought of as building a new list, not as copying an old one (particularly because it will work on any sequence as its argument, of course). Maybe list.buildsorted if we want to emphasize the "build" part. Or list.newsorted to emphasize that a new object is returned. Or maybe, like in dict.fromkeys, we don't want to emphasize either the building or the newness, but then I wouldn't know what to suggest except the list.sorted that's already drawn catcalls (though it drew them when it was proposed as an instance methods of lists -- maybe as a classmethod it will look better?-) I want the functionality -- any sensible name that might let the functionality into the standard library would be ok by me (so would one putting the functionality in as a builtin or as an instance method of lists, actually, but I _do_ believe those would not be the best places for this functionality, by far). I hope the "tools package" idea and/or the classmethod one find favour...!-) Alex

list.sorted as a list factory looks fine to me. Maybe whoever pointed out the problem with l.sorted() vs. l.sort() for non-native-English speakers can shed some light on how list.sorted(x) fares compared to x.sort()? But the argument that it wastes a copy still stands (even though that's only O(N) vs. O(N log N) for the sort).
I'm still unclear why this so important to have in the library when you can write it yourself in two lines. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Monday, October 20, 2003, at 01:22 PM, Guido van Rossum wrote:
Probably "there should only be one way to do something." It's something that is recreated over and over, mostly the same way but sometimes with slight differences (e.g., copy-and-sort versus sort-in-place). Like dict() growing keyword arguments, a copy/sort method (function, classmethod, whatever) will standardize something that is very commonly reimplemented. Another analogs might be True and False (which before being built into Python may have been spelled true/false, TRUE/FALSE, or just 0/1). These don't add any real features, but they standardize these simplest of idioms. I think I've seen people in this thread say that they've written Big Python Programs, and they didn't have any problem with this -- but this is a feature that's most important for Small Python Programs. Defining a sort() function becomes boilerplate when you write small programs. Or alternatively you create some util module that contains these little functions, which becomes like a site.py only somewhat more explicit. A util module feels like boilerplate as well, because it is a module without any conceptual integrity, shared between projects only for convenience, or not shared as it grows organically. "from util import sort" just feels like cruft-hiding, not real modularity. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

That's one of the best ways I've seen this formulated. If Alex's proposal to have list.sorted() as a factory function is acceptable to the non-English-speaking crowd, I think we can settle on that. (Hm, an alternative would be to add a "sort=True" keyword argument to list()...) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
(Hm, an alternative would be to add a "sort=True" keyword argument to list()...)
My immediate expectation on seeing that would be that the keyword args for l.sort() would also be present. It feels better to isolate that stuff; keeping list.sorted(...) make more sense I think. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

At 12:24 PM 10/20/03 -0700, Guido van Rossum wrote:
Does this extend by analogy to other requests for short functions that are commonly reimplemented? Not that any spring to mind at the moment; it just seems to me that inline sorting is one of a set of perennially requested such functions or methods, where the current standard answer is "but you can do it yourself in only X lines!".
Wouldn't it need to grow key and cmpfunc, too?

Only if there's some quirk to reimplementing them correctly, and only if the need is truly common. Most recently we did this for sum().
Yes, but list.sorted() would have to support these too. It might become slightly inelegant because we'd probably have to say that sorted defaults to False except it defaults to True if either of cmp, and key is specified. Note that reverse=True would not imply sorting, so that list(range(10), reverse=True) would yield [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] But Raymond has a different proposal in mind for that (he still needs to update PEP 322 though). So maybe list.sorted() is better because it doesn't lend itself to such generalizations (mostly because of the TOOWTDI rule). --Guido van Rossum (home page: http://www.python.org/~guido/)

Let's see what the use cases look like under the various proposals: todo = [t for t in tasks.copysort() if due_today(t)] todo = [t for t in list.sorted(tasks) if due_today(t)] todo = [t for t in list(tasks, sorted=True) if due_today(t)] genhistory(date, events.copysort(key=incidenttime)) genhistory(date, list.sorted(events, key=incidenttime)) genhistory(date, list(events, sorted=True, key=incidenttime)) for f in os.listdir().copysort(): . . . for f in list.sorted(os.listdir()): . . . for f in list(os.listdir(), sorted=True): . . . To my eye, the first form reads much better in every case. It still needs a better name though. [Phillip J. Eby in a separate note]
Wouldn't it need to grow key and cmpfunc, too?
Now, that "key" and "reverse" are available, there is no need for "cmp" in any new methods. [Guido in a separate note]
I'll get to it soon; there won't be any surprises. Raymond Hettinger ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# #################################################################

On Monday 20 October 2003 11:43 pm, Raymond Hettinger wrote:
You're forgetting the cases in which (e.g.) tasks is not necessarily a list, but any finite sequence (iterable or iterator). Then. e.g. the first job becomes: todo = [t for t in list(tasks).copysort() if due_today(t)] todo = [t for t in list.sorted(tasks) if due_today(t)] todo = [t for t in list(tasks, sorted=True) if due_today(t)] and I think you'll agree that the first construct isn't that good then (quite apart from the probably negligible overhead of an unneeded copy -- still, we HAVE determined that said small overhead needs to be paid sometimes, and needing to code list(x).copysort() when x is not a list or you don't KNOW if x is a list adds one copy then).
Sorry, but much as I dislike cmpfunc it's still opportune at times, e.g. I'd rather code: def Aup_Bdown(x, y): return cmp(x.A, y.A) or cmp(y.B, x.B) for a in list.sorted(foo, cmp=Aup_Bdown): ... than for a in list.sorted( list.sorted(foo, key=lambda x:x.B, reverse=True), key=lambda x: x.A): ... or even for a in list(foo).copysort(key=lambda x:x.B, reverse=True ).copysort(key=lambda x: x.A): ... Alex

On Mon, 2003-10-20 at 22:43, Raymond Hettinger wrote:
Well, #3 is (I hope) a non-starter, given the need for the extra sort keyword arguments. And the instance method is less capable - it can't sort a non-list iterable (except via list(xxx).copysort()). So I would definitely prefer #2, especially as I would tend to put: sort = list.sorted at the top of my modules where needed. Then I'd have: todo = [t for t in sort(tasks) if due_today(t)] genhistory(date, sort(events, key=incidenttime)) for f in sort(os.listdir()): . . . which to me looks enough like pseudocode that I'm happy. This might seem like an argument for having sort() as a builtin, but I think it's still better as a list constructor. Adding "sort = list.sorted" to the modules that need it is a small price to pay in boilerplate for the big win of not cluttering the builtin namespace. Mark Russell

Really? That would seem to just obfuscate things for the reader (who would have to scroll back potentially many pages to find the one-line definition of sort). Why be so keen on saving 7 keystrokes? How many calls to list.sorted do you expect to have in your average module? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Mon, 2003-10-20 at 23:49, Guido van Rossum wrote:
I think most readers would probably be able to guess what for key in sort(d.keys()): would do. If not then it's no worse than a user-defined function. It's also a matter of proportion -- the important thing about the code above is that it's walking over a dictionary. In most of my uses, the sort() is just a detail to ensure reproducible behaviour. In a new language I think you could make a case for the default behaviour for dict iteration to be sorted, with a walk-in-unspecified-order method for the cases where the speed really does matter. Back in the real world, how about: for key, value in d.sort(): (i.e. adding a sort() instance method to dict equivalent to: def sort(d, cmp=None, key=None, reverse=False): l = list(d.items()) l.sort(cmp, key, reverse) return l ). At least there's no question of an in-place sort for dicts!
Why be so keen on saving 7 keystrokes?
It's not totally trivial - for me a list comprehension is noticeably less readable when split over more than one line.
How many calls to list.sorted do you expect to have in your average module?
Er, about 0.3 :-) In the project I'm working on, there are 52 sortcopy() calls in 162 modules (about 18K LOC). Not enough to justify a built-in sort(), but enough I think to make list.sorted() worthwhile. Mark Russell

On Tuesday 21 October 2003 12:31 pm, Mark Russell wrote:
Incidentally, for k in list.sorted(d): will be marginally faster, e.g. (using the copysort I posted here, without The Trick -- it should be just about identical to the list.sorted classmethod): import copysort x = dict.fromkeys(map(str,range(99999))) def looponkeys(x, c=copysort.copysort): for k in c(x.keys()): pass def loopondict(x, c=copysort.copysort): for k in c(x): pass [alex@lancelot ext]$ timeit.py -c -s'import t' 't.loopondict(t.x)' 10 loops, best of 3: 2.84e+05 usec per loop [alex@lancelot ext]$ timeit.py -c -s'import t' 't.looponkeys(t.x)' 10 loops, best of 3: 2.67e+05 usec per loop i.e., about 10% better for this size of list and number of runs (quite a few, eyeball x.keys()...:-). Nothing crucial, of course, but still. Moreover, "list.sorted(d)" and "sort(d.keys())" are the same length, and the former is conceptually simpler (one [explicit] method call, vs one method call and one function call). Of course, if each keystroke count, you may combine both "abbreviations" and just use "sort(d)".
for key, value in d.sort():
(i.e. adding a sort() instance method to dict equivalent to:
Why should multiple data types acquire separate .sort methods with subtly different semantics (one works in-place and returns None, one doesn't mutate the object and returns a list, ...) when there's no real added value wrt ONE classmethod of list...? Particularly with cmp, key, and reverse on each, seems cumbersome to me. Truly, is list.sorted(d.iteritems()) [or d.items() if you'd rather save 4 chars than a small slice of time:-)] SO "unobvious"? I just don't get it. Alex

On Tue, 2003-10-21 at 12:00, Alex Martelli wrote:
I agree that the different semantics for lists and dicts are a strike against this. The argument for it is that walking over a dictionary in sorted order is (at least to me) a missing idiom in python. Does this never come up when you're teaching the language? I wouldn't advocate adding this to other types (e.g. Set) because they're much less commonly used than dicts, so I don't think there's a danger of a creeping plague of sort methods. Not a big deal though - list.sorted() is the real win. Mark Russell PS: I'm really not an anal-retentive keystoke counter :-)

On Tuesday 21 October 2003 01:18 pm, Mark Russell wrote:
It's a frequently used idiom (actually more than one) -- it's not "missing".
never come up when you're teaching the language?
Sure, and I have a good time explaining that half the time you want to sort on KEYS and half the time on VALUES. An example I often use is building and displaying a word-frequency index: now it's pretty obvious that you may want to display it just as easily by frequency (most frequent words first) OR alphabetically. The key= construct IS a huge win, btw. I just wish there WAS an easier way to express the TYPICAL keys one wants to use than lambda x: x[N] for some N or lambda x: x.A for some A. getattr and operator.getitem are no use, alas, even when curried, because they take x first:-(. I'd rather not teach lambda (at least surely not early on!) so I'll end up with lots of little def's (whose need had sharply decreased with list comprehensions, as map and filter moved into a corner to gather dust). Ah well.
I wouldn't advocate adding this to other types (e.g. Set) because they're much less commonly used than dicts, so I don't think there's a
Actually, I was thinking of presenting them BEFORE dicts next time I have an opportunity of teaching Python from scratch. The ARE simpler and more fundamental, after all.
danger of a creeping plague of sort methods. Not a big deal though - list.sorted() is the real win.
I concur.
Mark Russell
PS: I'm really not an anal-retentive keystoke counter :-)
OK, sorry for the digs, it just _looked_ that way for a sec;-). Alex

Mark Russell <marktrussell@btopenworld.com>:
Maybe dicts should have a .sortedkeys() method? The specialised method name would help stave off any temptation to add varied sort methods to other types. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Did the discussion of a sort() expression get resolved? The last I remember was that the list.sorted() classmethod had won the most support because it accepted the broadest range of inputs. I could live with that though I still prefer the more limited (list-only) copysort() method. Raymond Hettinger

list.sorted() has won, but we are waiting from feedback from the person who didn't like having both sort() and sorted() as methods, to see if his objection still holds when one is a method and the other a factory function. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wed, Oct 22, 2003, Guido van Rossum wrote:
Actually, I was another person who expressed dislike for "sorted()" causing confusion, but previous calls for feedback were restricted to those who felt comfortable expressing opinions for non-English speakers. I'm still -1 on sorted() compared to copysort(), but because it's a different context, I'm no longer actively opposed (which is why I didn't bother speaking up earlier). I still think that a purely grammatical change in spelling is not appropriate to indicate meaning, particularly when it's still the same part of speech (both verbs). To my mind, sorted() implies a Boolean predicate. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

On Wednesday 22 October 2003 08:53 pm, Guido van Rossum wrote:
So, if I've followed correctly the lots of python-dev mail over the last few days, that person (Aahz) is roughly +0 on list.sorted as classmethod and thus we can go ahead. Right? Alex

On Sat, Oct 25, 2003, Alex Martelli wrote:
I'm not the person who objected on non-English speaking grounds, and I'm -0 because I don't like using grammatical tense as the differentiator; as I said, I'd expect sorted() to be a predicate. If we're doing this (and it seems we are), I still prefer copysort() for clarity. But I'm not objecting to sorted(). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

[Alex]
[Aahz]
Predicates start with 'is'. For example, s.lower() converts s to lowercase; s.islower() asks if s is lowercase. I'm -1 on list.copysort() as a constructor/factory. Since whoever didn't like sorted() before hasn't piped up now, I think we should go ahead and implement the list.sorted() constructor. --Guido van Rossum (home page: http://www.python.org/~guido/)

Since whoever didn't like sorted() before hasn't piped up now, I think we should go ahead and implement the list.sorted() constructor.
Okay, I'll modify the patch to be a classmethod called sorted() and will assign to Alex for second review. Raymond

If we're doing this (and it seems we are), I still prefer copysort() for clarity.
"copysort" sounds like the name of some weird sorting algorithm to me. I'd prefer "sortedcopy" (although I suppose that could be read as a predicate, too -- "is x a sorted copy of y?") Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Okay, this is the last chance to come-up with a name other than sorted(). Here are some alternatives: inlinesort() # immediately clear how it is different from sort() sortedcopy() # clear that it makes a copy and does a sort newsorted() # appropriate for a class method constructor I especially like the last one and all of them provide a distinction from list.sort(). Raymond Hettinger

Hi, thought you might be interested in the opinions of someone not (yet) working full-day with Python and whose mother tounge is *not* english. "Raymond Hettinger" <python@rcn.com> schrieb
Okay, this is the last chance to come-up with a name other than sorted().
The method is making a copy and sorts that and returns it, right? I think the copy is not fully clear from this name. I'd give it +0
Here are some alternatives:
inlinesort() # immediately clear how it is different from sort()
I'm rather -1 on it. Inline might be confused with inplace, and even when not it's not clear from the name that a copy is made.
sortedcopy() # clear that it makes a copy and does a sort
My favourite (if the behaviour is how I believe it, that is *only* the copy is sorted) It's really obvious what is done. +1
newsorted() # appropriate for a class method constructor
I first read this news-orted, and had to step back. Also "new" is not actually the same than "copy" to me (maybe because of my C++) background. Say -0 hth Werner

On Tuesday 28 October 2003 03:01 pm, Phillip J. Eby wrote:
Please explain how it might work when the argument to list.sortedcopy is *NOT* an instance of type list, but rather a completely general sequence, as a classmethod will of course allow us to have. Maybe I'm missing some recent Python enhancements, but I thought that, if a method is fully usable as an instancemethod, then when called on the type it's an unbound method and will ONLY support being called with an instance of the type as the 1st arg. Hmmm... maybe one COULD make a custom descriptor that does support both usages... and maybe it IS worth making the .sorted (or whatever name) entry a case of exactly such a subtle custom descriptor... Alex

Thanks for the idea, I can use this as a perverted example in my talk at Stanford tomorrow. Here it is: import new def curry(f, x, cls=None): return new.instancemethod(f, x) class MagicDescriptor(object): def __init__(self, classmeth, instmeth): self.classmeth = classmeth self.instmeth = instmeth def __get__(self, obj, cls): if obj is None: return curry(self.classmeth, cls) else: return curry(self.instmeth, obj) class MagicList(list): def _classcopy(cls, lst): obj = cls(lst) obj.sort() return obj def _instcopy(self): obj = self.__class__(self) obj.sort() return obj sorted = MagicDescriptor(_classcopy, _instcopy) class SubClass(MagicList): def __str__(self): return "SubClass(%s)" % str(list(self)) unsorted = (1, 10, 2) print MagicList.sorted(unsorted) print MagicList(unsorted).sorted() print SubClass.sorted(unsorted) print SubClass(unsorted).sorted() --Guido van Rossum (home page: http://www.python.org/~guido/)

Oops, remnant of a dead code branch.
Right. I had bigger plans but decided to can them. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido's code]
Notwithstanding the "perverted" implementation, Alex's idea is absolutely wonderful and addresses a core usability issue with classmethods. If only in the C API, I would like to see just such a universalmethod alternative to classmethod. That would allow different behaviors to be assigned depending on how the method is called. Both list.sort() and dict.fromkeys() would benefit from it: class MagicDict(dict): def _class_fromkeys(cls, lst, value=True): "Make a new dict using keys from list and the given value" obj = cls() for elem in lst: obj[elem] = value return obj def _inst_fromkeys(self, lst, value=True): "Update an existing dict using keys from list and the given value" for elem in lst: self[elem] = value return self newfromkeys = MagicDescriptor(_class_fromkeys, _inst_fromkeys) print MagicDict.newfromkeys('abc') print MagicDict(a=1, d=2).newfromkeys('abc') An alternative implementation is to require only one underlying function and to have it differentiate the cases based on obj and cls: class MagicDict(dict): def newfromkeys(obj, cls, lst, value=True): if obj is None: obj = cls() for elem in lst: obj[elem] = value return obj newfromkeys = universalmethod(newfromkeys) Raymond Hettinger

I'm not so sure. I think the main issue is that Python users aren't used to static methods; C++ and Java users should be familiar with them and I don't think they cause much trouble there.
But your _inst_fromkeys mutates self! That completely defeats the purpose (especially since it also returns self) and I am as much against this (approx. -1000 :-) as I am against sort() returning self. To me this pretty much proves that this is a bad idea; such a schizo method will confuse users more that a class method that ignores the instance. And if you made an honest mistake, and meant to ignore the instance, it still proves that this is too confusing to do! :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

[GvR]
But your _inst_fromkeys mutates self!
That issue wasn't intrinsic to the proposal. The implementation should have read: class MagicDict(dict): def newfromkeys(obj, cls, lst, value=True): "Returns a new MagicDict with the keys in lst set to value" if obj is not None: cls = obj.__class__ newobj = cls() for elem in lst: newobj[elem] = value return newobj newfromkeys = universalmethod(newfromkeys) Universal methods give the method a way to handle the two cases separately. This provides both the capability to make an instance from scratch or to copy it off an existing instance. Your example was especially compelling: a = [3,2,1] print a.sorted() print list.sorted(a) Raymond Hettinger

On Thursday 30 October 2003 06:49 am, Raymond Hettinger wrote: ...
Actually, yes, it IS compelling indeed. Funny -- I was originally just brainstorming / musing out loud, never thought about this as a "real thing". But now that it's matured a bit, I do feel sure -- from past experience with what puzzles Python newbies depending on their previous programming knowledge or lack thereof -- that if we had this it would *seriously* reduce the number of puzzlements we have to solve on c.l.py or help@python.org. Which IS strange, in a way, because I do not know of any existing language that has exactly this kind of construct -- a factory callable that you can call on either a type or an instance with good effect. Yet despite it not being previously familiar it DOES "feel natural". Of course, the 3 lines above would also work if sorted was an ordinary instancemethod, but that's just because a is a list instance; if we had some other sequence, say a generator expression, print list.sorted(x*x for x in a) would be just as sweet, and _that_ is the compelling detail IMHO. Trying to think of precedents: Numeric and gmpy have quite a few things like that, except they're (by necessity of the age of gmpy and Numeric) ordinary module functions AND instance methods. E.g.:
as a module function, sqrt is the truncating integer square root, which is also a method of mpz instances (mpz being the integer type in gmpy). mpf (the float type in gmpy) has a sqrt method too, which is nontruncating -- that is also a module function, but, as such, it needs to be called fsqrt (sigh). I sure _would_ like to expose the functionality as mpf.sqrt(x) and mpz.sqrt(x) [would of course be more readable with other typenames than those 'mpf' and 'mpz', but that's another issue, strictly a design mistake of mine]. Alex

If you feel about it that way, I recommend that you let it mature a bit more. If you really like this so much, please realize that you can do this for *any* instance method. The identity C.foo(C()) == C().foo() holds for all "regular" methods. (Since 2.2 it also holds for extension types.) If we were to do this, we'd be back at square two, which we rejected: list instances having both a sort() and a sorted() method (square one being no sorted() method at all :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thursday 30 October 2003 04:43 pm, Guido van Rossum wrote:
Yes, having a be an instance of list in the above doesn't show 'sorted' as being any different than a perfectly regular instance method -- it WAS in this sense a bad example (I thought I'd mentioned that later on in the very same post?). The identify I want is rather like: C.foo(x) == C(x).foo() for an x that's not necessarily an instance of C, just something that has a natural way to become one. When C is list, any iterable x, for example. In other words, being able to call C.foo(x) _without_ the typechecking constraint that x is an instance of C, as one would have for a normal C.foo unbound-method.
Yes, the names are an issue again -- but having e.g. x=L1.sorted(L2) completely ignore the value of L1 feels a bit strange to me too (as does x=D1.fromkeys(L3) now that I've started thinking about it -- I've never seen any Python user, newbie or otherwise, have actual problems with this, but somebody on c.l.py claimed that's just because "nobody" knows about fromkeys -- so, I dunno...). Darn -- it WOULD be better in some cases if one could ONLY call a method on the class, NOT on an instance when the call would in any case ignore the instance. Calling dict.fromkeys(L3) is wonderful, the problem is that you can also call it on a dict instance, and THAT gets confusing. Similarly, calling list.sorted(iterable) is wonderful, but calling it on a list instance that gets ignored, L1.sorted(iterable), could perhaps be confusing. Yeah, the C++(staticmethod)/Smalltalk(clasmethod) idea of "call it on the instance" sounds cool in the abstract, but when it comes down to cases I'm not so sure any more -- what might be use cases where it's preferable, rather than confusing, to be able to call aninst.somestaticmethod(x,y) "just as if" it was a normal method? Would it be such an imposition to "have to know" that a method is static and call type(aninst).somestaticmethod(x,y) instead, say...? Oh well, I guess it's too late to change the semantics of the existing descriptors, even if one agrees with my newfound doubts. But the funniest thing is that I suspect the _new_ descriptor type would be the _least_ confusing of them, because calling such a method on class or instance would have semantics much closer to ordinary methods, just slightly less typeconstrained. Oh well! Alex

Let's focus on making this an issue that one learns without much pain. Given that the most common mistake would be to write a.sorted(), and that's a TypeError because of the missing argument, perhaps we could make the error message clearer? Perhaps we could use a variant of classmethod whose __get__ would raise the error, rather than waiting until the call -- it could do the equivalent of the following: class PickyClassmethod(classmethod): def __get__(self, obj, cls): if obj is not None: raise TypeError, "class method should be called on class only!" else: return classmethod.__get__(self, None, cls) I don't want to make this behavior the default behavior, because I can see use cases for calling a class method on an instance too, knowing that it is a class method; otherwise one would have to write the ugly x.__class__.foobar(). --Guido van Rossum (home page: http://www.python.org/~guido/)

At 09:16 AM 10/30/03 -0800, Guido van Rossum wrote:
If there were a 'classonlymethod()' built-in, I'd probably use it, as I use classmethods a fair bit (mostly for specialized constructors), but I don't recall ever desiring to call one via an instance. Do you have an example of the use cases you see? Hm. What if your PickyClassmethod were a built-in called 'constructor' or 'factorymethod'? Probably too confining a name, if there are other uses for class-only methods, I suppose.

Not exactly, but I notice that e.g. UserList uses self.__class__ a lot; I think that's the kind of area where it might show up.
I'm not convinced that we have a problem (beyond Alex lying awake at night, that it :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

At 10:00 AM 10/30/03 -0800, Guido van Rossum wrote:
I thought you were proposing to use it for list.sorted, in order to provide a better error message when used with an instance. If such a descriptor were implemented, I was suggesting that it would be useful as a form of documentation (i.e. that a method isn't intended to be called on instances of the class), and thus it would be nice for it to be exposed for folks like me who'd take advantage of it. (Especially if PEP 318 is being implemented.)

I mostly just proposed it to placate Alex; I think he's overly worried in this case. PEP 318 seems a ways off. --Guido van Rossum (home page: http://www.python.org/~guido/)

Both. This is the kind of syntactic change that require much deep thought before committing. Unfortunately I don't have time for that right now, so please don't ask. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, 2003-10-30 at 15:18, Guido van Rossum wrote:
I won't, but I do hope this is something that we can settle for Python 2.4. I've been using the functionality in Python 2.3 for a while now and it is wonderful, but I the tedium and clumsiness of the current syntax really puts a damper on its use. -Barry

On Thursday 30 October 2003 09:55 pm, Barry Warsaw wrote:
Not on mine (my use), but, yes, I _have_ seen some Pythonistas be rather perplexed by it. Giving it a neat, cool look will be good. BTW, when we do come around to PEP 318, I would suggest the 'as' clause on a class statement as the best way to specify a metaclass. 'class Newstyle as type:' for example is IMHO neater -- and thus more encouraging to the generalized use of newstyle classes -- than the "inheriting from object" idea or the setting of __metaclass__; it reads well AND makes what one's doing more obvious when a custom MC is involved, because it's so "up-front". Besides, it's STILL syntax for a call to the thingy on the RHS of 'as', just like, say, def foop() as staticmethod: is, even though the details of how that call is performed are different for metaclasses (called with classname/bases/classdict) and function decorators (called with the function object). BTW, the PEP isn't very clear about this, but, I would hope the 'as' clause applies uniformly to ANY def or class statement, right? No reason to specialcase, that I can see -- "def ... as" may well be used mostly inside classbodies, because we do have decorators ready for that, but the 'synchronized(lock)' decorator used in the PEP's examples would seem just as applicable to freestanding functions as to methods. Alex

On Thursday 30 October 2003 07:19 pm, Guido van Rossum wrote:
I'm not convinced that we have a problem (beyond Alex lying awake at night, that it :-).
As it happens I just had a very unusual ten-hours-of-sleep night, so I don't think you need to worry:-).
OK, then it does appear to me that new descriptors may wait for PEP 318 to mature, and list.sorted be left as is for now. Hopefully both can be taken into consideration before 2.4 is finalized, since that time is also "a ways off", no doubt. Alex

Then why don't you use a custom descriptor which raises an exception when an instance is passed in? Like: def __get__(self, obj, cls): if obj is None: return new.instancemethod(self.classmeth, cls) else: raise TypeError, \ "Calling %s on instance %s ignores instance" % \ (self.classmeth, obj) -- Christian Tanzer http://www.c-tanzer.at/

Well, I'd like to withdraw it. Having all three of a.sort(), a.sorted() and list.sorted(a) available brings back all the confusion of a.sort() vs. a.sorted(). What's in CVS is just fine. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thursday 30 October 2003 06:31 am, Guido van Rossum wrote:
"Yes, but". The ability to call something on either the class OR the instance IS a Python hallmark... with the constraint that when you call it on the class you need to provide an instance as the first arg (assuming the "something" is a normal instance method, which is surely the most frequent case). You could see universalmethods as being special only in that they WEAKEN this constraint -- they let the 1st arg be EITHER an instance OR something from which a new instance can be naturally constructed. Use cases: in gmpy: if I had universal methods, current instancemethods mpf.sqrt and mpz.sqrt (multiprecision floatingpoint and integer/truncating square roots respectively) could also be called quite naturally as mpf.sqrt(33) and mpz.sqrt(33) respectively. Right now you have to use, instead, mpf(33).sqrt() or mpz(33).sqrt(), which is QUITE a bit more costly because the instance whose sqrt you're taking gets built then immediately discarded (and building mpf -- and to a lesser extent mpz -- instances is a bit costly); OR you can call module functions gmpy.sqrt(33), truncating sqrt, or gmpy.fsqrt(33), nontruncating (returning a multiprecision float). Just one example -- gmpy's chock full of things like this, which universalmethod would let me organize a bit better. in Numeric: lots of module-functions take an arbitrary iterable, build an array instance from it if needed, and operate on an array instance to return a result. This sort-of-works basically because Numeric has "one main type" and thus the issue of "which type are we talking about" doesn't arise (gmpy has 3 types, although mpz takes the lion's share, making things much iffier). But still, Numeric newbies (if they come from OO languages rather than Fortran) DO try calling e.g. x.transpose() for some array x rather than the correct Numeric.transpose(x) -- and in fact array.transpose, called on the class, would ALSO be a perfeclty natural approach. universalmethod would allow array instances to expose such useful functionality as instance methods AND also allow applying direct operations -- without costly construction of intermediate instances to be thrown away at once -- via "array.transpose" and the like. It's not really about new revolutionary functionality: it's just a neater way to "site" existing functionality. This isn't surprising if you look at universalmethod as just a relaxation of the normal constraint "when you call someclass.somemethod(x, ... -- x must be an instance of someclass" into "x must be an instance of someclass OR -- if the somemethod supports it -- something from which such an instance could be constructed in one obvious way". Then, basically, the call is semantically equivalent to someclass(x).somemethod(... BUT the implementation has a chance to AVOID costly construction of an instance for the sole purpose of calling somemethod on it and then throwing away the intermediate instance at once. No revolution, but, I think, a nice little addition to our armoury. Alex

While we're voting, I still like list.sorted() best, so please keep that one in the list of possibilities. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tuesday 28 October 2003 04:18 pm, Guido van Rossum wrote:
I also like list.sorted() -- but list.newsorted() is IMHO even a LITTLE bit better, making it even clearer that it's giving a NEW list. Just a _little_ preference, mind you. "sortedcopy" appears to me a BIT less clear (what "copy", if the arg isn't a list...?), "inlinesort" worst. BTW, I think I should point out one POSSIBLE problem with classmethods -- since unfortunately they CAN be called on an instance, and will ignore that instance, this may confuse an unsuspecting user. I was arguing on c.l.py that this _wasn't_ confusing because I never saw anybody made that mistake with dict.fromkeys ... and the response was "what's that?"... i.e., people aren't making mistakes with it because they have no knowledge of it. list.newsorted or however it's going to be called is probably going to be more popular than existing dict.fromkeys, so the issue may be more likely to arise there. Although I think the issue can safely be ignored, I also think I should point it out anyway, even just to get concurrence on this -- it IS possible that the classmethod idea is winning "by accident" just because nobody had thought of the issue, after all, and that would be bad (and I say so even if I was the original proposer of the list.sorted classmethod and still think it should be adopted -- don't want it to get in "on the sly" because a possible problem was overlooked, as opposed to, considered and decided to be not an issue). OK, so here's the problem, exemplified with dict.fromkeys: d = {1:2, 3:4, 5:6} dd = d.fromkeys([3, 5]) it's not immediately obvious that the value of d matters not a whit -- that this is NOT going to return a subset of d "taken from the keys" 3 and 5, i.e. {3:4, 5:6}, but, rather, {3:None, 5:None} -- and the functionality a naive user might attribute to that call d.fromkeys([3, 5]) should in fact be coded quite differently, e.g.: dd = dict([ (k,v) for k, v in d.iteritems() if k in [3,5] ]) or perhaps, if most keys are going to be copied: dd = d.copy() for k in d: if k not in [3, 5]: del dd[k] The situation with list.sorted might be somewhat similar, although in fact I think that it's harder to construct a case of fully sympathizable-with confusion like the above. Still: L = range(7) LL = L.sorted() this raises an exception (presumably about L.sorted needing "at least 1 arguments, got 0" -- that's what d.fromkeys() would do today), so the issue ain't as bad -- it will just take the user a while to understand WHY, but at least there should be running program with strange results, which makes for harder debugging. Or: L = range(7) LL = L.sorted(('fee', 'fie', 'foo')) I'l not sure what the coder might expect here, but again it seems possible that he expected the value of L to matter in some way to the resulting value of LL. Perhaps this points to an issue with classmethods in general, due in part to the fact that they're still rather little used in Python -- callers of instance.method() may mostly expect that the result has something to do with the instance's value, rather than being the same as type(instance).method() -- little we can do about it at this point except instruction, I think. Alex

Alex Martelli <aleaxit@yahoo.com> wrote:
IMO, sorted is the clearest, all other proposals carry excess baggage making them less clear.
Or put the method into the metaclass. I'm using both classmethods and methods defined by metaclasses and didn't get any complaints about classmethods yet. -- Christian Tanzer http://www.c-tanzer.at/

"Guido van Rossum" <guido@python.org> wrote in message news:200310281518.h9SFIj129025@12-236-54-216.client.attbi.com...
After thinking about it some more, I also prefer .sorted to suggested alternatives. I read it as follows: list(iterable) means 'make a list from iterable (preserving item order)' list.sorted(iterable) means 'make a sorted list from iterable' While I generally like verbs for method names, the adjective form works here as modifing the noun/verb 'list' and the invoked construction process. 'Inline' strikes me as confusing. 'Copy' and 'new' strike me as redundant noise since, in the new 2.2+ regime, 'list' as a typal verb *means* 'make a new list'. Terry J. Reedy Terry J. Reedy

On Mon, 2003-10-20 at 19:22, Guido van Rossum wrote:
But the argument that it wastes a copy still stands (even though that's only O(N) vs. O(N log N) for the sort).
That would be irrelevant in most of the cases where I would use it - typically sorting short lists or dicts where the overhead is unmeasurable.
I'm still unclear why this so important to have in the library when you can write it yourself in two lines.
For little standalone scripts it gets a bit tedious to write this again and again. It doesn't take much code to write dict.fromkeys() manually, but I'm glad that it's there. I'd say list.sorted (or whatever it gets called) has at least as much claim to exist. Mark Russell

Raymond Hettinger strung bits together to say:
'chain' may be a bad name then, since all that function really does is take an arbitrary bound method, execute it and then return the object that the method was bound to. If we used a name like 'method_as_expr' (instead of 'chain'), then the above examples would be: genhistory(date, method_as_expr(list(events).sort, key=incidenttime)) todo = [t for t in method_as_expr(list(tasks).sort) if due_today(t)] Granted, it's not quite as clear (some might say it's positively arcane!), but it also isn't using anything that's not already in the language/standard library.
Would something like 'sortedcopy' be an improvement? Although Alex's suggestion of a class method like dict.fromkeys() also sounded good - naming it is still an issue, though. I'm not entirely opposed to the idea (the 'method_as_expr' approach feels like something of a hack, even to me) - but the object method just doesn't seem to fit cleanly into the design of the basic types. Cheers, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."

[GvR]
[Aahz]
[Facundo]
+2, considering that the difference in behaviour with sort and sorted it's no so clear to a non-english speaker.
FWIW, I've posted a patch to implement list.copysort() that includes a news announcement, docs, and unittests: www.python.org/sf/825814 Raymond Hettinger

Despite my suggesting a better name, I'm not in favor of this (let's say -0). For one, this will surely make lots of people write for key in D.keys().copysort(): ... which makes an unnecessary copy of the keys. I'd rather continue to write keys = D.keys() keys.sort() for key in keys: ... --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, Oct 17, 2003, Guido van Rossum wrote:
I'm actually -1, particularly with your clear argument; I just didn't like your suggestion of l.sorted(). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

On Sat, 2003-10-18 at 07:07, Brett C. wrote:
I'm -1 as well. Lists do not need to grow a method for something that only replaces two lines of code that are not tricky in any form of the word.
And don't forget that the trivial function will sort any iterable, not just lists. I think for member in copysort(someset): is better than for member in list(someset).copysort(): I'm against list.copysort(), and for either leaving things unchanged or adding copysort() as a builtin (especially if it can use the reference count trick to avoid unnecessary copies). Mark Russell

On Saturday 18 October 2003 11:44 am, Mark Russell wrote:
The trick would need to check that the argument is a list, of course, as well as checking that the reference to it on the stack is the only one around. But given this, yes, I guess a built-in would be "better" by occasionally saving the need to type a few extra characters (though maybe "worse" by enlarging the built-in module rather than remaining inside the smaller namespace of the list type...?). The built-in, or method, 'copysort', would have to accept the same optional arguments as the sort method of lists has just grown, of course. Alex

Wondering about the trick of copysort not copying a singly-referenced list I decided to try it out in a tiny extension module, and, yes, it is just as trivial as one might wish (haven't dealt with optional args to sort, just wanting to check performance &c): static PyObject* copysort(PyObject* self, PyObject* args) { PyObject *sequence, *listresult; if(!PyArg_ParseTuple(args, "O", &sequence)) return 0; if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } if(listresult) { if(PyList_Sort(listresult) == -1) { Py_DECREF(listresult); listresult = 0; } } return listresult; } and performance on an equally trivial testcase: x = dict.fromkeys(range(99999)) def looponsorted1(x): keys = x.keys() keys.sort() for k in keys: pass def looponsorted2(x, c=copysort.copysort): for k in c(x.keys()): pass turns out to be identical between the two _with_ The Trick (4.4e+04 usec with timeit.py -c on my box) while without The Trick copysort would slow down to about 5.5e+04 usec. But, this reminds me -- function filter, in bltinmodule.c, uses just about the same trick too (to modify in-place when possible rather than making a new list -- even though when it does make a new list it's an empty one, not a copy, so the gain is less). There must be other cases of applicability which just haven't been considered. So... Shouldn't The Trick be embodied in PySequence_List itself...? So, the whole small tricky part above: if(PyList_CheckExact(sequence) && sequence->ob_refcnt==1) { listresult = sequence; Py_INCREF(listresult); } else { listresult = PySequence_List(sequence); } would collapse to a single PySequence_List call -- *AND* potential calls from Python code such as "x=list(somedict.keys())" might also be speeded up analogously... [Such a call looks silly when shown like this, but in some cases one might not know, in polymorphic use, whether a method returns a new or potentially shared list, or other sequence, and a call to list() on the method's result then may be needed to ensure the right semantics in all cases]. Is there any hidden trap in The Trick that makes it unadvisable to insert it in PySequence_List? Can't think of any, but I'm sure y'all will let me know ASAP what if anything I have overlooked...;-). One might even be tempted to reach down all the way to PyList_GetSlice, placing THERE The Trick in cases of all-list slicing of a singly-referenced list (PyList_GetSlice is what PySequence_List uses, so it would also get the benefit), but that might be overdoing it -- and encouraging list(xxx) instead of xxx[:], by making the former a bit faster in some cases, would be no bad thing IMHO (already I'm teaching newbies to prefer using list(...) rather than ...[:] strictly for legibility and clarity, being able to mention possible future performance benefits might well reinforce the habit...;-). Alex

On Saturday 18 October 2003 02:26 pm, Alex Martelli wrote: ...oops...
x = dict.fromkeys(range(99999))
here, x.keys() IS already sorted, so the importance of The Trick is emphasized because the sort itself has little work to do:
I've changed the initialization of x to
x = dict.fromkeys(map(str,range(99999)))
so that x.keys() is not already sorted (still has several runs that the sort will exploit -- perhaps representative of some real-world sorts...;-) and the numbers change to about 240 milliseconds with The Trick (or with separate statements to get and sort the keys), 265 without -- so, more like 10% advantage, NOT 20%-ish (a list.copysort method, from Raymond's patch, has 240 milliseconds too -- indeed it's just about the same code I was using in the standalone function I posted, give or take some level of indirectness in C calls that clearly don't matter much here). Of course, the % advantage will vary with the nature of the list (how many runs that sort can exploit) and be bigger for smaller lists (given we're comparing O(N) copy efforts vs O(N log N) sorting efforts). Alex

I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython. I think the application area is too narrow to warrant a built-in, *and* lists shouldn't grow two similar methods. Let's keep the language small! (I know, by that argument several built-ins shouldn't exist. Well, they might be withdrawn in 3.0; let's not add more.) --Guido van Rossum (home page: http://www.python.org/~guido/)

I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
As Alex demonstrated, the time savings for an O(n) operation inside an O(n log n) function is irrelevant anyway.
Not to be hard headed here, but if dropped now, it will never be considered again. Did you have a chance to look at the rationale for change in my previous note and added in the comments for the patch? I think they offer some examples and reasons stronger than "saving a little typing": www.python.org/sf/825814 Raymond

I'm taking that into account. It still smells funny. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, 18 Oct 2003, Guido van Rossum wrote:
I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
There is also a problem with the strategy if if gets called by a C extension. It is perfectly feasible for a C extension to hold the only reference to an object, call the copying sort (directly or indirectly), and then be very surprised that the copy did not take place. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (440) 871-6725 x 19 E-mail: jacobs@theopalgroup.com Fax: (440) 871-6722 WWW: http://www.theopalgroup.com/

On Saturday 18 October 2003 08:00 pm, Kevin Jacobs wrote:
Alas, I fear you're right. Darn -- so much for a possible little but cheap optimization (which might have been neat in PySequence_List even if copysort never happens and the optimization is only for CPython -- I don't see why an optimization being impossible in Jython should stop CPython from making it, as long as semantics remain compatible). It's certainly possible for C code to call PySequence_List or whatever while holding the only reference, and count on the returned and argument objects being distinct:-(. Alex

On Saturday 18 October 2003 06:33 pm, Guido van Rossum wrote:
I don't like the trick of avoiding the copy if the refcount is one; AFAIK it can't be done in Jython.
No, but if it's only a small optimization, who cares? Anyway, the objection that these functions might be called by _C_ code who's holding the only reference to a PyObject* probably kills The Trick (particularly my hope of moving it into PySequence_List whether copysort survived or not).
Aye aye, captain. Can we dream of a standard library module of "neat hacks that don't really warrant a built-in" in which to stash some of these general-purpose, no-specific-appropriate-module, useful functions and classes? Pluses: would save some people reimplementing them over and over and sometimes incorrectly; would remove any pressure to add not-perfectly-appropriate builtins. Minuses: one more library module (the, what, 211th? doesn't seem like a biggie). Language unchanged -- just library. Pretty please?
(I know, by that argument several built-ins shouldn't exist. Well, they might be withdrawn in 3.0; let's not add more.)
"Amen and Hallelujah" to the hope of slimming language and built-ins in 3.0 (presumably the removed built-ins will go into a "legacy curiosa" module, allowing a "from legacy import *" to ease making old code run in 3.0? seems cheap & sensible). Alex

At 10:11 PM 10/18/03 +0200, Alex Martelli wrote:
Hmmm. import tricky.hacks from dont_try_this_at_home_kids import * I suppose 'shortcuts' would probably be a less contentious name. :) The downside to having such a module would be that it would entertain ongoing pressure to add more things to it. I suppose it'd be better to have a huge shortcuts module (or maybe shortcuts package, divided by subject matter) than to keep adding builtins.
I like it. Or, for symmetry, maybe 'from __past__ import lambda'. ;-) Say, in 3.0, will there be perhaps *no* builtins? After all, you don't need builtins to import things. Nah, that'd be too much like Java, and not enough like pseudocode. Ah well, time for me to stop making suggestions on what color to paint the bicycle shed, and start doing some real work today. :)

Modules should be about specific applications, or algorithms, or data types, or some other unifying principle. I think "handy" doesn't qualify. :-)
Let's not speculate yet about how to get old code to run in 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sat, 2003-10-18 at 12:31, Alex Martelli wrote:
The built-in, or method, 'copysort', would have to accept the same optional arguments as the sort method of lists has just grown, of course.
Yes. In fact one point it its favour is that there aren't any choices to be made - the interface should track that of list.sort(), so it avoids the usual objection to trivial functions that there are many possible variants. Mark Russell

[Raymond]
[Guido]
Interesting that you saw this at the same time I was fretting about it over dinner. The solution is to bypass the copy step for the common case of: for elem in somelistmaker().copysort(): . . . The revised patch is at: www.python.org/sf/825814 The technique is to re-use the existing list whenever the refcount is one. This keeps the mutation invisible. Advantages of a copysort() method: * Avoids creating an unnecessary, stateful variable that remains visible after the sort is needed. In the above example, the definition of the "keys" variable changes from unsorted to sorted. Also, the lifetime of the variable extends past the loop where it was intended to be used. In longer code fragments, this unnecessarily increases code complexity, code length, the number of variables, and increases the risk of using a variable in the wrong state which is a common source of programming errors. * By avoiding control flow (the assignments in the current approach), an inline sort becomes usable anywhere an expression is allowed. This includes important places like function call arguments and list comprehensions: todo = [t for t in tasks.copysort() if due_today(t)] genhistory(date, events.copysort(key=incidenttime)) Spreading these out over multiple lines is an unnecessary distractor from the problem domain, resulting is code that is harder to read, write, visually verify, grok, or debug. Raymond Hettinger P.S. There are probably better names than copysort, but the idea still holds.

Guido> For one, this will surely make lots of people write Guido> for key in D.keys().copysort(): Guido> ... Guido> which makes an unnecessary copy of the keys. It might be viewed as unnecessary if you intend to change D's keys within the loop. Guido> keys = D.keys() Guido> keys.sort() Guido> for key in keys: Guido> ... Current standard practice is also fine. Skip

On Saturday 18 October 2003 05:16 pm, Skip Montanaro wrote:
D.keys() makes a _snapshot_ of the keys of D -- it doesn't matter what you do to D in the loop's body. Admittedly, that's anything but immediately obvious (quite apart from copysorting or whatever) -- I've seen people change perfectly good code of the form: for k in D.keys(): vastly_alter_a_dictionary(D, k) into broken code of the form: for k in D: vastly_alter_a_dictionary(D, k) because of having missed this crucial difference -- snapshot in the first case, but NOT in the second one. And viceversa, I've seen people carefully copy.copy(D.keys()) or the equivalent to make sure they did not suffer from modifying D in the loop's body -- the latter is in a sense even worse, because the bad effects of the former tend to show up pretty fast as weird bugs and broken unit-tests, while the latter is "just" temporarily wasting some memory and copying time. Anyway, copysort with The Trick, either as a method or function, has no performance problems - exactly the same performance as:
Nolo contendere. It DOES feel a bit like boilerplate, that's all. Alex

Alex Martelli strung bits together to say:
Hi, While I'm not an active Python contributor (yet), I've been lurking on python-dev since March. Something was bugging me about the whole l.copysort() ('sortedcopy'?) idea. For whatever reason, the above comment crystalised it - if there's going to be a special 'sortedcopy' to allow minimalist chaining, then what about 'reversedcopy' or 'sortedreversedcopy', or any of the other list methods that may be considered worth chaining? Particularly since the following trick seems to work: ==============
(Tested with Python 2.3rc2, which is what is currently installed on my home machine) Not exactly the easiest to read, but it does do the job of "sorted copy as an expression", as well as letting you chain arbitrary methods of any object. Regards, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."

On Sun, Oct 19, 2003 at 03:18:07AM +1000, Nick Coghlan wrote:
[...]
[...]
(I'm new on this list) Actually, there is way to do this out-of-the-box without the chain() function:
And there is also one for copysort():
But that's probably not more readable. Architect's Sketch... -- Dan Aloni da-x@gmx.net

On Saturday 18 October 2003 09:13 pm, Dan Aloni wrote: ...
This cannot be applied to D.keys() for some directory D.
(lambda x:(x, x.sort())[0])(list(a))
This one can, because the lambda lets you give a temporary name x to the otherwise-unnamed list returned by D.keys(). It can be made a _little_ better, too, I think:
and if you want it reversed after sorting,
(lambda x: x.sort() or x.reverse() or x)(D.keys()) ['o', 'i', 'c', 'a']
But that's probably not more readable.
You have a gift for understatement. Still, probably more readable than the classic list comprehension hack:
[x for x in [D.keys()] for y in [x.sort(), x] if y][0] ['a', 'c', 'i', 'o']
also, the lambda hack doesn't leak names into the surrounding scope, while the list comprehension hack, alas, does. BTW, welcome to python-dev! Alex

On Sat, Oct 18, 2003 at 10:01:52PM +0200, Alex Martelli wrote:
Good, so this way the difference between copied and not copied is minimized:
(lambda x: x.sort() or x)(a)
And:
(lambda x: x.sort() or x)(list(a))
Nice, this lambda hack is a cleaner, more specific, and simple deviation of the chain() function. Perhaps it could be made more understandable like:
And:
sorted(a) ['a', 'c', 'i', 'o']
The only problem is that you assume .sort() always returns a non True value. If some time in the future .sort() would return self, your code would break and then the rightful usage would be:
And:
I didn't see the begining of this discussion, but it looks to me that sort() returning self is much better than adding a .copysort().
BTW, welcome to python-dev!
Thanks! -- Dan Aloni da-x@gmx.net

On Saturday 18 October 2003 10:54 pm, Dan Aloni wrote: ...
No fair -- that's not a single expression any more!-)
Why do you think it would break? It would do a _tiny_ amount of avoidable work, but still return the perfectly correct result. Sure you don't think I'd post an unreadable inline hack that would break in the unlikely case the BDFL ever made a change he's specifically Pronounced again, right?-)
I didn't see the begining of this discussion, but it looks to me that sort() returning self is much better than adding a .copysort().
The BDFL has Pronounced against it: he doesn't LIKE chaining. Alex

On Saturday 18 October 2003 07:18 pm, Nick Coghlan wrote:
Hi Nick!
sort has just (in CVS right now) sprouted an optional reverse=True parameter that makes sort-reverse chaining a non-issue (thanks to the usual indefatigable Raymond, too). But, for the general case: the BDFL has recently Pronounced that he does not LIKE chaining and doesn't want to encourage it in the least. Yes, your trick does allow chaining, but the repeated chain(...) calls are cumbersome enough to not count as an encouragement IMHO;-). A generalized wrapping system that wraps all methods so that any "return None" is transformed into a "return self" WOULD constitute encouragement, and thus a case of Lese BDFLity, which would easily risk the wrath of the PSU (of COURSE there ain't no such thing!)... Alex

Alex Martelli strung bits together to say:
Well, yes, that was sort of the point. For those who _really_ like chaining (I'm not one of them - I agree with Guido that it is less readable and harder to maintain), the 'chain' function provides a way to do it with what's already in the language. Cheers, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."

[Alex Martelli]
[Nick Coghlan]
Remember, list.copysort() isn't about chaining or even "saving a line or two". It is about using an expression instead of a series of statements. That makes it possible to use it wherever expressions are allowed, including function call arguments and list comprehensions. Here are some examples taken from the patch comments: genhistory(date, events.copysort(key=incidenttime)) todo = [t for t in tasks.copysort() if due_today(t)] To break these back into multiple statements is to cloud their intent and take away their expressiveness. Using multiple statements requires introducing auxiliary, state-changing variables that remain visible longer than necessary. State changing variables are a classic source of programming errors. In contrast, the examples above are clean and show their correctness without having to mentally decrypt them. Scanning through the sort examples in the standard library, I see that the multi-line, statement form is sometimes further clouded by having a number of statements in-between. In SimpleHTTPServer.py, for example list = os.listdir(path) . . . (yada, yada) list.sort(key=lambda a: a.lower()) . . . (yada, yada, yada) for name in list: . . . You see other examples using os.environ and such. The forces working against introducing an in-line sort are: * the time to copy the list (which Alex later showed to be irrelevant), * having two list methods with a similar purpose, and * the proposed method names are less than sublime If someone could come-up with a name more elegant than "copysort", I the idea would be much more appetizing. Raymond Hettinger

On Sunday 19 October 2003 07:23 pm, Raymond Hettinger wrote: ...
Good summary (including the parts I snipped).
If someone could come-up with a name more elegant than "copysort", I the idea would be much more appetizing.
I still think that having it in some module is a bit better than having it as a method of lists. The BDFL has already Pronounced that it's too narrow in applicability for a builtin (and he's right -- as usual), and that we won't have "grab-bag" module of shortcuts that don't fit well anywhere else (ditto), and seems very doubtful despite your urgings to reconsider his stance against adding it as a list method (the two-methods-similar- purpose issue seems quite relevant). So, back to what I see as a key issue: a module needs to be "about" something reasonably specific, such as a data type. Built-in data types have no associated module except the builtin one, which is crowded and needs VERY high threshold for any addition. So, if I came up with an otherwise wonderful function that works on sets, arrays, ..., I'd be in clover -- there's an obvious module to house it)... but if the function worked on lists, dicts, files, ..., I'd be hosed. Note that module string STILL exists, and still is the ONLY way to call maketrans, an important function that was deemed inappropriate as a string method; a function just as important, and just as inappropriate as a method, that worked on lists, or dicts, or files, or slices, or ... would be "homeless" and might end up just not entering the standard library. In a way this risks making built-in types "second-class citizens" when compared to types offered by other modules in the standard library! I think we SHOULD have modules corresponding to built-in types, if there are important functions connected with those types but not appropriate as methods to populate them. Perhaps we could use the User*.py modules for the purpose, but making new ones seems better. Rather than being kept together just by naming conventions, as the User*.py are, they might be grouped in a package. Good names are always a problem, but, say, "tools.lists" might be the modules with the auxiliary tools dealing with lists, if "tools" was the package name -- "tools.dicts", "tools.files", etc, if needed -- "tools.sequences" for tools equally well suited to all sequences (not just lists) -- of course, that would retroactively suggest "tools.iters" for your itertools, oh well, pity, we sure can't rename it breaking backwards compatibility:-). If we had module tools.lists (or utils.lists, whatever) then I think copysort (by whatever name) would live well there. copyreverse and copyextend might perhaps also go there and make Barry happy?-) Alternatively - we could take a different tack. copysort is NOT so much a tool that works on an existing list -- as shown in the code I posted, thanks to PySequence_List, it's just as easy to make it work on any sequence (finite iterator or iterable). So what does it do? It BUILDS a new list object (a sorted one) from any sequence. So -- it's a FACTORY FUNCTION of the list type. Just like, say, dict.fromkeys is a factory function of the dict type. Now, factory functions are "by nature" classmethods of their type object, no? So, we should package THIS factory function just like others -- as a classmethod on list, list.somename, just like dict.fromkeys is a classmethod on dict. In this light, we surely don't want "copy" as a part of the name -- a factory method should be thought of as building a new list, not as copying an old one (particularly because it will work on any sequence as its argument, of course). Maybe list.buildsorted if we want to emphasize the "build" part. Or list.newsorted to emphasize that a new object is returned. Or maybe, like in dict.fromkeys, we don't want to emphasize either the building or the newness, but then I wouldn't know what to suggest except the list.sorted that's already drawn catcalls (though it drew them when it was proposed as an instance methods of lists -- maybe as a classmethod it will look better?-) I want the functionality -- any sensible name that might let the functionality into the standard library would be ok by me (so would one putting the functionality in as a builtin or as an instance method of lists, actually, but I _do_ believe those would not be the best places for this functionality, by far). I hope the "tools package" idea and/or the classmethod one find favour...!-) Alex

list.sorted as a list factory looks fine to me. Maybe whoever pointed out the problem with l.sorted() vs. l.sort() for non-native-English speakers can shed some light on how list.sorted(x) fares compared to x.sort()? But the argument that it wastes a copy still stands (even though that's only O(N) vs. O(N log N) for the sort).
I'm still unclear why this so important to have in the library when you can write it yourself in two lines. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Monday, October 20, 2003, at 01:22 PM, Guido van Rossum wrote:
Probably "there should only be one way to do something." It's something that is recreated over and over, mostly the same way but sometimes with slight differences (e.g., copy-and-sort versus sort-in-place). Like dict() growing keyword arguments, a copy/sort method (function, classmethod, whatever) will standardize something that is very commonly reimplemented. Another analogs might be True and False (which before being built into Python may have been spelled true/false, TRUE/FALSE, or just 0/1). These don't add any real features, but they standardize these simplest of idioms. I think I've seen people in this thread say that they've written Big Python Programs, and they didn't have any problem with this -- but this is a feature that's most important for Small Python Programs. Defining a sort() function becomes boilerplate when you write small programs. Or alternatively you create some util module that contains these little functions, which becomes like a site.py only somewhat more explicit. A util module feels like boilerplate as well, because it is a module without any conceptual integrity, shared between projects only for convenience, or not shared as it grows organically. "from util import sort" just feels like cruft-hiding, not real modularity. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org

That's one of the best ways I've seen this formulated. If Alex's proposal to have list.sorted() as a factory function is acceptable to the non-English-speaking crowd, I think we can settle on that. (Hm, an alternative would be to add a "sort=True" keyword argument to list()...) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
(Hm, an alternative would be to add a "sort=True" keyword argument to list()...)
My immediate expectation on seeing that would be that the keyword args for l.sort() would also be present. It feels better to isolate that stuff; keeping list.sorted(...) make more sense I think. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

At 12:24 PM 10/20/03 -0700, Guido van Rossum wrote:
Does this extend by analogy to other requests for short functions that are commonly reimplemented? Not that any spring to mind at the moment; it just seems to me that inline sorting is one of a set of perennially requested such functions or methods, where the current standard answer is "but you can do it yourself in only X lines!".
Wouldn't it need to grow key and cmpfunc, too?

Only if there's some quirk to reimplementing them correctly, and only if the need is truly common. Most recently we did this for sum().
Yes, but list.sorted() would have to support these too. It might become slightly inelegant because we'd probably have to say that sorted defaults to False except it defaults to True if either of cmp, and key is specified. Note that reverse=True would not imply sorting, so that list(range(10), reverse=True) would yield [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] But Raymond has a different proposal in mind for that (he still needs to update PEP 322 though). So maybe list.sorted() is better because it doesn't lend itself to such generalizations (mostly because of the TOOWTDI rule). --Guido van Rossum (home page: http://www.python.org/~guido/)

Let's see what the use cases look like under the various proposals: todo = [t for t in tasks.copysort() if due_today(t)] todo = [t for t in list.sorted(tasks) if due_today(t)] todo = [t for t in list(tasks, sorted=True) if due_today(t)] genhistory(date, events.copysort(key=incidenttime)) genhistory(date, list.sorted(events, key=incidenttime)) genhistory(date, list(events, sorted=True, key=incidenttime)) for f in os.listdir().copysort(): . . . for f in list.sorted(os.listdir()): . . . for f in list(os.listdir(), sorted=True): . . . To my eye, the first form reads much better in every case. It still needs a better name though. [Phillip J. Eby in a separate note]
Wouldn't it need to grow key and cmpfunc, too?
Now, that "key" and "reverse" are available, there is no need for "cmp" in any new methods. [Guido in a separate note]
I'll get to it soon; there won't be any surprises. Raymond Hettinger ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# #################################################################

On Monday 20 October 2003 11:43 pm, Raymond Hettinger wrote:
You're forgetting the cases in which (e.g.) tasks is not necessarily a list, but any finite sequence (iterable or iterator). Then. e.g. the first job becomes: todo = [t for t in list(tasks).copysort() if due_today(t)] todo = [t for t in list.sorted(tasks) if due_today(t)] todo = [t for t in list(tasks, sorted=True) if due_today(t)] and I think you'll agree that the first construct isn't that good then (quite apart from the probably negligible overhead of an unneeded copy -- still, we HAVE determined that said small overhead needs to be paid sometimes, and needing to code list(x).copysort() when x is not a list or you don't KNOW if x is a list adds one copy then).
Sorry, but much as I dislike cmpfunc it's still opportune at times, e.g. I'd rather code: def Aup_Bdown(x, y): return cmp(x.A, y.A) or cmp(y.B, x.B) for a in list.sorted(foo, cmp=Aup_Bdown): ... than for a in list.sorted( list.sorted(foo, key=lambda x:x.B, reverse=True), key=lambda x: x.A): ... or even for a in list(foo).copysort(key=lambda x:x.B, reverse=True ).copysort(key=lambda x: x.A): ... Alex

On Mon, 2003-10-20 at 22:43, Raymond Hettinger wrote:
Well, #3 is (I hope) a non-starter, given the need for the extra sort keyword arguments. And the instance method is less capable - it can't sort a non-list iterable (except via list(xxx).copysort()). So I would definitely prefer #2, especially as I would tend to put: sort = list.sorted at the top of my modules where needed. Then I'd have: todo = [t for t in sort(tasks) if due_today(t)] genhistory(date, sort(events, key=incidenttime)) for f in sort(os.listdir()): . . . which to me looks enough like pseudocode that I'm happy. This might seem like an argument for having sort() as a builtin, but I think it's still better as a list constructor. Adding "sort = list.sorted" to the modules that need it is a small price to pay in boilerplate for the big win of not cluttering the builtin namespace. Mark Russell

Really? That would seem to just obfuscate things for the reader (who would have to scroll back potentially many pages to find the one-line definition of sort). Why be so keen on saving 7 keystrokes? How many calls to list.sorted do you expect to have in your average module? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Mon, 2003-10-20 at 23:49, Guido van Rossum wrote:
I think most readers would probably be able to guess what for key in sort(d.keys()): would do. If not then it's no worse than a user-defined function. It's also a matter of proportion -- the important thing about the code above is that it's walking over a dictionary. In most of my uses, the sort() is just a detail to ensure reproducible behaviour. In a new language I think you could make a case for the default behaviour for dict iteration to be sorted, with a walk-in-unspecified-order method for the cases where the speed really does matter. Back in the real world, how about: for key, value in d.sort(): (i.e. adding a sort() instance method to dict equivalent to: def sort(d, cmp=None, key=None, reverse=False): l = list(d.items()) l.sort(cmp, key, reverse) return l ). At least there's no question of an in-place sort for dicts!
Why be so keen on saving 7 keystrokes?
It's not totally trivial - for me a list comprehension is noticeably less readable when split over more than one line.
How many calls to list.sorted do you expect to have in your average module?
Er, about 0.3 :-) In the project I'm working on, there are 52 sortcopy() calls in 162 modules (about 18K LOC). Not enough to justify a built-in sort(), but enough I think to make list.sorted() worthwhile. Mark Russell

On Tuesday 21 October 2003 12:31 pm, Mark Russell wrote:
Incidentally, for k in list.sorted(d): will be marginally faster, e.g. (using the copysort I posted here, without The Trick -- it should be just about identical to the list.sorted classmethod): import copysort x = dict.fromkeys(map(str,range(99999))) def looponkeys(x, c=copysort.copysort): for k in c(x.keys()): pass def loopondict(x, c=copysort.copysort): for k in c(x): pass [alex@lancelot ext]$ timeit.py -c -s'import t' 't.loopondict(t.x)' 10 loops, best of 3: 2.84e+05 usec per loop [alex@lancelot ext]$ timeit.py -c -s'import t' 't.looponkeys(t.x)' 10 loops, best of 3: 2.67e+05 usec per loop i.e., about 10% better for this size of list and number of runs (quite a few, eyeball x.keys()...:-). Nothing crucial, of course, but still. Moreover, "list.sorted(d)" and "sort(d.keys())" are the same length, and the former is conceptually simpler (one [explicit] method call, vs one method call and one function call). Of course, if each keystroke count, you may combine both "abbreviations" and just use "sort(d)".
for key, value in d.sort():
(i.e. adding a sort() instance method to dict equivalent to:
Why should multiple data types acquire separate .sort methods with subtly different semantics (one works in-place and returns None, one doesn't mutate the object and returns a list, ...) when there's no real added value wrt ONE classmethod of list...? Particularly with cmp, key, and reverse on each, seems cumbersome to me. Truly, is list.sorted(d.iteritems()) [or d.items() if you'd rather save 4 chars than a small slice of time:-)] SO "unobvious"? I just don't get it. Alex

On Tue, 2003-10-21 at 12:00, Alex Martelli wrote:
I agree that the different semantics for lists and dicts are a strike against this. The argument for it is that walking over a dictionary in sorted order is (at least to me) a missing idiom in python. Does this never come up when you're teaching the language? I wouldn't advocate adding this to other types (e.g. Set) because they're much less commonly used than dicts, so I don't think there's a danger of a creeping plague of sort methods. Not a big deal though - list.sorted() is the real win. Mark Russell PS: I'm really not an anal-retentive keystoke counter :-)

On Tuesday 21 October 2003 01:18 pm, Mark Russell wrote:
It's a frequently used idiom (actually more than one) -- it's not "missing".
never come up when you're teaching the language?
Sure, and I have a good time explaining that half the time you want to sort on KEYS and half the time on VALUES. An example I often use is building and displaying a word-frequency index: now it's pretty obvious that you may want to display it just as easily by frequency (most frequent words first) OR alphabetically. The key= construct IS a huge win, btw. I just wish there WAS an easier way to express the TYPICAL keys one wants to use than lambda x: x[N] for some N or lambda x: x.A for some A. getattr and operator.getitem are no use, alas, even when curried, because they take x first:-(. I'd rather not teach lambda (at least surely not early on!) so I'll end up with lots of little def's (whose need had sharply decreased with list comprehensions, as map and filter moved into a corner to gather dust). Ah well.
I wouldn't advocate adding this to other types (e.g. Set) because they're much less commonly used than dicts, so I don't think there's a
Actually, I was thinking of presenting them BEFORE dicts next time I have an opportunity of teaching Python from scratch. The ARE simpler and more fundamental, after all.
danger of a creeping plague of sort methods. Not a big deal though - list.sorted() is the real win.
I concur.
Mark Russell
PS: I'm really not an anal-retentive keystoke counter :-)
OK, sorry for the digs, it just _looked_ that way for a sec;-). Alex

Mark Russell <marktrussell@btopenworld.com>:
Maybe dicts should have a .sortedkeys() method? The specialised method name would help stave off any temptation to add varied sort methods to other types. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Did the discussion of a sort() expression get resolved? The last I remember was that the list.sorted() classmethod had won the most support because it accepted the broadest range of inputs. I could live with that though I still prefer the more limited (list-only) copysort() method. Raymond Hettinger

list.sorted() has won, but we are waiting from feedback from the person who didn't like having both sort() and sorted() as methods, to see if his objection still holds when one is a method and the other a factory function. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wed, Oct 22, 2003, Guido van Rossum wrote:
Actually, I was another person who expressed dislike for "sorted()" causing confusion, but previous calls for feedback were restricted to those who felt comfortable expressing opinions for non-English speakers. I'm still -1 on sorted() compared to copysort(), but because it's a different context, I'm no longer actively opposed (which is why I didn't bother speaking up earlier). I still think that a purely grammatical change in spelling is not appropriate to indicate meaning, particularly when it's still the same part of speech (both verbs). To my mind, sorted() implies a Boolean predicate. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

On Wednesday 22 October 2003 08:53 pm, Guido van Rossum wrote:
So, if I've followed correctly the lots of python-dev mail over the last few days, that person (Aahz) is roughly +0 on list.sorted as classmethod and thus we can go ahead. Right? Alex

On Sat, Oct 25, 2003, Alex Martelli wrote:
I'm not the person who objected on non-English speaking grounds, and I'm -0 because I don't like using grammatical tense as the differentiator; as I said, I'd expect sorted() to be a predicate. If we're doing this (and it seems we are), I still prefer copysort() for clarity. But I'm not objecting to sorted(). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan

[Alex]
[Aahz]
Predicates start with 'is'. For example, s.lower() converts s to lowercase; s.islower() asks if s is lowercase. I'm -1 on list.copysort() as a constructor/factory. Since whoever didn't like sorted() before hasn't piped up now, I think we should go ahead and implement the list.sorted() constructor. --Guido van Rossum (home page: http://www.python.org/~guido/)

Since whoever didn't like sorted() before hasn't piped up now, I think we should go ahead and implement the list.sorted() constructor.
Okay, I'll modify the patch to be a classmethod called sorted() and will assign to Alex for second review. Raymond

If we're doing this (and it seems we are), I still prefer copysort() for clarity.
"copysort" sounds like the name of some weird sorting algorithm to me. I'd prefer "sortedcopy" (although I suppose that could be read as a predicate, too -- "is x a sorted copy of y?") Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

Okay, this is the last chance to come-up with a name other than sorted(). Here are some alternatives: inlinesort() # immediately clear how it is different from sort() sortedcopy() # clear that it makes a copy and does a sort newsorted() # appropriate for a class method constructor I especially like the last one and all of them provide a distinction from list.sort(). Raymond Hettinger

Hi, thought you might be interested in the opinions of someone not (yet) working full-day with Python and whose mother tounge is *not* english. "Raymond Hettinger" <python@rcn.com> schrieb
Okay, this is the last chance to come-up with a name other than sorted().
The method is making a copy and sorts that and returns it, right? I think the copy is not fully clear from this name. I'd give it +0
Here are some alternatives:
inlinesort() # immediately clear how it is different from sort()
I'm rather -1 on it. Inline might be confused with inplace, and even when not it's not clear from the name that a copy is made.
sortedcopy() # clear that it makes a copy and does a sort
My favourite (if the behaviour is how I believe it, that is *only* the copy is sorted) It's really obvious what is done. +1
newsorted() # appropriate for a class method constructor
I first read this news-orted, and had to step back. Also "new" is not actually the same than "copy" to me (maybe because of my C++) background. Say -0 hth Werner

On Tuesday 28 October 2003 03:01 pm, Phillip J. Eby wrote:
Please explain how it might work when the argument to list.sortedcopy is *NOT* an instance of type list, but rather a completely general sequence, as a classmethod will of course allow us to have. Maybe I'm missing some recent Python enhancements, but I thought that, if a method is fully usable as an instancemethod, then when called on the type it's an unbound method and will ONLY support being called with an instance of the type as the 1st arg. Hmmm... maybe one COULD make a custom descriptor that does support both usages... and maybe it IS worth making the .sorted (or whatever name) entry a case of exactly such a subtle custom descriptor... Alex

Thanks for the idea, I can use this as a perverted example in my talk at Stanford tomorrow. Here it is: import new def curry(f, x, cls=None): return new.instancemethod(f, x) class MagicDescriptor(object): def __init__(self, classmeth, instmeth): self.classmeth = classmeth self.instmeth = instmeth def __get__(self, obj, cls): if obj is None: return curry(self.classmeth, cls) else: return curry(self.instmeth, obj) class MagicList(list): def _classcopy(cls, lst): obj = cls(lst) obj.sort() return obj def _instcopy(self): obj = self.__class__(self) obj.sort() return obj sorted = MagicDescriptor(_classcopy, _instcopy) class SubClass(MagicList): def __str__(self): return "SubClass(%s)" % str(list(self)) unsorted = (1, 10, 2) print MagicList.sorted(unsorted) print MagicList(unsorted).sorted() print SubClass.sorted(unsorted) print SubClass(unsorted).sorted() --Guido van Rossum (home page: http://www.python.org/~guido/)

Oops, remnant of a dead code branch.
Right. I had bigger plans but decided to can them. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido's code]
Notwithstanding the "perverted" implementation, Alex's idea is absolutely wonderful and addresses a core usability issue with classmethods. If only in the C API, I would like to see just such a universalmethod alternative to classmethod. That would allow different behaviors to be assigned depending on how the method is called. Both list.sort() and dict.fromkeys() would benefit from it: class MagicDict(dict): def _class_fromkeys(cls, lst, value=True): "Make a new dict using keys from list and the given value" obj = cls() for elem in lst: obj[elem] = value return obj def _inst_fromkeys(self, lst, value=True): "Update an existing dict using keys from list and the given value" for elem in lst: self[elem] = value return self newfromkeys = MagicDescriptor(_class_fromkeys, _inst_fromkeys) print MagicDict.newfromkeys('abc') print MagicDict(a=1, d=2).newfromkeys('abc') An alternative implementation is to require only one underlying function and to have it differentiate the cases based on obj and cls: class MagicDict(dict): def newfromkeys(obj, cls, lst, value=True): if obj is None: obj = cls() for elem in lst: obj[elem] = value return obj newfromkeys = universalmethod(newfromkeys) Raymond Hettinger

I'm not so sure. I think the main issue is that Python users aren't used to static methods; C++ and Java users should be familiar with them and I don't think they cause much trouble there.
But your _inst_fromkeys mutates self! That completely defeats the purpose (especially since it also returns self) and I am as much against this (approx. -1000 :-) as I am against sort() returning self. To me this pretty much proves that this is a bad idea; such a schizo method will confuse users more that a class method that ignores the instance. And if you made an honest mistake, and meant to ignore the instance, it still proves that this is too confusing to do! :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

[GvR]
But your _inst_fromkeys mutates self!
That issue wasn't intrinsic to the proposal. The implementation should have read: class MagicDict(dict): def newfromkeys(obj, cls, lst, value=True): "Returns a new MagicDict with the keys in lst set to value" if obj is not None: cls = obj.__class__ newobj = cls() for elem in lst: newobj[elem] = value return newobj newfromkeys = universalmethod(newfromkeys) Universal methods give the method a way to handle the two cases separately. This provides both the capability to make an instance from scratch or to copy it off an existing instance. Your example was especially compelling: a = [3,2,1] print a.sorted() print list.sorted(a) Raymond Hettinger

On Thursday 30 October 2003 06:49 am, Raymond Hettinger wrote: ...
Actually, yes, it IS compelling indeed. Funny -- I was originally just brainstorming / musing out loud, never thought about this as a "real thing". But now that it's matured a bit, I do feel sure -- from past experience with what puzzles Python newbies depending on their previous programming knowledge or lack thereof -- that if we had this it would *seriously* reduce the number of puzzlements we have to solve on c.l.py or help@python.org. Which IS strange, in a way, because I do not know of any existing language that has exactly this kind of construct -- a factory callable that you can call on either a type or an instance with good effect. Yet despite it not being previously familiar it DOES "feel natural". Of course, the 3 lines above would also work if sorted was an ordinary instancemethod, but that's just because a is a list instance; if we had some other sequence, say a generator expression, print list.sorted(x*x for x in a) would be just as sweet, and _that_ is the compelling detail IMHO. Trying to think of precedents: Numeric and gmpy have quite a few things like that, except they're (by necessity of the age of gmpy and Numeric) ordinary module functions AND instance methods. E.g.:
as a module function, sqrt is the truncating integer square root, which is also a method of mpz instances (mpz being the integer type in gmpy). mpf (the float type in gmpy) has a sqrt method too, which is nontruncating -- that is also a module function, but, as such, it needs to be called fsqrt (sigh). I sure _would_ like to expose the functionality as mpf.sqrt(x) and mpz.sqrt(x) [would of course be more readable with other typenames than those 'mpf' and 'mpz', but that's another issue, strictly a design mistake of mine]. Alex

If you feel about it that way, I recommend that you let it mature a bit more. If you really like this so much, please realize that you can do this for *any* instance method. The identity C.foo(C()) == C().foo() holds for all "regular" methods. (Since 2.2 it also holds for extension types.) If we were to do this, we'd be back at square two, which we rejected: list instances having both a sort() and a sorted() method (square one being no sorted() method at all :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thursday 30 October 2003 04:43 pm, Guido van Rossum wrote:
Yes, having a be an instance of list in the above doesn't show 'sorted' as being any different than a perfectly regular instance method -- it WAS in this sense a bad example (I thought I'd mentioned that later on in the very same post?). The identify I want is rather like: C.foo(x) == C(x).foo() for an x that's not necessarily an instance of C, just something that has a natural way to become one. When C is list, any iterable x, for example. In other words, being able to call C.foo(x) _without_ the typechecking constraint that x is an instance of C, as one would have for a normal C.foo unbound-method.
Yes, the names are an issue again -- but having e.g. x=L1.sorted(L2) completely ignore the value of L1 feels a bit strange to me too (as does x=D1.fromkeys(L3) now that I've started thinking about it -- I've never seen any Python user, newbie or otherwise, have actual problems with this, but somebody on c.l.py claimed that's just because "nobody" knows about fromkeys -- so, I dunno...). Darn -- it WOULD be better in some cases if one could ONLY call a method on the class, NOT on an instance when the call would in any case ignore the instance. Calling dict.fromkeys(L3) is wonderful, the problem is that you can also call it on a dict instance, and THAT gets confusing. Similarly, calling list.sorted(iterable) is wonderful, but calling it on a list instance that gets ignored, L1.sorted(iterable), could perhaps be confusing. Yeah, the C++(staticmethod)/Smalltalk(clasmethod) idea of "call it on the instance" sounds cool in the abstract, but when it comes down to cases I'm not so sure any more -- what might be use cases where it's preferable, rather than confusing, to be able to call aninst.somestaticmethod(x,y) "just as if" it was a normal method? Would it be such an imposition to "have to know" that a method is static and call type(aninst).somestaticmethod(x,y) instead, say...? Oh well, I guess it's too late to change the semantics of the existing descriptors, even if one agrees with my newfound doubts. But the funniest thing is that I suspect the _new_ descriptor type would be the _least_ confusing of them, because calling such a method on class or instance would have semantics much closer to ordinary methods, just slightly less typeconstrained. Oh well! Alex

Let's focus on making this an issue that one learns without much pain. Given that the most common mistake would be to write a.sorted(), and that's a TypeError because of the missing argument, perhaps we could make the error message clearer? Perhaps we could use a variant of classmethod whose __get__ would raise the error, rather than waiting until the call -- it could do the equivalent of the following: class PickyClassmethod(classmethod): def __get__(self, obj, cls): if obj is not None: raise TypeError, "class method should be called on class only!" else: return classmethod.__get__(self, None, cls) I don't want to make this behavior the default behavior, because I can see use cases for calling a class method on an instance too, knowing that it is a class method; otherwise one would have to write the ugly x.__class__.foobar(). --Guido van Rossum (home page: http://www.python.org/~guido/)

At 09:16 AM 10/30/03 -0800, Guido van Rossum wrote:
If there were a 'classonlymethod()' built-in, I'd probably use it, as I use classmethods a fair bit (mostly for specialized constructors), but I don't recall ever desiring to call one via an instance. Do you have an example of the use cases you see? Hm. What if your PickyClassmethod were a built-in called 'constructor' or 'factorymethod'? Probably too confining a name, if there are other uses for class-only methods, I suppose.

Not exactly, but I notice that e.g. UserList uses self.__class__ a lot; I think that's the kind of area where it might show up.
I'm not convinced that we have a problem (beyond Alex lying awake at night, that it :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

At 10:00 AM 10/30/03 -0800, Guido van Rossum wrote:
I thought you were proposing to use it for list.sorted, in order to provide a better error message when used with an instance. If such a descriptor were implemented, I was suggesting that it would be useful as a form of documentation (i.e. that a method isn't intended to be called on instances of the class), and thus it would be nice for it to be exposed for folks like me who'd take advantage of it. (Especially if PEP 318 is being implemented.)

I mostly just proposed it to placate Alex; I think he's overly worried in this case. PEP 318 seems a ways off. --Guido van Rossum (home page: http://www.python.org/~guido/)

Both. This is the kind of syntactic change that require much deep thought before committing. Unfortunately I don't have time for that right now, so please don't ask. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, 2003-10-30 at 15:18, Guido van Rossum wrote:
I won't, but I do hope this is something that we can settle for Python 2.4. I've been using the functionality in Python 2.3 for a while now and it is wonderful, but I the tedium and clumsiness of the current syntax really puts a damper on its use. -Barry

On Thursday 30 October 2003 09:55 pm, Barry Warsaw wrote:
Not on mine (my use), but, yes, I _have_ seen some Pythonistas be rather perplexed by it. Giving it a neat, cool look will be good. BTW, when we do come around to PEP 318, I would suggest the 'as' clause on a class statement as the best way to specify a metaclass. 'class Newstyle as type:' for example is IMHO neater -- and thus more encouraging to the generalized use of newstyle classes -- than the "inheriting from object" idea or the setting of __metaclass__; it reads well AND makes what one's doing more obvious when a custom MC is involved, because it's so "up-front". Besides, it's STILL syntax for a call to the thingy on the RHS of 'as', just like, say, def foop() as staticmethod: is, even though the details of how that call is performed are different for metaclasses (called with classname/bases/classdict) and function decorators (called with the function object). BTW, the PEP isn't very clear about this, but, I would hope the 'as' clause applies uniformly to ANY def or class statement, right? No reason to specialcase, that I can see -- "def ... as" may well be used mostly inside classbodies, because we do have decorators ready for that, but the 'synchronized(lock)' decorator used in the PEP's examples would seem just as applicable to freestanding functions as to methods. Alex

On Thursday 30 October 2003 07:19 pm, Guido van Rossum wrote:
I'm not convinced that we have a problem (beyond Alex lying awake at night, that it :-).
As it happens I just had a very unusual ten-hours-of-sleep night, so I don't think you need to worry:-).
OK, then it does appear to me that new descriptors may wait for PEP 318 to mature, and list.sorted be left as is for now. Hopefully both can be taken into consideration before 2.4 is finalized, since that time is also "a ways off", no doubt. Alex

Then why don't you use a custom descriptor which raises an exception when an instance is passed in? Like: def __get__(self, obj, cls): if obj is None: return new.instancemethod(self.classmeth, cls) else: raise TypeError, \ "Calling %s on instance %s ignores instance" % \ (self.classmeth, obj) -- Christian Tanzer http://www.c-tanzer.at/

Well, I'd like to withdraw it. Having all three of a.sort(), a.sorted() and list.sorted(a) available brings back all the confusion of a.sort() vs. a.sorted(). What's in CVS is just fine. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thursday 30 October 2003 06:31 am, Guido van Rossum wrote:
"Yes, but". The ability to call something on either the class OR the instance IS a Python hallmark... with the constraint that when you call it on the class you need to provide an instance as the first arg (assuming the "something" is a normal instance method, which is surely the most frequent case). You could see universalmethods as being special only in that they WEAKEN this constraint -- they let the 1st arg be EITHER an instance OR something from which a new instance can be naturally constructed. Use cases: in gmpy: if I had universal methods, current instancemethods mpf.sqrt and mpz.sqrt (multiprecision floatingpoint and integer/truncating square roots respectively) could also be called quite naturally as mpf.sqrt(33) and mpz.sqrt(33) respectively. Right now you have to use, instead, mpf(33).sqrt() or mpz(33).sqrt(), which is QUITE a bit more costly because the instance whose sqrt you're taking gets built then immediately discarded (and building mpf -- and to a lesser extent mpz -- instances is a bit costly); OR you can call module functions gmpy.sqrt(33), truncating sqrt, or gmpy.fsqrt(33), nontruncating (returning a multiprecision float). Just one example -- gmpy's chock full of things like this, which universalmethod would let me organize a bit better. in Numeric: lots of module-functions take an arbitrary iterable, build an array instance from it if needed, and operate on an array instance to return a result. This sort-of-works basically because Numeric has "one main type" and thus the issue of "which type are we talking about" doesn't arise (gmpy has 3 types, although mpz takes the lion's share, making things much iffier). But still, Numeric newbies (if they come from OO languages rather than Fortran) DO try calling e.g. x.transpose() for some array x rather than the correct Numeric.transpose(x) -- and in fact array.transpose, called on the class, would ALSO be a perfeclty natural approach. universalmethod would allow array instances to expose such useful functionality as instance methods AND also allow applying direct operations -- without costly construction of intermediate instances to be thrown away at once -- via "array.transpose" and the like. It's not really about new revolutionary functionality: it's just a neater way to "site" existing functionality. This isn't surprising if you look at universalmethod as just a relaxation of the normal constraint "when you call someclass.somemethod(x, ... -- x must be an instance of someclass" into "x must be an instance of someclass OR -- if the somemethod supports it -- something from which such an instance could be constructed in one obvious way". Then, basically, the call is semantically equivalent to someclass(x).somemethod(... BUT the implementation has a chance to AVOID costly construction of an instance for the sole purpose of calling somemethod on it and then throwing away the intermediate instance at once. No revolution, but, I think, a nice little addition to our armoury. Alex

While we're voting, I still like list.sorted() best, so please keep that one in the list of possibilities. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tuesday 28 October 2003 04:18 pm, Guido van Rossum wrote:
I also like list.sorted() -- but list.newsorted() is IMHO even a LITTLE bit better, making it even clearer that it's giving a NEW list. Just a _little_ preference, mind you. "sortedcopy" appears to me a BIT less clear (what "copy", if the arg isn't a list...?), "inlinesort" worst. BTW, I think I should point out one POSSIBLE problem with classmethods -- since unfortunately they CAN be called on an instance, and will ignore that instance, this may confuse an unsuspecting user. I was arguing on c.l.py that this _wasn't_ confusing because I never saw anybody made that mistake with dict.fromkeys ... and the response was "what's that?"... i.e., people aren't making mistakes with it because they have no knowledge of it. list.newsorted or however it's going to be called is probably going to be more popular than existing dict.fromkeys, so the issue may be more likely to arise there. Although I think the issue can safely be ignored, I also think I should point it out anyway, even just to get concurrence on this -- it IS possible that the classmethod idea is winning "by accident" just because nobody had thought of the issue, after all, and that would be bad (and I say so even if I was the original proposer of the list.sorted classmethod and still think it should be adopted -- don't want it to get in "on the sly" because a possible problem was overlooked, as opposed to, considered and decided to be not an issue). OK, so here's the problem, exemplified with dict.fromkeys: d = {1:2, 3:4, 5:6} dd = d.fromkeys([3, 5]) it's not immediately obvious that the value of d matters not a whit -- that this is NOT going to return a subset of d "taken from the keys" 3 and 5, i.e. {3:4, 5:6}, but, rather, {3:None, 5:None} -- and the functionality a naive user might attribute to that call d.fromkeys([3, 5]) should in fact be coded quite differently, e.g.: dd = dict([ (k,v) for k, v in d.iteritems() if k in [3,5] ]) or perhaps, if most keys are going to be copied: dd = d.copy() for k in d: if k not in [3, 5]: del dd[k] The situation with list.sorted might be somewhat similar, although in fact I think that it's harder to construct a case of fully sympathizable-with confusion like the above. Still: L = range(7) LL = L.sorted() this raises an exception (presumably about L.sorted needing "at least 1 arguments, got 0" -- that's what d.fromkeys() would do today), so the issue ain't as bad -- it will just take the user a while to understand WHY, but at least there should be running program with strange results, which makes for harder debugging. Or: L = range(7) LL = L.sorted(('fee', 'fie', 'foo')) I'l not sure what the coder might expect here, but again it seems possible that he expected the value of L to matter in some way to the resulting value of LL. Perhaps this points to an issue with classmethods in general, due in part to the fact that they're still rather little used in Python -- callers of instance.method() may mostly expect that the result has something to do with the instance's value, rather than being the same as type(instance).method() -- little we can do about it at this point except instruction, I think. Alex

Alex Martelli <aleaxit@yahoo.com> wrote:
IMO, sorted is the clearest, all other proposals carry excess baggage making them less clear.
Or put the method into the metaclass. I'm using both classmethods and methods defined by metaclasses and didn't get any complaints about classmethods yet. -- Christian Tanzer http://www.c-tanzer.at/

"Guido van Rossum" <guido@python.org> wrote in message news:200310281518.h9SFIj129025@12-236-54-216.client.attbi.com...
After thinking about it some more, I also prefer .sorted to suggested alternatives. I read it as follows: list(iterable) means 'make a list from iterable (preserving item order)' list.sorted(iterable) means 'make a sorted list from iterable' While I generally like verbs for method names, the adjective form works here as modifing the noun/verb 'list' and the invoked construction process. 'Inline' strikes me as confusing. 'Copy' and 'new' strike me as redundant noise since, in the new 2.2+ regime, 'list' as a typal verb *means* 'make a new list'. Terry J. Reedy Terry J. Reedy

On Mon, 2003-10-20 at 19:22, Guido van Rossum wrote:
But the argument that it wastes a copy still stands (even though that's only O(N) vs. O(N log N) for the sort).
That would be irrelevant in most of the cases where I would use it - typically sorting short lists or dicts where the overhead is unmeasurable.
I'm still unclear why this so important to have in the library when you can write it yourself in two lines.
For little standalone scripts it gets a bit tedious to write this again and again. It doesn't take much code to write dict.fromkeys() manually, but I'm glad that it's there. I'd say list.sorted (or whatever it gets called) has at least as much claim to exist. Mark Russell

Raymond Hettinger strung bits together to say:
'chain' may be a bad name then, since all that function really does is take an arbitrary bound method, execute it and then return the object that the method was bound to. If we used a name like 'method_as_expr' (instead of 'chain'), then the above examples would be: genhistory(date, method_as_expr(list(events).sort, key=incidenttime)) todo = [t for t in method_as_expr(list(tasks).sort) if due_today(t)] Granted, it's not quite as clear (some might say it's positively arcane!), but it also isn't using anything that's not already in the language/standard library.
Would something like 'sortedcopy' be an improvement? Although Alex's suggestion of a class method like dict.fromkeys() also sounded good - naming it is still an issue, though. I'm not entirely opposed to the idea (the 'method_as_expr' approach feels like something of a hack, even to me) - but the object method just doesn't seem to fit cleanly into the design of the basic types. Cheers, Nick. -- Nick Coghlan | Brisbane, Australia ICQ#: 68854767 | ncoghlan@email.com Mobile: 0409 573 268 | http://www.talkinboutstuff.net "Let go your prejudices, lest they limit your thoughts and actions."
participants (23)
-
Aahz
-
Alex Martelli
-
Barry Warsaw
-
Batista, Facundo
-
Brett C.
-
Dan Aloni
-
Fred L. Drake, Jr.
-
Greg Ewing
-
greg@cosc.canterbury.ac.nz
-
Guido van Rossum
-
Ian Bicking
-
Kevin Jacobs
-
Mark Russell
-
Mark Russell
-
Michel Pelletier
-
Neil Schemenauer
-
Nick Coghlan
-
Phillip J. Eby
-
Raymond Hettinger
-
Skip Montanaro
-
tanzer@swing.co.at
-
Terry Reedy
-
Werner Schiendl