Guido van Rossum <guido@python.org>:
Tuples are for heterogeneous data, lists are for homogeneous data.
Only if you include *both* null cases:
- tuple of type(i) == type(i+1)
- list of PyObject

Homo-/heterogeneity is orthogonal to the primary benefits of lists (mutability) and of tuples (fixed order/length). Else why can you do list( (1, "two", 3.0) ) and tuple( [x, y, z] )?
Tuples are *not* read-only lists.
It just happens that "tuple( sequence )" is the easiest and most obvious (and thus right?) way to spell "immutable sequence".

Stop reading whenever you're convinced. ;-) (not about mutability, but about homo/heterogeneity)

There are three (mostly) independent characteristics of tuples (in most to least important order, by frequency of use, IMO):
- fixed order/fixed length - used in function argument/return tuples and all uses as a "struct"
- heterogeneity allowed but not required - used in many function argument tuples and many "struct" tuples
- immutability - implies fixed order and fixed length, and used occasionally for specific needs

The important characteristics of lists are also independent of each other (again, IMO in order):
- mutability of length & content - used for dynamically building collections
- heterogeneity allowed but not required - used occasionally for specific needs

It turns out that fixed-length sequences are often useful for heterogeneous data, and that most sequences that require mutability are homogeneous.

Examples from the standard library (found by grep '= (' and grep '= \['):

# homogeneous tuple - homogeneity, fixed order, and fixed length are all required
# CVS says Guido wrote/imported this. ;-)
whrandom.py: self._seed = (x or 1, y or 1, z or 1)

# homogeneous tuple - homogeneity is required - all entries must be 'types'
# suitable for passing to 'isinstance( A, typesTuple )', which (needlessly?)
# requires a tuple to avoid possibly recursive general sequences
types.py: StringTypes = (StringType, UnicodeType)

# heterogeneous list of values of all basic types (we need to be able to
# copy all types of values)
# this could be a tuple, but neither immutability, nor fixed length, nor
# fixed order is needed, so it makes more sense as a list
# CVS blames Guido here, too, in version 1.1. ;-)
copy.py: l = [None, 1, 2L, 3.14, 'xyzzy', (1, 2L), [3.14, 'abc'], {'abc': 'ABC'}, (), [], {}]

Other homogeneous tuples (may benefit from mutability, but require fixed length/order):
- 3D coordinates
- RGB color
- binary tree node (child, next)

Other heterogeneous lists (homogeneous lists of base-class instances blah-blah-blah):
- files AND directories to traverse (strings? "File" objects?)
- emails AND faxes AND voicemails AND tasks in your Inbox (items?)
- mail AND newsgroup accounts (accounts?)
- return values OR exceptions from a list of test cases and test suites (PyObjects? introduce an artificial base class?)

Must-be-stubborn-if-you-got-this-far-ly y'rs ;-)

kb
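Kevin's point about the constructors is easy to check: neither tuple() nor list() cares about homo- or heterogeneity, only about being handed some sequence. A small sketch:

```python
# A heterogeneous tuple built from a list, and a homogeneous list
# built from a tuple -- both conversions are perfectly legal.
mixed = tuple([1, "two", 3.0])
homogeneous = list((1, 2, 3))

assert mixed == (1, "two", 3.0)
assert homogeneous == [1, 2, 3]
```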
On Thursday 13 March 2003 09:09 pm, Kevin J. Butler wrote: ...
The important characteristics of lists are also independent of each other (again, IMO on the order):
- mutability of length & content - used for dynamically building collections
- heterogeneity allowed but not required - used occasionally for specific needs
I think some methods must also go on this list of important characteristics -- the sort method, in particular. If you need to sort stuff (including heterogeneous stuff, EXCEPT if the latter includes at least one complex AND at least one other number of any kind) you put it into a list and sort the list -- that's the Python way of sorting, and sorting is an often-needed thing.

Sorting plays with mutability by working in-place, but for many uses it would be just as good if sorting returned a sorted copy instead -- the key thing here is the sorting, not the mutability.

Alex
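Alex's complex-number caveat is easy to demonstrate: complex defines equality but no ordering, so it is the one kind of number that breaks list sorting (a sketch; under modern Python the failure is an explicit TypeError):

```python
# Mixed real numbers sort fine together...
assert sorted([3.0, 1, 2.5]) == [1, 2.5, 3.0]

# ...but one complex number among them makes the list unsortable,
# because complex supports == and != only, not <.
try:
    sorted([1, 1j])
except TypeError:
    print("complex numbers define no order")
```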
Alex Martelli wrote: ...
Sorting plays with mutability by working in-place, but for many uses it would be just as good if sorting returned a sorted copy instead -- the key thing here is the sorting, not the mutability.
And the key assumption for sorting things is that the things are sortable, which means there exists an order on the basic set. Which again suggests that list elements usually have something in common.

ciao - chris

p.s.: I'm not using a shopping tuple, since I sort my list by the stores I have to visit. :-)

-- Christian Tismer :^) <mailto:tismer@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
*Starship* http://starship.python.net/
whom do you want to sponsor today? http://www.stackless.com/
Christian Tismer wrote:
And the key assumption for sorting things is that the things are sortable, which means there exists an order on the basic set. Which again suggests that list elements usually have something in common.
To me it suggests that some lists are sortable and others are not...

There's one aspect of this discussion that I haven't seen mentioned yet: syntax. I think the suggested usage of lists vs. tuples has more to do with list vs. tuple _syntax_, and less with mutability. From this perspective it is natural that tuples support a different set of methods than lists. However, mutable vs. immutable has its uses also, and from _that_ perspective it is far less understandable that tuples lack certain methods.

FWIW, I quite like the way the core classes in Cocoa/NextStep are designed. For each container-ish object there's a mutable and an immutable variant, where the mutable variant is usually a subclass of the immutable one. Examples:

NSString -> NSMutableString
NSData -> NSMutableData
NSArray -> NSMutableArray
NSDictionary -> NSMutableDictionary

(But then again, Objective-C doesn't have syntax support for lists _or_ tuples...)

Just
FWIW, I quite like the way the core classes in Cocoa/NextStep are designed. For each container-ish object there's a mutable and an immutable variant, where the mutable variant is usually a subclass of the immutable one. Examples:

NSString -> NSMutableString
NSData -> NSMutableData
NSArray -> NSMutableArray
NSDictionary -> NSMutableDictionary
This has its downside too, though. A function designed to take an immutable class instance cannot rely on the class instance to remain unchanged, because the caller could pass it an instance of the corresponding mutable subclass! (For example, the function might use the argument as a dict key.) In some sense this inheritance pattern breaks the "Liskov substitutability" principle: if B is a base class of C, whenever a B instance is expected, a C instance may be used.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
FWIW, I quite like the way the core classes in Cocoa/NextStep are designed. For each container-ish object there's a mutable and an immutable variant, where the mutable variant is usually a subclass of the immutable one. Examples:

NSString -> NSMutableString
NSData -> NSMutableData
NSArray -> NSMutableArray
NSDictionary -> NSMutableDictionary
This has its downside too, though. A function designed to take an immutable class instance cannot rely on the class instance to remain unchanged, because the caller could pass it an instance of the corresponding mutable subclass! (For example, the function might use the argument as a dict key.) In some sense this inheritance pattern breaks the "Liskov substitutability" principle: if B is a base class of C, whenever a B instance is expected, a C instance may be used.
I'm not sure how much relevance this principle has in a language in which the inheritance tree has little meaning, but since I _am_ sure you read more books about this than I did, I'll take your word for it ;-)

It's not so much the inheritance hierarchy that I like about the Cocoa core classes, but the fact that mutability is a prominent part of the design. I think Python would be a better language if it had a mutable string type as well as a mutable byte-oriented data type. An immutable dict would be handy at times. An immutable list type would be great, too. Wait, we already have that.

Sure, tuples are often used for struct-like things and lists for that other stuff <wink>, but I don't think it's right to say you _must_ use them like that, and that seeing/using tuples as immutable lists is _wrong_.

Just
Sure, tuples are often used for struct-like things and lists for that other stuff <wink>, but I don't think it's right to say you _must_ use them like that, and that seeing/using tuples as immutable lists is _wrong_.
It's not wrong, but I find that many people use tuples in situations where they should really use lists, and the immutability is irrelevant. Using tuples seems to be a reflex for some people because creating a tuple saves a microsecond or so. That sounds like the wrong thing to let inform your reflexes...

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Friday 14 March 2003 04:42 pm, Christian Tismer wrote:
Alex Martelli wrote: ...
Sorting plays with mutability by working in-place, but for many uses it would be just as good if sorting returned a sorted copy instead -- the key thing here is the sorting, not the mutability.
And the key assumption for sorting things is that the things are sortable, which means there exists an order on the basic set. Which again suggests that list elements usually have something in common.
If a list contains ONE complex number and no other number, then the list can be sorted. If the list contains elements that have something in common, by both being complex numbers, then it cannot be sorted. So, lists whose elements have LESS in common (by being of widely different types) are more likely to be sortable than lists some of whose elements have in common the fact of being numbers (if one or more of those numbers are complex).

Although not likely to give practical problems (after all, I suspect most Pythonistas never use complex numbers at all), this anomaly (introduced in 1.6, I think) makes conceptualization less uniform and thus somewhat harder to teach.

Alex
Sorting plays with mutability by working in-place, but for many uses it would be just as good if sorting returned a sorted copy instead -- the key thing here is the sorting, not the mutability.
And the key assumption for sorting things is that the things are sortable, which means there exists an order on the basic set. Which again suggests that list elements usually have something in common.
If a list contains ONE complex number and no other number, then the list can be sorted.
But the order isn't meaningful.
If the list contains elements that have something in common, by both being complex numbers, then it cannot be sorted.
So, lists whose elements have LESS in common (by being of widely different types) are more likely to be sortable than lists some of whose elements have in common the fact of being numbers (if one or more of those numbers are complex).
Although not likely to give practical problems (after all I suspect most Pythonistas never use complex numbers at all), this anomaly (introduced in 1.6, I think) makes conceptualization less uniform and thus somewhat harder to teach.
If I had to do it over again, I'd only implement == and != for objects of vastly differing types, and limit <, <=, >, >= to objects that are meaningfully comparable. I'd like to do this in Python 3.0, but that probably means we'd have to start deprecating default comparisons except (in)equality in Python 2.4.

--Guido van Rossum (home page: http://www.python.org/~guido/)
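This is, in fact, what Python 3 eventually did: equality between unrelated types simply answers False or True, while ordering comparisons raise TypeError. A quick illustration:

```python
# Equality across unrelated types is well-defined (and False)...
assert (42 == "42") is False
assert (42 != "42") is True

# ...but ordering them raises TypeError instead of giving an
# arbitrary, version-dependent answer.
try:
    42 < "42"
except TypeError:
    print("int and str are not orderable")
```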
"Guido van Rossum" <guido@python.org> wrote in message news:200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net...
But the order isn't meaningful. .... If I had to do it over again, I'd only implement == and != for objects of vastly differing types, and limit <, <=, >, >= to objects that are meaningfully comparable.
For user-defined types/classes, I presume that this would still mean deferring to the appropriate magic method (__cmp__ or __ge__?) to define 'meaningful'.
I'd like to do this in Python 3.0, but that probably means we'd have to start deprecating default comparisons except (in)equality in Python 2.4.
+1, I think.

Based on reading c.l.py, the validity of nonsense comparisons is one of the more surprising 'features' of Python for beginners -- who reasonably expect a TypeError or ValueError. Once they get past that, they are then surprised by the instability across versions.

Given that universal sorting of hetero-lists is now broken, I think it would be better to do away with it cleanly. It is seldom needed and would still be available with a user-defined sorting function (which requires some thought as to what is really wanted). A Python version of the present algorithm could be included (in Tools/xx perhaps) for anyone who actually needs it.

Terry J. Reedy
[Terry Reedy]
For user-defined types/classes, I presume that this would still mean deferring to the appropriate magic method (__cmp__ or __ge__?) to define 'meaningful'.
Yes. And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
I'd like to do this in Python 3.0, but that probably means we'd have to start deprecating default comparisons except (in)equality in Python 2.4.
+1, I think.
Based on reading c.l.py, the validity of nonsense comparisons is one of the more surprising 'features' of Python for beginners -- who reasonably expect a TypeError or ValueError. Once they get past that, they are then surprised by the instability across versions. Given that universal sorting of hetero-lists is now broken, I think it would be better to do away with it cleanly. It is seldom needed and would still be available with a user-defined sorting function (which requires some thought as to what is really wanted).
Exactly.
A Python version of the present algorithm could be included (in Tools/xx perhaps) for anyone who actually needs it.
I doubt there will be many takers. Let people make up their own version, so they know its behavior. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> Yes. And I'm still hoping to remove __cmp__; there should be
Guido> only one way to overload comparisons.

Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
Guido> Yes. And I'm still hoping to remove __cmp__; there should be
Guido> only one way to overload comparisons.
Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away.
Yes. OTOH, as long as cmp() is in the language, these same situations are more efficiently done by a __cmp__ implementation than by calling __lt__ and then __eq__ or similar (it's hard to decide which order is best). So cmp() should be removed at the same time as __cmp__. And then we should also change list.sort(), as Tim points out. Maybe we can start introducing this earlier by using keyword arguments:

list.sort(lt=function)   sorts using a < implementation
list.sort(cmp=function)  sorts using a __cmp__ implementation

--Guido van Rossum (home page: http://www.python.org/~guido/)
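Neither keyword spelling was adopted in exactly this form; what survives in modern Python is key-based sorting, with functools.cmp_to_key as the bridge for three-way comparison functions. A present-day sketch (reverse_cmp is an illustrative name, not code from the thread):

```python
from functools import cmp_to_key

# A three-way, __cmp__-style function: negative, zero, or positive.
# This one is inverted, so the resulting sort is descending.
def reverse_cmp(a, b):
    return (a < b) - (a > b)

data = [3, 1, 2]
assert sorted(data, key=cmp_to_key(reverse_cmp)) == [3, 2, 1]
```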
Guido> Yes. And I'm still hoping to remove __cmp__; there should be
Guido> only one way to overload comparisons.
[Andrew]
Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away.
[Guido]
Yes. OTOH, as long as cmp() is in the language, these same situations are more efficiently done by a __cmp__ implementation than by calling __lt__ and then __eq__ or similar (it's hard to decide which order is best). So cmp() should be removed at the same time as __cmp__.
I realized the first sentence wasn't very clear. I meant that implementing cmp() is inefficient without __cmp__ for some types (especially container types). Example:

cmp(range(1000)+[1], range(1000)+[0])

If the list type implements __cmp__, each of the pairs of items is compared once. OTOH, if the list type only implemented __lt__, __eq__ and __gt__, cmp() presumably would have to try one of those first, and then another one. If it picked __lt__ followed by __eq__, it would get two False results in a row, meaning it could return 1 (cmp() doesn't really expect incomparable results :-), but at the cost of comparing each pair of items twice. If cmp() picked another set of two operators to try, I'd simply adjust the example.

--Guido van Rossum (home page: http://www.python.org/~guido/)
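Guido's doubled-work point can be made concrete by counting element comparisons (a sketch in modern Python; the Counted class and the emulation of cmp() via two rich comparisons are illustrative assumptions, not code from the thread):

```python
# Count element comparisons for one rich comparison of two equal lists
# versus emulating a three-way cmp() with two rich comparisons.
class Counted:
    calls = 0  # shared tally of element comparisons

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.calls += 1
        return self.v < other.v

    def __eq__(self, other):
        Counted.calls += 1
        return self.v == other.v

a = [Counted(i) for i in range(5)]
b = [Counted(i) for i in range(5)]

Counted.calls = 0
_ = a < b                        # one pass over the items
one_pass = Counted.calls

Counted.calls = 0
three_way = (a > b) - (a < b)    # emulated cmp(): two passes
two_pass = Counted.calls

assert three_way == 0            # the lists are equal
assert two_pass > one_pass       # strictly more element comparisons
```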
On Sun, Mar 16, 2003, Guido van Rossum wrote:
I realized the first sentence wasn't very clear. I meant that implementing cmp() is inefficient without __cmp__ for some types (especially container types). Example:
cmp(range(1000)+[1], range(1000)+[0])
If the list type implements __cmp__, each of the pairs of items is compared once. OTOH, if the list type only implemented __lt__, __eq__ and __gt__, cmp() presumably would have to try one of those first, and then another one. If it picked __lt__ followed by __eq__, it would get two False results in a row, meaning it could return 1 (cmp() doesn't really expect incomparable results :-), but at the cost of comparing each pair of items twice. If cmp() picked another set of two operators to try, I'd simply adjust the example.
That's something I've been thinking about. I use cmp() for that purpose in the BCD module, because I do need the 3-way result (and it appears that Eric kept that). OTOH, it's certainly easy enough to define a cmp() function, and not having the builtin wouldn't kill performance. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Register for PyCon now! http://www.python.org/pycon/reg.html
Guido> I realized the first sentence wasn't very clear. I meant that
Guido> implementing cmp() is inefficient without __cmp__ for some types
Guido> (especially container types). Example:

Guido> cmp(range(1000)+[1], range(1000)+[0])

Guido> If the list type implements __cmp__, each of the pairs of items
Guido> is compared once. OTOH, if the list type only implemented
Guido> __lt__, __eq__ and __gt__, cmp() presumably would have to try
Guido> one of those first, and then another one. If it picked __lt__
Guido> followed by __eq__, it would get two False results in a row,
Guido> meaning it could return 1 (cmp() doesn't really expect
Guido> incomparable results :-), but at the cost of comparing each
Guido> pair of items twice. If cmp() picked another set of two
Guido> operators to try, I'd simply adjust the example.

Yes. If you want to present a 3-way comparison to users, an underlying 3-way comparison is the fastest way to do it. The trouble is that a 3-way comparison is definitely not the fastest way to present a 2-way comparison to users.

So if you want users to see separate 2-way and 3-way comparisons, I think the fastest way to implement them is not to try to force commonality where none exists.
Guido> I realized the first sentence wasn't very clear. I meant that
Guido> implementing cmp() is inefficient without __cmp__ for some types
Guido> (especially container types). Example:
Guido> cmp(range(1000)+[1], range(1000)+[0])
Guido> If the list type implements __cmp__, each of the pairs of items
Guido> is compared once. OTOH, if the list type only implemented
Guido> __lt__, __eq__ and __gt__, cmp() presumably would have to try
Guido> one of those first, and then another one. If it picked __lt__
Guido> followed by __eq__, it would get two False results in a row,
Guido> meaning it could return 1 (cmp() doesn't really expect
Guido> incomparable results :-), but at the cost of comparing each
Guido> pair of items twice. If cmp() picked another set of two
Guido> operators to try, I'd simply adjust the example.
[Andrew Koenig]
Yes. If you want to present a 3-way comparison to users, an underlying 3-way comparison is the fastest way to do it. The trouble is that a 3-way comparison is definitely not the fastest way to present a 2-way comparison to users.
So if you want users to see separate 2-way and 3-way comparisons, I think the fastest way to implement them is not to try to force commonality where none exists.
This seems an argument for keeping both __cmp__ and the six __lt__ etc. Yet TOOWTDI makes me want to get rid of __cmp__. I wonder, what's the need for cmp()? My hunch is that the main reason for cmp() is that it's specified in various APIs -- e.g. list.sort(), or FP hardware. But don't those APIs usually specify cmp() because their designers (mistakenly) believed the three different outcomes were easy to compute together and it would simplify the API? --Guido van Rossum (home page: http://www.python.org/~guido/)
[Andrew Koenig]
Yes. If you want to present a 3-way comparison to users, an underlying 3-way comparison is the fastest way to do it. The trouble is that a 3-way comparison is definitely not the fastest way to present a 2-way comparison to users.
So if you want users to see separate 2-way and 3-way comparisons, I think the fastest way to implement them is not to try to force commonality where none exists.
This seems an argument for keeping both __cmp__ and the six __lt__ etc. Yet TOOWTDI makes me want to get rid of __cmp__.
Recent experience with sets.py shows that __cmp__ has a high PITA factor when combined with rich comparisons. There was no good way to produce all of the desired behaviors:

* <, <=, >, >= having subset interpretations
* __cmp__ being marked as not implemented
* cmp(a,b) not by-passing __cmp__ when __lt__ and __eq__ were defined.

The source of the complications is that comparing Set('a') and Set('b') returns False for *all* of <, <=, ==, >=, >. Internally, three-way compares relied on the falsehood of some implying the truth of others.

Raymond Hettinger
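The incomparable-sets point is easy to reproduce with the builtin set type that later replaced sets.py: for two distinct, non-nested sets, all five ordering comparisons are False at once, which no single three-way __cmp__ result can express.

```python
a, b = {"a"}, {"b"}   # neither is a subset of the other

# All of <, <=, ==, >=, > are False simultaneously -- a partial
# ordering that a three-way cmp() cannot represent.
assert not (a < b)
assert not (a <= b)
assert not (a == b)
assert not (a >= b)
assert not (a > b)
```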
From: "Guido van Rossum" <guido@python.org>
This seems an argument for keeping both __cmp__ and the six __lt__ etc. Yet TOOWTDI makes me want to get rid of __cmp__.
One minor problem with the six __lt__ etc. is that they should all be defined. For quick things (although I know better) I still define just __cmp__ out of laziness.

regards
One minor problem with the six __lt__ etc. is that they should all be defined. For quick things (although I know better) I still define just __cmp__ out of laziness.
Sometime back, I proposed a mixin for this.

class C(CompareMixin):
    def __eq__(self, other): ...
    def __lt__(self, other): ...

The __eq__ by itself is enough to get __ne__ defined for you. Defining both __eq__ and __lt__ gets you all the rest.

Raymond Hettinger
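A minimal sketch of such a mixin (the CompareMixin name and the particular derivations are assumptions in the spirit of Raymond's proposal; functools.total_ordering later standardized the idea). Note the derivations assume a total order:

```python
# Derive the full set of rich comparisons from __eq__ and __lt__.
class CompareMixin:
    def __ne__(self, other):
        return not (self == other)
    def __gt__(self, other):
        return other < self
    def __le__(self, other):
        return not (other < self)
    def __ge__(self, other):
        return not (self < other)

class C(CompareMixin):
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return self.v == other.v
    def __lt__(self, other):
        return self.v < other.v

assert C(1) < C(2) and C(2) > C(1)
assert C(1) <= C(1) and C(2) >= C(1)
assert C(1) != C(2)
```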
[Samuele Pedroni]
One minor problem with the six __lt__ etc. is that they should all be defined. For quick things (although I know better) I still define just __cmp__ out of laziness.
Or out of sanity <wink>. 2.3's datetime type is interesting that way. The Python implementation of that (which lives in Zope3, and in a Python sandbox, and which you may want to use for Jython) now has lots of trivial variations of

def __le__(self, other):
    if isinstance(other, date):
        return self.__cmp(other) <= 0
    elif hasattr(other, "timetuple"):
        return NotImplemented
    else:
        _cmperror(self, other)

Before 2.3a2, it just defined __cmp__ and so avoided this code near-duplication, but then we decided it would be better to let == and != return False and True (respectively, and instead of raising TypeError) when mixing a date or time or datetime with some other type.
[Guido]
... I wonder, what's the need for cmp()? My hunch is that the main reason for cmp() is that it's specified in various APIs -- e.g. list.sort(), or FP hardware. But don't those APIs usually specify cmp() because their designers (mistakenly) believed the three different outcomes were easy to compute together and it would simplify the API?
The three possible outcomes from lexicographic comparison are natural to compute together, though (compare elements left to right until hitting the first non-equal element comparison). I expect C's designers had string comparison mostly in mind, and it's natural for lots of search structures to need to know which of the three outcomes obtains. For example, probing a vanilla binary search tree needs to stop when it hits a node with key equal to the thing searched for, or move left or right when != obtains.

Long int comparison is a variant of lexicographic comparison, and this problem shows up repeatedly in a multitude of guises: you have positive long ints x and y, and want to find the long int q closest to x/y.

q, r = divmod(x, y)
# round nearest/even
if 2*r > y or (q & 1 and 2*r == y):
    q += 1

is more expensive than necessary when the "q & 1 and 2*r == y" part holds: the "2*r > y" part had to compare 2*r to y all the way to the end to deduce that > wasn't the answer, and then you do it all over again to deduce that equality is the right answer.

q, r = divmod(x, y)
c = cmp(2*r, y)
if c > 0 or (q & 1 and c == 0):
    q += 1

is faster, and Python's long_compare() doesn't do any more work than is really needed by this algorithm. So sometimes cmp() is exactly what you want. OTOH, if Python never had it, the efficiency gains in such cases probably aren't common enough that a compelling case for adding it could have been made.
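Tim's rounding idiom, made runnable (cmp() is defined inline because later Pythons dropped the builtin; the test compares 2*r against the divisor y, i.e. whether the remainder exceeds half of it):

```python
def cmp(a, b):
    # Three-way compare, as the old builtin behaved for integers.
    return (a > b) - (a < b)

def div_nearest(x, y):
    """Integer closest to x / y, with ties rounded to even."""
    q, r = divmod(x, y)
    c = cmp(2 * r, y)            # one pass decides >, ==, and <
    if c > 0 or (q & 1 and c == 0):
        q += 1
    return q

assert div_nearest(7, 2) == 4    # 3.5 rounds to the even 4
assert div_nearest(5, 2) == 2    # 2.5 rounds to the even 2
assert div_nearest(8, 3) == 3    # 8/3 = 2.67 rounds up
```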
Tim> For example, probing a vanilla binary search tree needs to stop
Tim> when it hits a node with key equal to the thing searched for, or
Tim> move left or right when != obtains.

The binary-search routines in the C++ standard library mostly avoid having to do != comparisons by defining their interfaces in the following clever way:

binary_search returns a boolean that indicates whether the value sought is in the sequence. It does not say where that value is.

lower_bound returns the first position ahead of which the given value could be inserted without disrupting the ordering of the sequence.

upper_bound returns the last position ahead of which the given value could be inserted without disrupting the ordering of the sequence.

equal_range returns (lower_bound, upper_bound) as a pair.

In Python terms:

binary_search([3, 5, 7], 6) would yield False
binary_search([3, 5, 7], 7) would yield True
lower_bound([1, 3, 5, 7, 9, 11], 9) would yield 4
lower_bound([1, 3, 5, 7, 9, 11], 8) would also yield 4
upper_bound([1, 3, 5, 7, 9, 11], 9) would yield 5
equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) would yield (2, 5).

If you like, equal_range(seq, x) returns (l, h) such that all the elements of seq[l:h] are equal to x. If l == h, the subsequence is the empty sequence between the two adjacent elements with values that bracket x.

These definitions turn out to be useful in practice, and are also easy to implement efficiently using only < comparisons.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
[ark@research.att.com]
The binary-search routines in the C++ standard library mostly avoid having to do != comparisons by defining their interfaces in the following clever way:
binary_search returns a boolean that indicates whether the value sought is in the sequence. It does not say where that value is.
lower_bound returns the first position ahead of which the given value could be inserted without disrupting the ordering of the sequence.
upper_bound returns the last position ahead of which the given value could be inserted without disrupting the ordering of the sequence.
These last two are quite like Python's bisect.bisect_{left, right}, which are implemented using only __lt__ element comparison.
equal_range returns (lower_bound, upper_bound) as a pair.
In Python terms:
binary_search([3, 5, 7], 6) would yield False
binary_search([3, 5, 7], 7) would yield True
lower_bound([1, 3, 5, 7, 9, 11], 9) would yield 4
lower_bound([1, 3, 5, 7, 9, 11], 8) would also yield 4
upper_bound([1, 3, 5, 7, 9, 11], 9) would yield 5
>>> import bisect
>>> x = [1, 3, 5, 7, 9, 11]
>>> bisect.bisect_left(x, 9)
4
>>> bisect.bisect_left(x, 8)
4
>>> bisect.bisect_right(x, 9)
5
We conclude that C++ did something right <wink>.
equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) would yield (2, 5).
If you like, equal_range(seq, x) returns (l, h) such that all the elements of seq[l:h] are equal to x. If l == h, the subsequence is the empty sequence between the two adjacent elements with values that bracket x.
These definitions turn out to be useful in practice, and are also easy to implement efficiently using only < comparisons.
I think Python got the most valuable of these, and they're useful in Python too. Nevertheless, if you're coding an explicit conventional binary search tree (nodes containing a value, a reference to "a left" node, and a reference to "a right" node), cmp() is more convenient; and even more so if you're coding a ternary search tree. Sometimes cmp allows for more compact code.

Python's previous samplesort implementation endured a *little* clumsiness to infer equality (a == b) from not (a<b or a>b). The current adaptive mergesort feels the restriction to < more acutely and in more places. For example, when merging two runs A and B, part of the adaptive strategy is to precompute, via a form of binary search, where A[0] belongs in B, and where B[-1] belongs in A. This sounds like two instances of the same task, but they're maddeningly different because -- in order to preserve stability -- the first search needs to be of the bisect_left flavor and the second of bisect_right.

Combining both modes of operation in a single search routine with a flag argument, and sticking purely to __lt__, leads to horridly obscure code, so these searches are actually implemented by distinct functions. If it were able to use cmp() instead, folding them into one routine would have been unobjectionable (if < is needed, check for cmp < 0; if <= is needed, check for cmp <= 0, same as cmp < 1; so 0 or 1 could be passed in to select between < and <= very efficiently and reasonably clearly).
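Of the C++ quartet discussed above, only equal_range lacks a direct stdlib counterpart, and it is a two-liner over bisect (a sketch; the function name comes from the C++ discussion, not from Python's library):

```python
import bisect

def equal_range(seq, x):
    """Return (l, h) such that seq[l:h] holds exactly the items equal to x."""
    return bisect.bisect_left(seq, x), bisect.bisect_right(seq, x)

assert equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3) == (2, 5)
assert equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 4) == (5, 5)  # empty slice
```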
Guido:
But don't those APIs usually specify cmp() because their designers (mistakenly) believed the three different outcomes were easy to compute together and it would simplify the API?
I reckon it all goes back to Fortran with its IF (X) 10,20,30 statement. Maybe the first Fortran machine had a 3-way jump instruction?

Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
Guido> This seems an argument for keeping both __cmp__ and the six __lt__
Guido> etc. Yet TOOWTDI makes me want to get rid of __cmp__.

I'm beginning to wonder if part of what's going on is that there are really two different concepts that go under the general label of "comparison", namely the cases where trichotomy does and does not apply. In the first case, we have a total ordering; in the second, we have what C++ calls a "strict weak ordering", which is really an ordering of equivalence classes.
[Andrew Koenig]
I'm beginning to wonder if part of what's going on is that there are really two different concepts that go under the general label of "comparison", namely the cases where trichotomy does and does not apply.
In the first case, we have a total ordering; in the second, we have what C++ calls a "strict weak ordering", which is really an ordering of equivalence classes.
I'm afraid Python people really want a total ordering, because that's what Python gave them at the start (ya, I understand the long vs float business, but nobody in real life ever griped about that). It's a curious thing that the *specific* total ordering Python supplied changed across releases, and nobody complained about that(*). Also curious that, within a release, nobody complained that the specific total ordering can change across program runs (comparisons falling back to comparing object addresses are consistent within a run, but not necessarily across runs). That doesn't deny there are multiple comparison concepts people want, it just speaks against a strict weak ordering being one of them. For example, when using binary search on a sorted list to determine membership, people want total ordering. OTOH, when faced with 42 < "42" in isolation, sane Python people want an exception. When faced with "x in sequence_or_mapping", most people want __eq__ but some people want object identity (e.g., it's not always helpful that 3 == 3.0). One size doesn't fit anyone all the time. (*) I have to take that back: people *did* complain when the relative position of None changed. It's an undocumented fact that None compares "less than" non-None objects now (of types that don't force a different outcome), but that wasn't always so, and I clearly recall a few complaints after that changed, from people who apparently deliberately relied on its equally undocumented comparison behavior before.
On Sun, Mar 16, 2003 at 07:32:04AM -0500, Guido van Rossum wrote:
Guido> Yes. And I'm still hoping to remove __cmp__; there should be
Guido> only one way to overload comparisons.
[ Andrew Koenig ]
Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away.
Yes. OTOH, as long as cmp() is in the language, these same situations are more efficiently done by a __cmp__ implementation than by calling __lt__ and then __eq__ or similar (it's hard to decide which order is best). So cmp() should be removed at the same time as __cmp__.
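Andrew's point about knowing which operator is being evaluated can be made concrete with a small sketch (a hypothetical wrapper class, not code from the thread): an __eq__ that knows it is answering == can reject on a length mismatch in O(1), while an ordering comparison has no such shortcut, and a single __cmp__ could never take it.

```python
class Seq:
    """Hypothetical sequence wrapper: separate rich-comparison methods
    let equality bail out early, which a single __cmp__ cannot do."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if len(self.items) != len(other.items):  # O(1) early exit
            return False
        return self.items == other.items         # element-wise only if needed

    def __lt__(self, other):
        # Lexicographic ordering: must walk elements; length alone
        # decides nothing unless one sequence is a prefix of the other.
        for a, b in zip(self.items, other.items):
            if a != b:
                return a < b
        return len(self.items) < len(other.items)
```

A __cmp__-style implementation forced to produce -1/0/+1 would have to do the element walk even when the caller only asked for ==, which is the wasted work Andrew describes.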
I'm confused, as well as conflicted. I know I'm not well educated in language design or mathematics, and I realize that comparisons between types don't always make *mathematical* sense, but to go from there to removing type-agnostic (not the right term, but bear with me) list-sorting and three-way comparison altogether is really a big jump, and one I really don't agree with. I find being able to sort (true) heterogeneous lists in a consistent if not 'purely' sensible manner to be quite useful at times, and all other times I already know I have a homogeneous list and don't care about it. It's a practical approach because I don't have to think about how it's going to be sorted, I don't have to take every edge case into account, and I don't have to know in advance what my list is going to contain (or update all calls to 'sort' when I find I have to add a conflicting type to the list.) I do not see how this is harmful; the cases I've seen where people bump into this the hard way (e.g. doing "0" < 1) were fundamental misunderstandings that would be corrected in a dozen other ways. Allowing 'senseless' comparisons does not strike me as a major source of confusion or bad code. I was uneasy about the change in complex number comparison, but I didn't mind that, then, because it is a very rarely used feature to start with and when looking at it from a 'unified number' point of view, it makes perfect sense. But the latter does not apply to most other types, and I don't believe it should. My defensive programming nature (I write Perl for a living, if I wasn't defensive by nature I'd have committed suicide by now) would probably make me always use a 'useful sorter' function, possibly by using subclasses of list (so I could guard against other subtle changes, too, by changing one utility library, tw.Tools. Yuck.) I really don't like how that affects the readability of the code. I'd feel better about disallowing '==' for floating point numbers, as I can see why that is a problem.
But I don't feel good about that either ;) I really like how most Python objects are convenient. Lists grow when you need them to, slices do what you think they do, dicts can take any (hashable) object as a key (boy, do I miss that in Perl), mathematical operations work with simple operators even between types, objects, instances, classes and modules all behave consistently and have consistent syntax. Yes, Python has some odd quirks, some of which require a comment or two when writing code that will be read by people with little or no Python knowledge (e.g. my colleagues.) But I believe adding a small comment like "'global' is necessary to *assign* to variables in the module namespace" or "'%(var)s' is how we say '$var'" or "'x and y or s' is like 'x ? y : s' if y is true, which it always is here" or any of the half-dozen other things I can imagine, not counting oddities in standard modules, is preferable over forcing the syntax or restricting the usage to try and 'solve' the quirks.
And then we should also change list.sort(), as Tim points out. Maybe we can start introducing this earlier by using keyword arguments:
list.sort(lt=function)   sorts using a < implementation
list.sort(cmp=function)  sorts using a __cmp__ implementation
Perhaps we need stricter prototypes, that define the return value. Or properties on (or classes of) functions, so we can tell whether a function implements the less-than interface, or the three-way one. It would definitely *look* better than the above ;) Practically-beats-this-idea-in-my-eyes'ly y'rs ;)

-- Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Moreover, for some data structures, the __cmp__ approach can be expensive. For example, if you're comparing sequences of any kind, and you know that the comparison is for == or !=, you have your answer immediately if the sequences differ in length. If you don't know what's being tested, as you wouldn't inside __cmp__, you may spend a lot more time to obtain a result that will be thrown away.
Guido> Yes. OTOH, as long as cmp() is in the language, these same situations
Guido> are more efficiently done by a __cmp__ implementation than by calling
Guido> __lt__ and then __eq__ or similar (it's hard to decide which order is
Guido> best). So cmp() should be removed at the same time as __cmp__.

Yes.

Guido> And then we should also change list.sort(), as Tim points out. Maybe
Guido> we can start introducing this earlier by using keyword arguments:

Guido> list.sort(lt=function) sorts using a < implementation
Guido> list.sort(cmp=function) sorts using a __cmp__ implementation

The keyword argument might not be necessary: It is always possible for a function such as sort to figure out whether a comparison function is 2-way or 3-way (assuming it matters) by doing only one extra comparison.
Guido> And then we should also change list.sort(), as Tim points
Guido> out. Maybe we can start introducing this earlier by using
Guido> keyword arguments:
Guido> list.sort(lt=function) sorts using a < implementation
Guido> list.sort(cmp=function) sorts using a __cmp__ implementation
[Andrew Koenig]
The keyword argument might not be necessary: It is always possible for a function such as sort to figure out whether a comparison function is 2-way or 3-way (assuming it matters) by doing only one extra comparison.
That's cute, but a bit too magical for my taste... It's not immediately obvious how this would be done (I know how, but it would require a lot of explaining). Plus, -1 is a perfectly valid truth value. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> That's cute, but a bit too magical for my taste... It's not
Guido> immediately obvious how this would be done (I know how, but it
Guido> would require a lot of explaining). Plus, -1 is a perfectly
Guido> valid truth value.

Yes, I know that -1 is a valid truth value. Here's the trick. The object of the game is to figure out whether f is < or __cmp__. Suppose you call f(x, y) and it returns 0. Then you don't care which one f is, because x<y is false either way. So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.
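Andrew's sniffing trick can be sketched as a wrapper around an unknown comparison function f. This is a hypothetical illustration using a simple insertion sort, not list.sort() itself; the single extra call f(y, x) happens at most once, the first time f returns nonzero.

```python
def sniff_sort(xs, f):
    """Sort xs using f, lazily detecting whether f is a 2-way
    (less-than) or 3-way (cmp-style) comparison function.

    While f(x, y) keeps returning zero/false, the answer to "x < y?"
    is False either way.  On the first nonzero result, one call with
    swapped arguments settles it: a sane less-than returns false for
    the swap, while a sane 3-way function returns the opposite sign.
    """
    kind = [None]  # None = unknown, then 'lt' or 'cmp' forever after

    def less(x, y):
        r = f(x, y)
        if not r:
            return False                          # x < y is false either way
        if kind[0] is None:
            kind[0] = 'cmp' if f(y, x) else 'lt'  # the one extra comparison
        return r < 0 if kind[0] == 'cmp' else bool(r)

    # A stable insertion sort expressed purely in terms of less().
    out = []
    for x in xs:
        i = len(out)
        while i and less(x, out[i - 1]):
            i -= 1
        out.insert(i, x)
    return out
```

Both calling conventions then work through the same entry point, which is exactly what makes Guido call the scheme "a bit too magical": the caller can no longer tell from the API which kind of function is expected.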
Yes, I know that -1 is a valid truth value.
So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.
I think the worry is that the function might be saying "true" to both of these, but just happen to spell it 1 the first time and -1 the second. Probably fairly unlikely, though...
[Andrew Koenig]
Yes, I know that -1 is a valid truth value.
So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.
[Greg Ewing]
I think the worry is that the function might be saying "true" to both of these, but just happen to spell it 1 the first time and -1 the second.
Then it's answering true to both x < y ? and y < x ? The comparison function is insane, then, so it doesn't matter what list.sort() does in that case (the algorithm is robust against insane comparison functions now, but doesn't define what will happen then beyond that the output list will contain a permutation of its input state). I've ignored this scheme for two reasons: anti-Pythonicity (having Python guess which kind of comparison function you wrote is anti-Pythonic on the face of it), and inefficiency. list.sort() is so bloody highly tuned now that adding even one test-&-branch per comparison, in C, on native C ints, gives a measurable slowdown, even when the user passes an expensive comparison function. In the case that no comparison function is passed, we're able to skip a layer of function call now by calling PyObject_RichCompareBool(X, Y, Py_LT) directly (no cmp-to-LT conversion is needed then). Against that, it could be natural to play Andrew's trick only in count_run() (the part of the code that identifies natural runs). That would be confined to 2 textual comparison sites, and does no more than len(list)-1 comparisons total now.
Then it's answering true to both
x < y ? and y < x ?
The comparison function is insane, then
No, I'm the one that's insane, I think. You're right, this is impossible.
Then it's answering true to both
x < y ? and y < x ?
The comparison function is insane, then
[Greg Ewing]
No, I'm the one that's insane, I think. You're right, this is impossible.
For a sane comparison function, yes. Python can't enforce that user-supplied functions are sane, though, and -- as always -- it's Python's job to ensure that nothing catastrophic happens when users go bad. One of the reasons Python had to grow its own sort implementation is that various platform qsort() implementations weren't robust against ill-behaved cmp functions. For example, a typical quicksort partitioning phase searches right for the next element >= key, and left for the next <= key. Some are tempted to save inner-loop index comparisons by ensuring that the leftmost slice element is <= key, and the rightmost >= key, before partitioning begins. Then the left and right inner searches are "guaranteed" not to go too far, and by element comparisons alone. But if the comparison function is inconsistent, that can lead to the inner loops reading outside the slice bounds, and so cause segfaults. Python's post-platform-qsort sorts all protect against that kind of crud, but can't give a useful specification of the result in such cases (beyond that the list is *some* permutation of its input state -- no element is lost or duplicated -- and guaranteeing just that much in the worst cases causes some pain in the implementation).
Guido> That's cute, but a bit too magical for my taste... It's not
Guido> immediately obvious how this would be done (I know how, but it
Guido> would require a lot of explaining). Plus, -1 is a perfectly
Guido> valid truth value.
Yes, I know that -1 is a valid truth value.
Here's the trick. The object of the game is to figure out whether f is < or __cmp__.
Suppose you call f(x, y) and it returns 0. Then you don't care which one f is, because x<y is false either way.
So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.
Right. There's no flaw in this logic, but I'd hate to have to explain it over and over... I don't want people to believe that Python can somehow magically sniff the difference between two functions; they might expect it in other contexts. --Guido van Rossum (home page: http://www.python.org/~guido/)
So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.
Guido> Right. There's no flaw in this logic, but I'd hate to have to
Guido> explain it over and over... I don't want people to believe
Guido> that Python can somehow magically sniff the difference between
Guido> two functions; they might expect it in other contexts.

I can understand your reluctance -- I was just pointing out that it's possible. However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f) technique because it doesn't generalize terribly well. If I want to write a function that takes a comparison function as an argument, and eventually passes that function to sort, what do I do? Something like this?

    def myfun(foo, bar, lt=None, cmp=None):
        # ...
        x.sort(lt=lt, cmp=cmp)
        # ...

and assume that sort will use None as its defaults also? Or must I write

    if lt == None:
        x.sort(cmp=cmp)
    else:
        x.sort(lt=lt)

Either way it's inconvenient. So I wonder if it might be better, as a way of allowing sort to take two different types of comparison functions, to distinguish between them by making them different types.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f) technique because it doesn't generalize terribly well.
If I want to write a function that takes a comparison function as an argument, and eventualy passes that function to sort, what do I do? Something like this?
    def myfun(foo, bar, lt=None, cmp=None):
        # ...
        x.sort(lt=lt, cmp=cmp)
        # ...
and assume that sort will use None as its defaults also? Or must I write
    if lt == None:
        x.sort(cmp=cmp)
    else:
        x.sort(lt=lt)
Either way it's inconvenient.
Given that (if we add this) the cmp argument will be deprecated, myfun() should take a 'lt' comparison only.
So I wonder if it might be better, as a way of allowing sort to take two different types of comparison functions, to distinguish between them by making them different types.
But Python doesn't do types that way. --Guido van Rossum (home page: http://www.python.org/~guido/)
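The forwarding awkwardness Andrew describes can be sketched concretely. This is a hypothetical wrapper in modern terms; since today's list.sort() takes neither lt= nor cmp=, functools.cmp_to_key is used here only to emulate both proposed paths:

```python
from functools import cmp_to_key

def myfun(xs, lt=None, cmp=None):
    """Hypothetical wrapper around sort accepting either comparison
    style: it must branch on which keyword was supplied before it
    can forward, which is exactly the inconvenience in question."""
    if lt is not None and cmp is not None:
        raise TypeError("pass at most one of lt and cmp")
    if cmp is not None:
        xs.sort(key=cmp_to_key(cmp))     # forward the 3-way function
    elif lt is not None:
        # Build a 3-way function from the 2-way one, then forward:
        # -1 if lt(a, b), +1 if lt(b, a), else 0.
        xs.sort(key=cmp_to_key(
            lambda a, b: (not lt(a, b)) - (not lt(b, a))))
    else:
        xs.sort()
    return xs
```

Whichever way the wrapper is written, it has to know about both calling conventions, so the lt=/cmp= split leaks into every function that forwards a comparison, which is Andrew's objection.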
[Guido]
Yes. And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
As long as we're going to break everyone's code, list.sort(f) should also be redefined then so that f returns a Boolean, f(a, b) meaning a is less than b. list.sort()'s implementation uses only less-than already, and it seems that all newcomers to Python who didn't come by way of C or Perl (same thing in this respect <wink>) expect sort's comparison function to work that way.
Tim> [Guido]
Yes. And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
Tim> As long as we're going to break everyone's code, list.sort(f)
Tim> should also be redefined then so that f returns a Boolean, f(a,
Tim> b) meaning a is less than b.

I don't think it's necessary to break code in order to accommodate that change, as long as you're willing to tolerate one extra comparison per call to sort, plus a small amount of additional overhead. As I understand it, the problem is to distinguish between a function that returns negative, zero, or positive, depending on the result of the comparison, and a function that returns true or false. So if we had a way to determine efficiently which kind of function the user supplied, we could maintain compatibility.

Imagine, then, that we have a function f, and we want to figure out which kind of function it is. Assume, furthermore, that the only kind of comparisons we want to perform is to determine whether a < b for various values of a and b. Note first that whenever f(a, b) returns 0, we don't care which kind of function f is, because a < b will be false in either case. So we allow our sort algorithm to run until the first time a call to f(a, b) returns a nonzero value. Now we can determine what kind of function f is by calling f(b, a). If f(b, a) is zero, then f is a boolean predicate. If f(b, a) is nonzero, then f returns negative/zero/positive -- and, incidentally, f(b, a) had better have the opposite sign from f(a, b).

I understand that there is some overhead involved in storing the information about which kind of comparison it is, and testing it on each comparison. I suspect, however, that that overhead can be made small compared to the overhead involved in calling the comparison function itself.
Guido:
And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
I'd rather you kept it and re-defined it to mean "compare for arbitrary ordering". (Maybe change its name if there are backwards-compatibility issues.)
Guido:
And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
[Greg]
I'd rather you kept it and re-defined it to mean "compare for arbitrary ordering". (Maybe change its name if there are backwards-compatibility issues.)
Hm, that's not what it does now, and an arbitrary ordering is better defined by a "less" style operator. I've been thinking of __before__ and a built-in before(x, y) -> bool. (Not __less__ / less, because IMO that's too close to __lt__ / <.) BTW, there are two possible uses for before(): it could be used to impose an arbitrary ordering for types that don't have one now (like complex); and it could be used to impose an ordering between different types (like numbers and strings). I've got a gut feeling that the requirements for these are somewhat different, but can't quite pinpoint it. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Monday 17 March 2003 02:50 am, Guido van Rossum wrote:
Guido:
And I'm still hoping to remove __cmp__; there should be only one way to overload comparisons.
[Greg]
I'd rather you kept it and re-defined it to mean "compare for arbitrary ordering". (Maybe change its name if there are backwards-compatibility issues.)
Hm, that's not what it does now, and an arbitrary ordering is better defined by a "less" style operator.
+1. I entirely agree that any ordering is easier to define with a 2-way comparison obeying the usual constraints of ordering, i.e., for any x, y, z:

    before(x, x) is false
    before(x, y) implies not before(y, x)
    before(x, y) and before(y, z) imply before(x, z)

and for this specific purpose of arbitrary ordering, it would, I think, be necessary for 'before' to define a total ordering, i.e. the implied equivalence being equality:

    not before(x, y) and not before(y, x) imply x == y

(This latter desideratum may be a source of problems, see below). It would also be very nice if before(x, y) were the same as x < y whenever the latter doesn't raise an exception, if feasible.
I've been thinking of __before__ and a built-in before(x, y) -> bool. (Not __less__ / less, because IMO that's to close to __lt__ / <.)
I love the name 'before' and I entirely concur with your decision to avoid the name 'less'.
BTW, there are two possible uses for before(): it could be used to impose an arbitrary ordering for types that don't have one now (like complex); and it could be used to impose an ordering between different types (like numbers and strings). I've got a gut feeling that the requirements for these are somewhat different, but can't quite pinpoint it.
Perhaps subclassing/subtyping [and other possible cases where x==y may be true yet type(x) is not type(y)] may be the sticky issues, when all desirable constraints are considered together. The simplest problematic case I can think of is before(1,1+0j) -- by the "respect ==" criterion I would want both this comparison, and the same one with operands swapped, to be false; but by the criterion of imposing ordering between different incomparable types, I would like 'before' to range all instances of 'complex' "together", either all before, or all after, "normal" (comparable) numbers (ideally in a way that while arbitrary is repeatable over different runs/platforms/Python releases -- mostly for ease of teaching and explaining, rather than as a specific application need). Alex
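A minimal sketch of one possible before() illustrates Alex's tension. The cross-type tie-break by type name and the repr() fallback are illustrative assumptions, not part of the actual proposal:

```python
def before(x, y):
    """Hypothetical sketch of the proposed before(x, y) -> bool.

    Within a single type, defer to the type's own < where it works,
    otherwise fall back to an arbitrary but repeatable order.  Across
    types, group all instances of a type together by ordering on the
    type name.  Both tie-breaking rules are made-up assumptions here.
    """
    if type(x) is type(y):
        try:
            return x < y              # agree with < when it's defined
        except TypeError:             # e.g. complex in modern Python
            return repr(x) < repr(y)  # arbitrary but deterministic
    return type(x).__name__ < type(y).__name__
```

Because this groups complex apart from other numbers, exactly one of before(1, 1+0j) and before(1+0j, 1) is true, so it cannot also satisfy the "neither before the other implies ==" desideratum for 1 == 1+0j: the grouping criterion and the respect-== criterion pull in opposite directions, as noted above.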
Alex Martelli wrote:
On Friday 14 March 2003 04:42 pm, Christian Tismer wrote: ...
And the key assumption for sorting things is that the things are sortable, which means there exists an order on the basic set. Which again suggests that list elements usually have something in common.
If a list contains ONE complex number and no other number, then the list can be sorted.
By a similar argument, tuples of one element can be sorted and reversed, just by doing nothing :-)
If the list contains elements that have something in common, by both being complex numbers, then it cannot be sorted.
Sure it can, by supplying a compare function, which implements the particular sorting operation that you want. Perhaps you want to sort them by their abs value or something. (And then you probably will want a stable sort, which is meanwhile a nice feature, thanks to Tim:
>>> a = [1, 2, 2+2j, 3+1j, 1+3j, 3-3j, 3+1j, 1+3j]
>>> a.sort(lambda x, y: cmp(abs(x), abs(y)))
>>> a
[1, 2, (2+2j), (3+1j), (1+3j), (3+1j), (1+3j), (3-3j)]
) Complex just has no total order, which makes it impossible to provide a meaningful default ordering.
So, lists whose elements have LESS in common (by being of widely different types) are more likely to be sortable than lists some of whose elements have in common the fact of being numbers (if one or more of those numbers are complex).
I agree that my statement does not apply when putting non-sortable things into a list. But I don't believe that people are putting widely different types into a list in order to sort them. (Although there is an arbitrary order between strings and numbers, which I would drop in Python 2.4, too). chris -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On Saturday 15 March 2003 04:59 pm, Christian Tismer wrote: ...
Complex just has no total order, which makes it impossible to provide a meaningful default ordering.
Back in Python 1.5.2 times, the "impossible to provide" ordering was there. No more (and no less!) "meaningful" than, say, comparisons between (e.g.) lists, numbers, strings and dicts, which _are_ still provided as of Python 2.3.
I agree that my statement does not apply when putting non-sortable things into a list. But I don't believe
A list containing ONE complex number and (e.g.) three strings is sortable. So, there are NO "non-sortable things". A list is non-sortable (unless by providing a custom compare, as you pointed out) if it contains a complex number and any other number -- so, there _are_ "non-sortable LISTS" (unless suitable custom compares are used), but still no "non-sortable THINGS" in current Python.
that people are putting widely different types into a list in order to sort them. (Although there is an arbitrary order between strings and numbers, which I would drop in Python 2.4, too).
Such a change would indeed enormously increase the number of non-sortable (except by providing custom compares) lists. So, for example, all places which get and sort the list of keys in a dictionary in order to return or display the keys should presumably do the sorting within a try/except? Or do you think a dictionary should also be constrained to have keys that are all comparable with each other (i.e., presumably, never both string AND number keys) as well as hashable? I fail to see how forbidding me to sort the list of keys of any arbitrary dict will enhance my productivity in any way -- it's bad enough (in theory -- in practice it doesn't bite much as complex numbers are used so rarely) with the complex numbers thingy, why make it even worse by inventing a novel "strings vs numbers" split? Since when is Python about forbidding the user to do quite normal things such as sorting the list of keys of any arbitrary dictionary for more elegant display -- for no practically useful purpose that I've ever seen offered, in brisk violation of "practicality beats purity"? Alex
Alex Martelli wrote: ...
A list containing ONE complex number and (e.g.) three strings is sortable. So, there are NO "non-sortable things".
Quite an academic POV for a practical man like you.
A list is non-sortable (unless by providing a custom compare, as you pointed out) if it contains a complex number and any other number -- so, there _are_ "non-sortable LISTS" (unless suitable custom compares are used), but still no "non-sortable THINGS" in current Python.
Don't understand: How is a tuple not a non-sortable thing, unless I turn it into a list, which is not a tuple? Or do you mean the complex, which refuses to be sorted, unlike other objects, which don't provide any ordering and are ordered by ID? [number/string comparison]
Such a change would indeed enormously increase the number of non-sortable (except by providing custom compares) lists.
Theoretical lists, or those existing in real applications? For the latter, mixing ints and strings has most often been a programming error, in my experience.
So, for example, all places which get and sort the list of keys in a dictionary in order to return or display the keys should presumably do the sorting within a try/except? Or do you think a dictionary should also be constrained to have keys that are all comparable with each other (i.e., presumably, never both string AND number keys) as well as hashable?
I would like to have sub-classes of dictionaries which protect me from putting keys into them which I didn't intend to. But that doesn't mean that I want to forbid it once and forever. Concerning general dicts, you are right: sorting the keys makes sense to get them into some arbitrary order.
I fail to see how forbidding me to sort the list of keys of any arbitrary dict will enhance my productivity in any way --
I thought it would catch the cases where you failed to build a key of the intended type. Maybe this is worse than what we have now, though. I have to say that this wasn't the point of my message, so I don't care to discuss it. ...
Since when is Python about forbidding the user to do quite normal things such as sorting the list of keys of any arbitrary dictionary for more elegant display -- for no practically useful purpose that I've ever seen offered, in brisk violation of "practicality beats purity"?
Well, I just don't like such an arbitrary thing, that a string is always bigger than an int. Since we don't allow them to be used as each other by coercion, we also should not compare them. Bean counts are bean counts, and names are names. One could go the AWK way, where ints and strings were converted whenever necessary, but that would be even worse. Maybe the way Python handles it is not so bad. But then it should be consistent and at least move complex objects into their own group in the sorted array, maybe just not sorting themselves. Anyway, this would also not increase your/my productivity in any way, so let's get back to real problems.

ciao - chris
[Christian Tismer]
that people are putting widely different types into a list in order to sort them. (Although there is an arbitrary order between strings and numbers, which I would drop in Python 2.4, too).
[Alex Martelli]
Such a change would indeed enormously increase the number of non-sortable (except by providing custom compares) lists. So, for example, all places which get and sort the list of keys in a dictionary in order to return or display the keys should presumably do the sorting within a try/except?
I don't believe this argument. I've indeed often sorted a dict's keys (or values), but always in situations where the sorted values were homogeneous as far as meaningful comparison goes, e.g. all numbers, or all strings, or all "compatible" tuples.
Or do you think a dictionary should also be constrained to have keys that are all comparable with each other (i.e., presumably, never both string AND number keys) as well as hashable?
If you know *nothing* about the keys of a dict, you already have to do that if you want to sort the keys. There are lots of apps that have no need to ever sort the keys: if there weren't, it would have been wiser to keep the keys in sorted order in the first place, like ABC did.
I fail to see how forbidding me to sort the list of keys of any arbitrary dict will enhance my productivity in any way -- it's bad enough (in theory -- in practice it doesn't bite much as complex numbers are used so rarely) with the complex numbers thingy, why make it even worse by inventing a novel "strings vs numbers" split?
To the contrary, I don't see how it will reduce your productivity. You seem to be focusing on the wrong thing (sorting dict keys). The right thing to consider here is that "a < b" should only work if a and b can be meaningfully ordered, just like "a + b" only works if a and b can be meaningfully added.
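[For what it's worth, this is exactly the behavior later Python 3 versions adopted. A small illustrative sketch (variable names are mine, not from the thread): ordering across incompatible types raises, while equality stays total, and sorting arbitrary dict keys needs an explicit key.]

```python
# Cross-type ordering raises TypeError in Python 3, matching the
# "a < b should only work if meaningfully ordered" position.
try:
    2 < "1"
    comparable = True
except TypeError:
    comparable = False
print(comparable)               # False

# Equality remains total across types: it simply reports "unequal".
print(2 == "1", 1 == 1j)        # False False

# Dict keys need only be hashable, not orderable; to sort a dict with
# wildly mixed keys you must now supply an explicit key function.
mixed = {"a": 1, 2: "b"}
keys = sorted(mixed, key=repr)
print(keys)
```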
Since when is Python about forbidding the user to do quite normal things such as sorting the list of keys of any arbitrary dictionary for more elegant display -- for no practically useful purpose that I've ever seen offered, in brisk violation of "practicality beats purity"?
I doubt that elegant display of a dictionary with wildly incompatible keys is high on anybody's list of use cases. On the other hand, I'm sure that raising an exception on abominations like 2 < "1" or (1, 2) < 0 is a good thing, just like we all agree that forbidding 1 + "2" is a good thing.

Of course, == and != will continue to accept objects of incongruous types -- these will simply be considered unequal. That's the cornerstone of dictionaries, and I see no reason to change this -- while I don't know whether 1 ought to be considered less than or greater than 1j, I damn well know they aren't equal!

(And I'm specifically excluding gray areas like comparing tuples and lists. Given that (a, b) = [1, 2] now works, as does [].extend(()), it might be better to allow comparing tuples to lists, and even consider them equal if they have the same length and their items compare equal pairwise. This despite my position about the different idiomatic uses of the two types.)

And so the circle closes [see Subject]. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
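[The gray area can be checked directly in today's Python. A small sketch, reflecting current CPython behavior rather than anything decided in this thread: the two conveniences Guido mentions both work, but cross-type equality between tuple and list was never adopted.]

```python
# Unpacking a list into tuple-style targets works:
(a, b) = [1, 2]
print((a, b))                   # (1, 2)

# Extending a list with a tuple works:
items = []
items.extend((3, 4))
print(items)                    # [3, 4]

# ...but a tuple and a list with the same items still compare unequal:
print((1, 2) == [1, 2])         # False
```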
Christian Tismer <tismer@tismer.com> writes:
>>> a = [1, 2, 2+2j, 3+1j, 1+3j, 3-3j, 3+1j, 1+3j]
>>> a.sort(lambda x, y: cmp(abs(x), abs(y)))
>>> a
[1, 2, (2+2j), (3+1j), (1+3j), (3+1j), (1+3j), (3-3j)]
Ooh, now I get to mention the list.sort feature request I came up with this weekend <wink>: I'd like to be able to write the above call as:
a.sort(key=abs)
Cheers, M. -- 112. Computer Science is embarrassed by the computer. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
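[This wish was in fact granted: list.sort() (and the sorted() builtin) grew a key= argument in Python 2.4. A minimal sketch, reusing the complex-number example from above, of how the cmp-based call is spelled with key=:]

```python
a = [3-3j, 1, 3+1j, 2, 2+2j, 1+3j]
a.sort(key=abs)    # order by magnitude; the complex numbers themselves
                   # never need to be compared against each other
print(a)           # [1, 2, (2+2j), (3+1j), (1+3j), (3-3j)]
```

Besides reading better, the key= form computes abs() once per element instead of once per comparison, and the sort is stable, so elements with equal magnitude (here 3+1j and 1+3j) keep their original relative order.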
Alex Martelli <aleax@aleax.it>:
So, lists whose elements have LESS in common (by being of widely different types) are more likely to be sortable than lists some of whose elements have in common the fact of being numbers (if one or more of those numbers are complex).
As I think I've mentioned before, Python really needs two different kinds of comparison: one which does whatever makes sense for objects of compatible types (and which need not be supported by all types), and another which imposes an arbitrary order on all objects. When sorting a list, you would have to specify which kind of ordering you wanted.

Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
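[The second kind of comparison -- an arbitrary-but-total order -- can be approximated with a key function. A hedged sketch of one common idiom (nothing proposed in the thread): group values by type name, then order within each group, falling back to repr() for types like complex that don't order among themselves.]

```python
def arbitrary_key(obj):
    """Impose an arbitrary but total order: group by type name, then
    compare within the group.  The probe below detects types (such as
    complex) whose instances refuse ordering and falls back to repr()."""
    name = type(obj).__name__
    try:
        obj < obj                       # probe: does this type order at all?
        return (name, 0, obj)
    except TypeError:
        return (name, 1, repr(obj))

mixed = [3, "b", 1.5, "a", 2, 1+2j]
print(sorted(mixed, key=arbitrary_key))
# [(1+2j), 1.5, 2, 3, 'a', 'b']
```

Grouping by type name makes the cross-type order arbitrary (complexes before floats before ints before strings, alphabetically by type) but deterministic, which is all the "sort anything" use case needs.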
participants (15)
-
Aahz
-
Alex Martelli
-
Andrew Koenig
-
Christian Tismer
-
Greg Ewing
-
Guido van Rossum
-
Just van Rossum
-
Kevin J. Butler
-
Michael Hudson
-
Raymond Hettinger
-
Samuele Pedroni
-
Terry Reedy
-
Thomas Wouters
-
Tim Peters
-
Tim Peters