Incomparable abominations (was: python-dev Summary)
tim_one at email.msn.com
Sun Mar 23 00:16:55 CET 2003
> I claim I should get an error when I compare incomparable types,
> because explicit beats implicit, and the current behavior has no
> practical value.
I think it's approximately impossible to explain why
1 + "2"
raises an exception now, but
1 < "2"
doesn't. In the earliest Pythons, for obscure technical reasons it wasn't
*possible* for a comparison operation to raise an exception, and I expect
the current inconsistency was driven at least as much by that technical
glitch as by deliberate choice.
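The pair is easy to check directly. A small sketch (in modern Python 3 syntax; under the Python 2 line discussed here, only the addition raised, while Python 3 later made both operations raise TypeError):

```python
def raises_type_error(fn):
    """Return True if calling fn raises TypeError, else False."""
    try:
        fn()
    except TypeError:
        return True
    return False

# Under Python 2, the first is True and the second False -- the
# inconsistency discussed above.  Python 3 later made both raise.
print(raises_type_error(lambda: 1 + "2"))
print(raises_type_error(lambda: 1 < "2"))
```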
> Sure it does, it has the value that (if L is a list) L.sort() always
> puts it in a canonical ordering (same list produces the same ordering,
> within runs from the same version of Python)
It's not easy to spell out what Python guarantees here today, in part
because phrases like "same list" could mean all sorts of things when
platforms, releases, or program runs separate the two lists claimed to be
the same. Under any reasonable meaning I can dream up,
same list produces the same ordering, within runs from the same
version of Python
is too strong a claim, because Python's comparison sometimes resorts to
comparing objects' memory addresses. The most notorious case of that is
instances of a common class, where the class doesn't override the default
comparison:

class C:
    pass

a = C()
b = C()
print cmp(a, b)
The result printed depends entirely on where the memory allocated for a
happens to land relative to the memory allocated for b. This can and does
vary across:

+ Platforms, keeping the release of Python fixed.
+ Platform C library linked with, keeping the platform and Python
  release fixed.
+ Program runs, keeping the platform and Python binary fixed.
The last point is platform- and build-dependent, for the code as given,
depending on how the platform C and OS interact to pass out address space.
On any platform, the result can differ across runs if the Python code is
more complicated (for example, if some other part of the program keys off
time of day, or a random.random() result, or thread timing -- anything that
can change the exact sequence of malloc() calls made).
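The effect can be sketched explicitly (modern Python 3 syntax, where the address fallback no longer exists, so it's simulated via id() -- which in CPython happens to be the object's address):

```python
class C:
    pass

a = C()
b = C()

# Python 2's default comparison fell back to comparing memory addresses;
# in CPython, id() exposes that same quantity, so this mimics the sign
# of cmp(a, b).
simulated_cmp = (id(a) > id(b)) - (id(a) < id(b))

# The value is -1 or 1 depending purely on where the allocator happened
# to place a and b -- nothing about the objects themselves decides it.
print(simulated_cmp)
```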
This kind of thing is a common error when programming with persistent
databases. For example, ZODB uses persistent BTrees extensively, and BTrees
need a consistent total ordering of key values *across* program runs (and
across Python releases, for that matter). If key comparison ever falls back
to comparing object memory addresses, then a BTree created and used within a
single program run works fine, but when the next program run retrieves the
BTree from the database, the relative ordering of object addresses can be
any permutation of the original ordering. BTree invariants are violated
left and right then, but the database code has no clear and efficient way to
know that they are. The result is crazy -- and seemingly random -- behavior,
instead of exceptions.
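The usual defense is to make sure persistent key types define comparison by value, so the ordering never depends on addresses. A minimal sketch, with a hypothetical AccountKey class in modern Python 3 syntax:

```python
class AccountKey:
    # Hypothetical persistent key: compares by stored value, never by
    # identity, so its ordering is stable across program runs (and
    # Python releases).
    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        return self.number < other.number

    def __eq__(self, other):
        return self.number == other.number

    def __hash__(self):
        return hash(self.number)

keys = sorted([AccountKey(3), AccountKey(1), AccountKey(2)])
print([k.number for k in keys])
```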
Applications get in real trouble from that, and also from letting 1 < "2"
pass without complaint.
> so e.g. you can compare sets of items for equality without worrying
> about them being in different orders.
For that purpose, it's usually more efficient to populate two dicts with the
list elements (mapping to counts if they're multisets), and then use dict1
== dict2. That's expected-case linear-time, in the number of elements.
Sorting has worse expected-case O() behavior.
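A sketch of that dict-based test (plain dicts mapping element to count, so the elements must be hashable; modern Python also offers collections.Counter for the same job):

```python
def same_multiset(xs, ys):
    # Count occurrences in each sequence: expected-case linear time,
    # versus the O(n log n) expected cost of sorting both first.
    def counts(seq):
        d = {}
        for item in seq:
            d[item] = d.get(item, 0) + 1
        return d
    return counts(xs) == counts(ys)

print(same_multiset([3, 1, 2, 1], [1, 1, 2, 3]))  # order doesn't matter
print(same_multiset([3, 1, 2], [3, 1, 1]))        # counts differ
```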
I've used mixed_type_list.sort() on occasion just to bring objects of the
same type next to each other, but that could be done more efficiently using
a dict too (mapping type object to list of objects of that type).
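That dict-based grouping is a one-pass sketch like so (modern Python 3 syntax), with no cross-type comparisons at all:

```python
def group_by_type(mixed):
    # Map each type object to the list of objects of that type,
    # preserving the original relative order within each group.
    groups = {}
    for obj in mixed:
        groups.setdefault(type(obj), []).append(obj)
    return groups

grouped = group_by_type([1, "a", 2.5, "b", 3])
print(grouped[int], grouped[str], grouped[float])
```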
I really don't know of a good use for mixed_type_list.sort(). If I had a
good use and Python griped, then DSU (decorate-sort-undecorate) of the form

helper = [(type(obj), obj) for obj in mixed_type_list]
helper.sort()
sorted_list = [obj for t, obj in helper]

would transform it easily enough into a sort that didn't compare objects of
different types. I have to admit I'm not even tempted to sort complex
numbers.
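In modern Python 3 the same trick needs one adjustment: type objects themselves aren't orderable, so the decoration can use the type's name instead (assuming distinct types in the list don't share a name):

```python
mixed_type_list = [3, "b", 1.5, "a", 2]

# Decorate each object with its type name.  When two key tuples have
# the same name, tuple comparison falls through to the objects, which
# are then guaranteed to be of the same type.
helper = sorted(mixed_type_list, key=lambda obj: (type(obj).__name__, obj))
print(helper)
```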
looks-like-an-old-bug-to-me-ly y'rs - tim