on sorting things
Tony Flury
tony.flury at btinternet.com
Tue Jan 28 22:30:28 EST 2020
On 20/12/2019 18:59, Peter Otten wrote:
> Chris Angelico wrote:
>
>> On Sat, Dec 21, 2019 at 5:03 AM Peter Otten <__peter__ at web.de> wrote:
>>> PS: If you are sorting files by size and checksum as part of a
>>> deduplication effort consider using dict-s instead:
>> Yeah, I'd agree if that's the purpose. But let's say the point is to
>> have a guaranteed-stable ordering of files that are primarily to be
>> sorted by file size - in order to ensure that two files are in the
>> same order every time you refresh the view, they get sorted by their
>> checksums.
> One thing that struck me about Eli's example is that it features two key
> functions rather than a complex comparison.
>
> If sort() would accept a sequence of key functions each function could be
> used to sort slices that compare equal when using the previous key.
You don't need a sequence of key functions: the sort algorithm used in
Python (Timsort) is stable, which means that if two items (A & B) are in
a given order in the sequence before the sort starts, and A & B compare
equal during the sort, then after the sort A & B retain their ordering.
So if you want to sort by file size as the primary key and then by
checksum where file sizes are equal, you sort by checksum first, and
then by file size: this guarantees that the items will always be in file
size order, and where file sizes are equal they will be ordered by
checksum.
The rule to remember: sort in the reverse order of criteria, least
significant key first.
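
As a rough sketch of the two-pass approach (the file records below are
made up purely for illustration, and the variable names are mine, not
from Eli's example), something like this should behave as described.
The last line shows that a single sort with a tuple key gives the same
ordering, assuming both criteria sort in the same direction:

```python
from operator import itemgetter

# Hypothetical (name, size, checksum) records, invented for this example.
files = [
    ("a.txt", 200, "ff"),
    ("b.txt", 100, "cc"),
    ("c.txt", 200, "aa"),
    ("d.txt", 100, "bb"),
]

# Pass 1: sort by the secondary criterion (checksum).
by_checksum = sorted(files, key=itemgetter(2))

# Pass 2: sort by the primary criterion (size). Because the sort is
# stable, records with equal sizes keep their checksum order from pass 1.
by_size_then_checksum = sorted(by_checksum, key=itemgetter(1))

# Alternative: one sort with a (size, checksum) tuple key.
single_pass = sorted(files, key=itemgetter(1, 2))
```

The multi-pass form earns its keep when the criteria need different
directions, since each pass can take its own reverse= argument.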
>
>> There ARE good reasons to do weird things with sorting, and a custom
>> key object (either with cmp_to_key or directly implemented) can do
>> that.
> Indeed.
>