sort order for strings of digits

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Nov 1 00:09:52 CET 2012


On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:

> The best I can think of is to split the input sequence into two lists,
> sort each and then join them.

According to your example code, you don't have to split the input because 
you already have two lists, one filled with numbers and one filled with 
strings.

But I think that what you actually have is a single list of strings, and 
you are supposed to sort the strings such that they come in numeric order 
first, then alphanumerical. E.g.:

['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
=> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

At least that is what I would expect as the useful thing to do when 
sorting.

The trick is to take each string and split it into a leading number and a 
trailing alphanumeric string. Either part may be "empty". Here's a pure 
Python solution:

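from sys import maxsize  # use maxint in Python 2
def split(s):
    """Split s into (numeric prefix, remainder) for use as a sort key."""
    for i, c in enumerate(s):
        if not c.isdigit():
            break  # s[:i] is the leading run of digits
    else:  # aligned with the FOR, not the IF
        # No non-digit found: the whole string is a number.
        return (int(s), '')
    # No leading digits: use maxsize so the string sorts after the numbers.
    return (int(s[:i] or maxsize), s[i:])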

Now sort using this as a key function:

py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
py> sorted(L, key=split)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
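For comparison, the same split can be sketched with a regular expression
instead of an explicit loop. The names split_re and _LEADING_DIGITS are
hypothetical, introduced only for this illustration, and the sketch assumes
Python 3, where \d in a str pattern matches Unicode decimal digits:

```python
import re
from sys import maxsize

# Hypothetical helper: matches the (possibly empty) run of leading digits.
_LEADING_DIGITS = re.compile(r'\d*')

def split_re(s):
    """Regex-based variant of split(): returns (numeric prefix, remainder)."""
    digits = _LEADING_DIGITS.match(s).group()
    # No numeric prefix: fall back to maxsize, mirroring split() above.
    return (int(digits) if digits else maxsize, s[len(digits):])

L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
print(sorted(L, key=split_re))
# ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
```

One difference from split(): for the empty string this version returns
(maxsize, '') instead of raising ValueError, so '' sorts just before the
other non-numeric strings.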


The above solution is not quite general:

* it doesn't handle negative numbers or numbers with a decimal point;

* it doesn't handle the empty string in any meaningful way (split('')
  raises ValueError, because int('') fails);

* in practice, you may or may not want to ignore leading whitespace,
  or trailing whitespace after the number part;

* there's a subtle bug if a string contains a very large numeric prefix;
  finding and fixing it is left as an exercise.
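If you'd rather skip the exercise, here is one possible approach, sketched
under the assumption that every string with a numeric prefix should sort
before every string without one. Instead of using maxsize as a sentinel, the
key starts with an explicit flag, so no numeric value can collide with it.
The names natural_key and _LEADING_DIGITS are hypothetical. As a side effect
the empty string gets a defined position, but negative numbers, decimal
points and whitespace are still not handled:

```python
import re

# Hypothetical helper: matches the (possibly empty) run of leading digits.
_LEADING_DIGITS = re.compile(r'\d*')

def natural_key(s):
    """Sort key: numeric-prefix strings first, then everything else.

    The first tuple element is a flag (0 = has a numeric prefix,
    1 = doesn't), so no sentinel like sys.maxsize is needed.
    """
    digits = _LEADING_DIGITS.match(s).group()
    if digits:
        # Compare by integer value of the prefix, then by the remainder.
        return (0, int(digits), s[len(digits):])
    # No numeric prefix (including ''): compare as plain text.
    # The 0 keeps the second element an int in both branches.
    return (1, 0, s)

big = '9' * 30  # a numeric prefix far larger than sys.maxsize
L = ['abc', big, '1a', '', '9']
print(sorted(L, key=natural_key))
# order: '1a', '9', big, '', 'abc'
```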



-- 
Steven

