sort order for strings of digits
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Oct 31 19:09:52 EDT 2012
On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:
> The best I can think of is to split the input sequence into two lists,
> sort each and then join them.
According to your example code, you don't have to split the input because
you already have two lists, one filled with numbers and one filled with
strings.
But I think that what you actually have is a single list of strings, and
you are supposed to sort the strings such that they come in numeric order
first, then alphanumerical. E.g.:
['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
=> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
At least that is what I would expect as the useful thing to do when
sorting.
The trick is to take each string and split it into a leading number and a
trailing alphanumeric string. Either part may be "empty". Here's a pure
Python solution:
from sys import maxsize # use maxint in Python 2
def split(s):
for i, c in enumerate(s):
if not c.isdigit():
break
else: # aligned with the FOR, not the IF
return (int(s), '')
return (int(s[:i] or maxsize), s[i:])
Now sort using this as a key function:
py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
py> sorted(L, key=split)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
The above solution is not quite general:
* it doesn't handle negative numbers or numbers with a decimal point;
* it doesn't handle the empty string in any meaningful way;
* in practice, you may or may not want to ignore leading whitespace,
or trailing whitespace after the number part;
* there's a subtle bug if a string contains a very large numeric prefix,
finding and fixing that is left as an exercise.
--
Steven
More information about the Python-list
mailing list