[Tutor] string rules for 'number'
Oscar Benjamin
oscar.j.benjamin at gmail.com
Mon Oct 8 21:48:30 CEST 2012
On 8 October 2012 03:19, eryksun <eryksun at gmail.com> wrote:
> As a supplement to what's already been stated about string
> comparisons, here's a possible solution if you need a more 'natural'
> sort order such as '1', '5', '10', '50', '100'.
>
> You can use a regular expression to split the string into a list of
> (digits, nondigits) tuples (mutually exclusive) using re.findall. For
> example:
>
> >>> import re
> >>> dndre = re.compile('([0-9]+)|([^0-9]+)')
>
> >>> re.findall(dndre, 'a1b10')
> [('', 'a'), ('1', ''), ('', 'b'), ('10', '')]
>
> Use a list comprehension to choose either int(digits) if digits is
> non-empty else nondigits for each item. For example:
>
> >>> [int(d) if d else nd for d, nd in re.findall(dndre, 'a1b10')]
> ['a', 1, 'b', 10]
>
> Now you have a list of strings and integers that will sort 'naturally'
> compared to other such lists, since they compare corresponding items
> starting at index 0. All that's left to do is to define this operation
> as a key function for use as the "key" argument of sort/sorted. For
> example:
>
> import re
>
> def natural(item, dndre=re.compile('([0-9]+)|([^0-9]+)')):
>     if isinstance(item, str):
>         item = [int(d) if d else nd for d, nd in
>                 re.findall(dndre, item.lower())]
>     return item
>
> The above transforms all strings into a list of integers and lowercase
> strings (if you don't want letter case to affect sort order). In
> Python 2.x, use "basestring" instead of "str". If you're working with
> bytes in Python 3.x, make sure to first decode() the items before
> sorting since the regular expression is only defined for strings.
>
> Regular sort:
>
> >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'])
> ['s100x', 's10x', 's1x', 's50x', 's5x']
>
> Natural sort:
>
> >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'], key=natural)
> ['s1x', 's5x', 's10x', 's50x', 's100x']
For simple cases like this example, where the strings differ only in a
single run of digits, I tend to use:
>>> natural = lambda x: (len(x), x)
>>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'], key=natural)
['s1x', 's5x', 's10x', 's50x', 's100x']
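As a rough illustration of where the two keys agree and where they part ways, here is a small sketch. The names natural_len and natural_regex are just labels chosen for this example; the regex key is the one from the quoted message, minus the isinstance check, and the extra inputs are made up for the comparison.

```python
import re

# Same pattern as in the quoted message: a run of digits or a run of non-digits.
_DIGITS_OR_NOT = re.compile(r'([0-9]+)|([^0-9]+)')


def natural_regex(item):
    """Regex-based key: list of ints and lowercase string chunks."""
    return [int(d) if d else nd
            for d, nd in _DIGITS_OR_NOT.findall(item.lower())]


def natural_len(item):
    """Length-based key: shorter strings first, ties broken lexicographically."""
    return (len(item), item)


# Strings that differ only in one run of digits: both keys give the same order.
same_shape = ['s1x', 's10x', 's5x', 's50x', 's100x']
print(sorted(same_shape, key=natural_len))
print(sorted(same_shape, key=natural_regex))

# Different prefixes: the length key puts 'b2' first because it is shorter,
# while the regex key compares the 'a' and 'b' chunks first.
mixed = ['b2', 'a10']
print(sorted(mixed, key=natural_len))    # ['b2', 'a10']
print(sorted(mixed, key=natural_regex))  # ['a10', 'b2']
```

So the length trick is fine when only one number varies, and the regex key is the safer choice once prefixes or suffixes differ.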
Oscar