[Tutor] string rules for 'number'

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Oct 8 21:48:30 CEST 2012


On 8 October 2012 03:19, eryksun <eryksun at gmail.com> wrote:
> As a supplement to what's already been stated about string
> comparisons, here's a possible solution if you need a more 'natural'
> sort order such as '1', '5', '10', '50', '100'.
>
> You can use a regular expression to split the string into a list of
> (digits, nondigits) tuples using re.findall; in each tuple exactly one
> of the two fields is non-empty. For example:
>
>     >>> import re
>     >>> dndre = re.compile('([0-9]+)|([^0-9]+)')
>
>     >>> re.findall(dndre, 'a1b10')
>     [('', 'a'), ('1', ''), ('', 'b'), ('10', '')]
>
> Use a list comprehension to choose either int(digits) if digits is
> non-empty else nondigits for each item. For example:
>
>     >>> [int(d) if d else nd for d, nd in re.findall(dndre, 'a1b10')]
>     ['a', 1, 'b', 10]
>
> Now you have a list of strings and integers that will sort 'naturally'
> compared to other such lists, since they compare corresponding items
> starting at index 0. All that's left to do is to define this operation
> as a key function for use as the "key" argument of sort/sorted. For
> example:
>
>     import re
>
>     def natural(item, dndre=re.compile('([0-9]+)|([^0-9]+)')):
>         if isinstance(item, str):
>             item = [int(d) if d else nd for d, nd in
>                     re.findall(dndre, item.lower())]
>         return item
>
> The above transforms all strings into a list of integers and lowercase
> strings (if you don't want letter case to affect sort order). In
> Python 2.x, use "basestring" instead of "str". If you're working with
> bytes in Python 3.x, make sure to first decode() the items before
> sorting since the regular expression is only defined for strings.
>
> Regular sort:
>
>     >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'])
>     ['s100x', 's10x', 's1x', 's50x', 's5x']
>
> Natural sort:
>
>     >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'], key=natural)
>     ['s1x', 's5x', 's10x', 's50x', 's100x']
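
One caveat with the key above, as far as I can tell: on Python 3 an int and a str cannot be ordered against each other, so the key raises TypeError as soon as one string has digits where another has letters at the same position (say 'a1' and '1a'). A minimal sketch of one workaround is to tag each chunk so that numbers always sort before text. The name natural_safe and the tagging scheme are my own, just for illustration:

```python
import re

_chunks = re.compile(r'([0-9]+)|([^0-9]+)')

def natural_safe(item):
    """Hypothetical variant of natural() that never compares int with str."""
    # Each chunk becomes (0, number, '') or (1, 0, text), so tuples at the
    # same position are always comparable and numeric chunks sort before text.
    return [(0, int(d), '') if d else (1, 0, nd)
            for d, nd in _chunks.findall(item.lower())]

print(sorted(['s10x', 'a1', '1a', 's5x'], key=natural_safe))
```

Putting numbers before text is an arbitrary choice here; swap the 0 and 1 tags if you would rather have it the other way round.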

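For simple cases like this example, where the strings are identical
apart from a single number with no leading zeros, I tend to sort on a
(length, string) key: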

>>> natural = lambda x: (len(x), x)
>>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'], key=natural)
['s1x', 's5x', 's10x', 's50x', 's100x']
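
The length trick relies on every string having the same prefix and suffix, so that a longer string always means a bigger number. A quick sketch of where it works and where it stops working (the sample names in the second list are made up):

```python
by_len = lambda x: (len(x), x)

# Same prefix and suffix, one number: length tracks magnitude, so this works.
same_shape = ['s1x', 's10x', 's5x', 's50x', 's100x']
print(sorted(same_shape, key=by_len))

# Different prefixes: length no longer says anything about the number,
# so 'f2' lands before 'ab1' simply because it is shorter.
mixed = ['file10', 'f2', 'file9', 'ab1']
print(sorted(mixed, key=by_len))
```

For anything like the second list, a regex-based key such as the one quoted above is the safer choice.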


Oscar

