[Tutor] string rules for 'number'

Mon Oct 8 04:19:47 CEST 2012

On Sun, Oct 7, 2012 at 1:46 PM, Arnej Duranovic <arnejd5 at gmail.com> wrote:
>
> When I type this in the python idle shell ( version 3...) :
>             '0' <= '10' <= '9'
> The interpreter evaluates this as true, WHY? 10 is greater than 0 but not 9
> Notice I am not using the actual numbers, they are strings...I thought that
> numbers being string were ordered by their numerical value but obviously
> they are not?

As a supplement to what's already been stated about string
comparisons, here's a possible solution if you need a more 'natural'
sort order such as '1', '5', '10', '50', '100'.
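To illustrate the behavior in the quoted question: strings compare character by character, so '10' sorts before '9'. This is a small sketch, not taken from the earlier replies. When every string is purely numeric, passing key=int to sorted is already enough. The natural-sort approach below is for strings that mix digits and text.

```python
# Strings compare character by character (by code point), so '10' < '9'
# is decided by '1' < '9' before the second character is ever looked at.
print('0' <= '10' <= '9')       # True

nums = ['1', '5', '10', '50', '100']
print(sorted(nums))             # ['1', '10', '100', '5', '50']
# For purely numeric strings, key=int compares the values instead.
print(sorted(nums, key=int))    # ['1', '5', '10', '50', '100']
```

Note that key=int raises ValueError as soon as an item isn't a plain integer string, such as 's10x'.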

You can use a regular expression with re.findall to split the string
into a list of (digits, nondigits) tuples. The two groups are mutually
exclusive, so in each tuple exactly one of them is non-empty. For
example:

    >>> import re
    >>> dndre = re.compile('([0-9]+)|([^0-9]+)')

    >>> re.findall(dndre, 'a1b10')
    [('', 'a'), ('1', ''), ('', 'b'), ('10', '')]
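The sketch below uses a variant of the same alternation with named groups to make the mutual exclusivity explicit. For each match, Match.lastgroup reports which of the two groups matched. The group names digits and other, and the helper tokens(), are illustrative choices and not part of the recipe.

```python
import re

# Same alternation as dndre, but with named groups (names are illustrative).
pattern = re.compile(r'(?P<digits>[0-9]+)|(?P<other>[^0-9]+)')

def tokens(s):
    """Return (group name, matched text) pairs; exactly one group matches per run."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(s)]

print(tokens('a1b10'))
# [('other', 'a'), ('digits', '1'), ('other', 'b'), ('digits', '10')]
```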

Then use a list comprehension that picks, for each tuple, int(digits)
if digits is non-empty and nondigits otherwise. For example:

    >>> [int(d) if d else nd for d, nd in re.findall(dndre, 'a1b10')]
    ['a', 1, 'b', 10]

Now you have a list of strings and integers that will sort 'naturally'
against other such lists, because lists are compared item by item,
starting at index 0. All that's left to do is to wrap this operation in
a key function for use as the "key" argument of sort/sorted. For
example:

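    import re

    # Key function for sort/sorted: split each string into runs of
    # digits and non-digits, converting the digit runs to int.
    def natural(item, dndre=re.compile('([0-9]+)|([^0-9]+)')):
        if isinstance(item, str):
            # lower() makes the comparison case-insensitive
            item = [int(d) if d else nd for d, nd in
                    re.findall(dndre, item.lower())]
        return item  # non-str items are returned unchanged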

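The function above transforms each string into a list of integers and
lowercase strings. The lower() call keeps letter case from affecting
the sort order, so drop it if you want case to matter. Items that
aren't strings are returned unchanged. In Python 2.x, use "basestring"
instead of "str". If you're working with bytes in Python 3.x, decode()
the items before sorting: the regular expression is a str pattern and
can't be applied to bytes. Also note that in Python 3 an int can't be
ordered against a str. If two keys have different types at the same
index, sorting raises TypeError. For example, 'a1' gives ['a', 1] while
'1a' gives [1, 'a'].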

Regular sort:

    >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'])
    ['s100x', 's10x', 's1x', 's50x', 's5x']

Natural sort:

    >>> sorted(['s1x', 's10x', 's5x', 's50x', 's100x'], key=natural)
    ['s1x', 's5x', 's10x', 's50x', 's100x']
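Both example lists above are already lowercase. The snippet below is a quick check of how the lower() call in natural() treats mixed case. The function is repeated unchanged so the snippet runs on its own, and the file names are made up.

```python
import re

def natural(item, dndre=re.compile('([0-9]+)|([^0-9]+)')):
    if isinstance(item, str):
        item = [int(d) if d else nd for d, nd in
                re.findall(dndre, item.lower())]
    return item

names = ['File10.txt', 'file2.txt', 'FILE1.txt']
# Plain sort: uppercase letters have lower code points than lowercase ones.
print(sorted(names))               # ['FILE1.txt', 'File10.txt', 'file2.txt']
# Natural sort: case is ignored and the numbers compare as ints.
print(sorted(names, key=natural))  # ['FILE1.txt', 'file2.txt', 'File10.txt']
```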

Disclaimer: This is only meant to demonstrate the idea. You'll want to
search around for a 'natural' sort recipe or package that handles the
complexities of Unicode. In the 3.x re module, the \d character class
matches any character in Unicode category [Nd], not just 0-9. I used
[0-9] and [^0-9] instead to keep the digit runs plain ASCII before
passing them to the int() constructor.
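The check below shows the difference between \d and [0-9] in the 3.x re module. It uses two Arabic-Indic digits (U+0663 and U+0661), picked only as an example of non-ASCII [Nd] characters.

```python
import re

s = 'x\u0663\u0661y'  # 'x', ARABIC-INDIC DIGIT THREE, ARABIC-INDIC DIGIT ONE, 'y'

print(re.findall(r'\d+', s))     # one match: the two Arabic-Indic digits
print(re.findall(r'[0-9]+', s))  # []: only ASCII digits match
print(int('\u0663\u0661'))       # 31: int() accepts Unicode decimal digits
```

Whether to treat such digits as numbers in a natural sort is a design choice. With the [0-9] class they are simply treated as text.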