[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

Andrew Barnert abarnert at yahoo.com
Mon Jun 10 00:28:25 CEST 2013


From: Alexander Belopolsky <alexander.belopolsky at gmail.com>
Sent: Sunday, June 9, 2013 3:12 PM


>On Sun, Jun 9, 2013 at 4:56 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>>Also, consider how much more complicated parsing gets. Instead of just getting the digit value, you also have to get the group that the value comes from and make sure you haven't gotten any digits from an incompatible group yet. Why write, debug, and maintain that code?
>
>This is not that hard because the latest Unicode standard guarantees that decimal digits are "encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range." (See 4.6 Numeric Value / Decimal Digits. <http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf>.) In other words, two digits belong to the same group if they have the same ord(x) - int(x) value.


It's not _hard_, but certainly it's a lot harder than not doing it:

parse_int_digit = int

def parse_int_1(s):
    i = 0
    for digit in map(parse_int_digit, s):
        i *= 10
        i += digit
    return i

def parse_int_2(s):
    i = 0
    first_digit_range = None
    for c in s:
        digit = parse_int_digit(c)
        i *= 10
        # Code point of the zero digit in this character's digit range.
        digit_range = ord(c) - digit

        if first_digit_range is None:
            first_digit_range = digit_range
        elif first_digit_range != digit_range:
            raise ValueError('Mixed digit ranges at {}: {} vs. {}'.format(
                digit, digit_range, first_digit_range))
        i += digit
    return i

I think I got them both right, but I'm less sure about the second one. It's clearly harder to read and understand. It's also likely to take about twice as long to execute.
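
For illustration, here is a minimal sketch of the ord(x) - int(x) grouping test on its own, assuming Python 3, where int() accepts any single Unicode decimal digit. The names digit_zero and same_range are hypothetical helpers made up for this example; they are not part of any proposal in this thread.

```python
def digit_zero(c):
    """Return the code point of the zero digit of c's decimal-digit range."""
    # Relies on the Unicode guarantee quoted above: each decimal-digit
    # range is contiguous and starts at its zero digit.
    return ord(c) - int(c)


def same_range(s):
    """Return True if every digit in s comes from one digit range."""
    return len({digit_zero(c) for c in s}) <= 1


# ASCII digits start at U+0030; Arabic-Indic digits start at U+0660.
ascii_zero = digit_zero('7')
arabic_zero = digit_zero('\u0663')    # ARABIC-INDIC DIGIT THREE

# A string mixing ASCII and Arabic-Indic digits fails the check.
mixed_ok = same_range('12\u0663')
```

The "about twice as long" figure is only an estimate; timing both parsers with the standard timeit module on representative input would be the way to confirm or refute it.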

