[Tutor] three numbers for one

eryksun eryksun at gmail.com
Sat Jun 8 09:40:41 CEST 2013


On Sat, Jun 8, 2013 at 1:25 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> The str.isnumeric() method is the least specific of the three methods. It
> returns True if the string contains only characters in any of the three
> numeric categories, 'Nd', 'No' and 'Nl'.

Strictly speaking, isnumeric is based on the numeric type rather than
the general category. That matches what Python actually does, even
where the database has quirks. For example, Python 3.3 uses Unicode
6.1, which has 4 'Nl' characters (out of 224) with no defined numeric
value (i.e. field 8, counting from 0, is empty):

    12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;
        Nl;0;L;;;;;N;;;;;
    12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;
        Nl;0;L;;;;;N;;;;;
    12456;CUNEIFORM NUMERIC SIGN NIGIDAMIN;
        Nl;0;L;;;;;N;;;;;
    12457;CUNEIFORM NUMERIC SIGN NIGIDAESH;
        Nl;0;L;;;;;N;;;;;
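
To make "field 8" concrete, here's a rough sketch that splits one of
the records above on semicolons, counting fields from 0. The names
parse_record and RECORD_6_1 are made up for illustration, and the
record is just the first entry above joined back onto one line:

```python
# One record from the listing above, joined back onto a single line.
RECORD_6_1 = "12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;Nl;0;L;;;;;N;;;;;"


def parse_record(record):
    """Split a UnicodeData.txt record and pick out the fields discussed here."""
    fields = record.split(";")          # fields are counted from 0
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "category": fields[2],          # field 2: general category
        "numeric": fields[8] or None,   # field 8: numeric value, None if empty
    }


info = parse_record(RECORD_6_1)
print(info["category"], info["numeric"])
```

An empty field 8 comes back as None here, which is the "category says
number, database gives no value" situation.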

So Python 3.3 happily declares these characters to be 'Nl' and yet non-numeric:

    >>> import unicodedata
    >>> unicodedata.category('\U00012432')
    'Nl'
    >>> '\U00012432'.isnumeric()
    False
    >>> unicodedata.numeric('\U00012432')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: not a numeric character
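
For anyone who wants to check their own interpreter, here's a rough
sketch that scans every code point for characters whose category is
'Nl' but which isnumeric() rejects. The function name is made up for
illustration, and what it prints depends on the Unicode version the
interpreter was built with (unicodedata.unidata_version), so treat the
output as something to inspect rather than a fixed answer:

```python
import sys
import unicodedata


def nl_without_numeric_value():
    """Return the 'Nl' characters that str.isnumeric() does not accept."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == 'Nl' and not ch.isnumeric():
            found.append(ch)
    return found


print('Unicode', unicodedata.unidata_version)
for ch in nl_without_numeric_value():
    # Print the code point in hex plus its name, not the character itself.
    print('%05X %s' % (ord(ch), unicodedata.name(ch, '<unnamed>')))
```

With a database that has no such gaps, the loop simply prints nothing
after the version line.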


Unicode 6.2 fixes this:

    12432;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;
        Nl;0;L;;;;216000;N;;;;;
    12433;CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;
        Nl;0;L;;;;432000;N;;;;;
    12456;CUNEIFORM NUMERIC SIGN NIGIDAMIN;
        Nl;0;L;;;;-1;N;;;;;
    12457;CUNEIFORM NUMERIC SIGN NIGIDAESH;
        Nl;0;L;;;;-1;N;;;;;

Unicode 6.3 fixes it again:

    12456;CUNEIFORM NUMERIC SIGN NIGIDAMIN;
        Nl;0;L;;;;2;N;;;;;
    12457;CUNEIFORM NUMERIC SIGN NIGIDAESH;
        Nl;0;L;;;;3;N;;;;;

3rd time's a charm, right?

