[Numpy-discussion] formatting issues, locale and co

Sun Dec 28 01:46:06 EST 2008

On Sun, Dec 28, 2008 at 01:38, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
> On Sat, Dec 27, 2008 at 10:27 PM, David Cournapeau
> <david at ar.media.kyoto-u.ac.jp> wrote:
>>
>> Hi,
>>
>>    While looking at the last failures of numpy trunk on windows for
>> python 2.5 and 2.6, I got into floating point number formatting issues;
>> I got deeper and deeper, and now I am lost. We have several problems:
>>    - we are not consistent between platforms, nor are we consistent
>> with python
>>    - str(np.float32(a)) is locale dependent, but Python's own str() is
>> not (locale.str() is)
>>    - formatting of long double does not work on windows because of the
>> broken long double support in mingw.
>>
>> 1 consistency problem:
>> ----------------------
>>
>> python -c "a = 1e20; print a" -> 1e+020
>> python26 -c "a = 1e20; print a" -> 1e+20
>>
>> In numpy, we use PyOS_snprintf for formatting, but python itself uses
>> PyOS_ascii_formatd - which has different behavior on different versions
>> of python. The above behavior can be simply reproduced in C:
>>
>> #include <Python.h>
>>
>> int main()
>> {
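>>    double x = 1e20;
>>    char c[200];
>>
>>    /* Python's locale-independent formatter */
>>    PyOS_ascii_formatd(c, sizeof(c), "%.12g", x);
>>    printf("%s\n", c);
>>    /* the C runtime's own formatter, for comparison */
>>    printf("%g\n", x);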
>>
>>    return 0;
>> }
>>
>> On 2.5, this will print:
>>
>> 1e+020
>> 1e+020
>>
>> But on 2.6, this will print:
>>
>> 1e+20
>> 1e+020
>>
>> 2 locale dependency:
>> --------------------
>>
>> Another issue is that our own formatting is locale dependent, whereas
>> Python's isn't:
>>
>> import numpy as np
>> import locale
>> locale.setlocale(locale.LC_NUMERIC, 'fr_FR')
>> a = 1.2
>>
>> print "str(a)", str(a)
>> print "locale.str(a)", locale.str(a)
>> print "str(np.float32(a))", str(np.float32(a))
>> print "locale.str(np.float32(a))", locale.str(np.float32(a))
>>
>> Returns:
>>
>> str(a) 1.2
>> locale.str(a) 1,2
>> str(np.float32(a)) 1,2
>> locale.str(np.float32(a)) 1,20000004768
>>
>> I thought about copying the way python does the formatting in the trunk
>> (where discrepancies between platforms have been fixed), but this is not
>> so easy, because it uses a lot of code from different places - and the
>> code needs to be adapted to float and long double. The other solution
>> would be to do our own formatting, but this does not sound easy:
>> formatting in C is hard. I am not sure what we should do. Does
>> anyone else have any ideas?
>
> I think the first thing to do is make a decision on locale. If we choose to
> support locales I don't see much choice but to depend on Python because it's
> too much work otherwise, and work not directly related to Numpy at that. If
> we decide not to support locales then we can do our own formatting if we
> need to using a fixed choice of locale. There is a list of snprintf
> implementations here. Trio looks like a mature project and has an MIT
> license, which I think is a license compatible with Numpy.

We should not support locales. The string representations of these
elements should be Python-parseable.
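
As a rough illustration of what "Python-parseable" means here, the sketch
below checks that the text produced for a float contains no locale-specific
decimal comma and parses back to the same value with float(). The helper
names parseable_str and roundtrips are made up for this example, and the
'fr_FR.UTF-8' locale name is an assumption that may not be installed on a
given machine. It only exercises plain Python floats, not numpy scalars.

```python
import locale


def parseable_str(x):
    """Return text for x that float() can parse back, whatever the locale."""
    # repr() of a Python float always uses '.' as the decimal point;
    # it does not consult the LC_NUMERIC setting.
    return repr(float(x))


def roundtrips(x):
    """True when the formatted text parses back to exactly the same value."""
    return float(parseable_str(x)) == float(x)


# Remember the current setting so it can be restored afterwards.
previous = locale.setlocale(locale.LC_NUMERIC)
try:
    # Hypothetical comma-decimal locale; skipped if it is not available.
    locale.setlocale(locale.LC_NUMERIC, 'fr_FR.UTF-8')
except locale.Error:
    pass

try:
    for value in (1.2, 1e20, 0.1):
        text = parseable_str(value)
        assert ',' not in text      # no locale decimal separator leaked in
        assert roundtrips(value)    # float(text) gives the value back
        print(text)
finally:
    locale.setlocale(locale.LC_NUMERIC, previous)
```

The same two checks (no comma, exact round trip) are the properties one
would want from str() on numpy scalars as well.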

> I'm inclined to just fix the locale and ignore the rest until Python gets
> things sorted out. But I'm lazy...

What do you think Python doesn't have sorted out?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco