[Python-Dev] Re: Be Honest about LC_NUMERIC [REPOST]

Mon Sep 1 15:59:32 EDT 2003

On 31/08/2003 9:25 AM, Tim Peters wrote:

>[James Henstridge]
>>>As Christian said, there is code in glib (not to be confused with
>>>glibc: the GNU C library) that could act as a basis for locale
>>>independent float conversion functions in Python.
>
>[Martin v. Löwis]
>>I very much doubt that this statement is true. Are you sure this code
>>supports all the platforms where Python runs? E.g. what about the
>>three (!) different floating point formats on a VAX?
>
>Well, you should look at the patch:  it doesn't know anything about internal
>fp formats -- all conversions are performed in the end by calling the
>platform C's strtod() or snprintf().  What it does do is:
>
>1. For string to double, preprocess the input string to change it to
>   use current-locale spelling before calling the platform C strtod().
>
>2. For double to string, postprocess the result of the platform C
>   snprintf() to replace current-locale spelling with a "standard"
>   spelling.
>
>So this is much more string-munging code than it is floating-point code.
>Indeed, there doesn't appear to be a single floating-point operation in the
>entire patch (apart from calls to platform string<->float functions).
>
>OTOH, despite the claims, it doesn't look threadsafe to me:  there's no
>guarantee, e.g., that the idea of current locale g_ascii_strtod() obtains
>from localeconv() at its start is still in effect by the time
>g_ascii_strtod() gets around to calling strtod().
>
This is true.  However, in practice we found it fixed a number of thread 
safety issues in programs.

Your average localised package usually switches to the user's preferred 
locale on startup, so that it can display strings and messages, and 
occasionally wants to read/write numbers in a locale independent format 
(usually when saving/loading files).  The most common way of doing this 
is the setlocale/strtod/setlocale combo, which has thread safety 
problems and possible reentrancy problems if done wrong.
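
For concreteness, here is a rough Python sketch of that combo.  It is only
an illustration: the function name is made up, and locale.atof() stands in
for the C library's locale-aware strtod().

```python
import locale

def parse_float_with_locale_switch(text):
    """Parse a '.'-style float by temporarily switching LC_NUMERIC to "C".

    Hypothetical helper, for illustration only.
    """
    # Calling setlocale() with no locale argument only queries the setting.
    saved = locale.setlocale(locale.LC_NUMERIC)
    # This switch is process-wide: every other thread sees "C" from here on.
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        # Stand-in for the C library's strtod().
        return locale.atof(text)
    finally:
        # If another thread changed the locale in the meantime, this
        # restore silently overwrites that change.
        locale.setlocale(locale.LC_NUMERIC, saved)
```

Any thread that formats or parses a number between the two setlocale()
calls gets the "C" spelling instead of the user's, which is the thread
safety problem in a nutshell.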

The method used by g_ascii_strtod() removes the need to switch locale 
when parsing the float, which means that an application using it may 
only need to call setlocale() once on startup and never again.  This 
seems to be the best way to use setlocale w.r.t. thread safety.
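
The string munging involved is small.  Below is a minimal Python sketch of
the idea, not a translation of the glib code: the helper names are
invented, and locale.atof() and locale.format_string() stand in for the
platform's locale-aware strtod() and snprintf().

```python
import locale

def to_locale_spelling(text, decimal_point):
    """Rewrite a '.'-style number into the current locale's spelling."""
    return text.replace(".", decimal_point, 1)

def from_locale_spelling(text, decimal_point):
    """Rewrite a locale-formatted number back to the '.' spelling."""
    return text.replace(decimal_point, ".", 1)

def ascii_strtod(text):
    """Parse a '.'-style float without ever calling setlocale()."""
    # Ask what the current locale uses as its decimal point ...
    decimal_point = locale.localeconv()["decimal_point"]
    # ... then hand the locale-aware parser the spelling it expects.
    # The locale could still change between these two steps, which is
    # the remaining race Tim points out.
    return locale.atof(to_locale_spelling(text, decimal_point))

def ascii_formatd(value):
    """Format a float with '.' as the decimal point in any locale."""
    decimal_point = locale.localeconv()["decimal_point"]
    formatted = locale.format_string("%.17g", value)
    return from_locale_spelling(formatted, decimal_point)
```

Neither function touches the process-wide locale, so the application can
call setlocale() once at startup and leave it alone.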

The existing locale handling in Python shares this property, but makes 
it difficult for external libraries to format and parse floats in the 
locale's representation.  From what I can see, leaving LC_NUMERIC set to 
the locale value rather than "C" leads to better interoperability.

>  So at best it solves part
>of one relevant problem here (other relevant problems include that platform
>C libraries disagree about how to spell infinities, NaNs and signed zeroes;
>about how many digits to use for an exponent; and about how to round results
>(for example,
>
>    >>> "%.1f" % 2.25
>    '2.3'
>
>on Windows, but most (not all!) flavors of Unix produce the IEEE-754
>to-nearest/even rounded '2.2' instead)).
>
>It's easy to write portable, perfectly-rounding string<->double conversion
>routines without calling any platform functions.  The rub is that "fast"
>goes out the window then, unless you give up at least one of {portable,
>accurate}.
>
It would be great for Python to have consistent float parsing/formatting 
on every platform.  Making sure that every place where Python wants to 
parse or format a float in a locale independent fashion goes through a 
single set of functions should make it easier to drop in a new set of 
routines later.  However, getting rid of the LC_NUMERIC=C requirement 
would have real benefits today.
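
As an aside, the two rounding results Tim mentions can be reproduced on
any platform with the decimal module.  This sketch only illustrates the
two rounding rules; it says nothing about how a particular C library
implements its formatting.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# 2.25 is exactly representable in binary floating point, so rounding it
# to one decimal place is a genuine tie between 2.2 and 2.3.
x = Decimal(2.25)
one_place = Decimal("0.1")

# IEEE-754 to-nearest/even: ties go to the even last digit.
half_even = str(x.quantize(one_place, rounding=ROUND_HALF_EVEN))
# Ties rounded away from zero, matching the '2.3' result quoted above.
half_up = str(x.quantize(one_place, rounding=ROUND_HALF_UP))

print(half_even, half_up)  # 2.2 2.3
```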

James.

-- 
Email: james at daa.com.au
WWW:   http://www.daa.com.au/~james/