[Python-Dev] Be Honest about LC_NUMERIC [REPOST]
kiko at async.com.br
Thu Aug 14 00:21:28 EDT 2003
So, in an attempt to garner comments (now that we have 2.3 off the
chopping block) I'm reposting my PEP proposal (with minor updates).
Comments would be appreciated, of course (nudges Barry slightly after
he got me to write this on my only free Sunday in months ;)
Title: Be Honest about LC_NUMERIC (to the C library)
Version: $Revision: 1.9 $
Last-Modified: $Date: 2002/08/26 16:29:31 $
Author: Christian R. Reis <kiko at async.com.br>
Type: Standards Track
Content-Type: text/plain <pep-xxxx.html>
Abstract
Support in Python for the LC_NUMERIC locale category is currently
implemented only in Python-space, which causes inconsistent behavior
and thread-safety issues for applications that use extension modules
and libraries implemented in C. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.
Rationale
Python currently provides generic localization services through the
locale module, which among other things allows localizing the
display and conversion process of numeric types. Locale categories,
such as LC_TIME and LC_COLLATE, allow configuring precisely what
aspects of the application are to be localized.
The LC_NUMERIC category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the LC_NUMERIC category is
currently implemented only in Python-space; the C libraries are
unaware of the application's LC_NUMERIC setting. This is done to
avoid changing the behavior of certain low-level functions that are
used by the Python parser and related code.
However, this presents a problem for extension modules that wrap C
libraries; applications that use these extension modules will
inconsistently display and convert numeric values.
James Henstridge, the author of PyGTK, has additionally pointed
out that the setlocale() function also presents thread-safety
issues, since a thread may call the C library setlocale() outside of
the GIL, and cause Python to function incorrectly.
The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending on
the application, but it will most likely occur when parsing or
formatting a numeric value.
Example Problem
The initial problem that motivated this PEP is related to the
GtkSpinButton widget in the GTK+ UI toolkit, wrapped by PyGTK.
The widget can be set to numeric mode, and when this occurs,
characters typed into it are evaluated as a number.
Because LC_NUMERIC is not set in libc, float values are displayed
incorrectly, and it is impossible to enter values using the
localized decimal separator (for instance, `,' for the Brazilian
locale pt_BR). This small example demonstrates reduced usability
for localized applications using this toolkit when coded in Python.
Proposal
Martin v. Löwis commented on the initial constraints for an
acceptable solution to the problem on python-dev:
- LC_NUMERIC can be set at the C library level without breaking
the parser.
- float() and str() stay locale-unaware.
The following seems to be the current practice:
- locale-aware str() and float() [XXX: atof(), currently?]
stay in the locale module.
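To illustrate the split described above, here is a minimal sketch (not
part of the proposal itself) that compares the builtin str() with the
locale module's str() and atof() under a given LC_NUMERIC setting. The
helper name compare_conversions and the locale names tried in the demo
loop are assumptions made for illustration; whether a locale such as
pt_BR.UTF-8 is installed depends on the system.

```python
import locale


def compare_conversions(value, locale_name):
    """Return (builtin_text, localized_text, round_trip) for value.

    builtin_text comes from the locale-unaware builtin str();
    localized_text and round_trip come from the locale module.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        builtin_text = str(value)                 # locale-unaware
        localized_text = locale.str(value)        # locale-aware formatting
        round_trip = locale.atof(localized_text)  # locale-aware parsing
    finally:
        # setlocale() is process-wide, so always restore the old value.
        locale.setlocale(locale.LC_NUMERIC, saved)
    return builtin_text, localized_text, round_trip


if __name__ == "__main__":
    # "pt_BR.UTF-8" is an assumed locale name; it may not be installed.
    for name in ("C", "pt_BR.UTF-8"):
        try:
            print(name, compare_conversions(1.5, name))
        except locale.Error:
            print(name, "is not available on this system")
```

Under a locale whose decimal separator is `,', the second element
would be expected to use the comma while the first keeps the `.'.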
An analysis of the Python source suggests that the following
functions currently depend on LC_NUMERIC being set to the C locale:
[XXX: still need to check if any other occurrences exist]
The proposed approach is to implement LC_NUMERIC-agnostic functions
for converting from (strtod()/atof()) and to (snprintf()) float
formats, using these functions where the formatting should not vary
according to the user-specified locale.
This change should also solve the aforementioned thread-safety
issues.
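The wrapping idea can be sketched in Python as follows. This is only
an illustration of the technique, not the glib code and not the
eventual C implementation: localized_format() and localized_parse()
are hypothetical stand-ins for snprintf() and strtod() running under a
locale whose decimal separator is passed in explicitly, and
ascii_format() and ascii_parse() are hypothetical names for the
LC_NUMERIC-agnostic wrappers built on top of them.

```python
def localized_format(value, decimal_point):
    """Hypothetical stand-in for snprintf("%g") under a given locale."""
    # Python's % operator always emits '.', so substitute the separator.
    return ("%g" % value).replace(".", decimal_point)


def localized_parse(text, decimal_point):
    """Hypothetical stand-in for strtod() under a given locale."""
    # Simplification: a real strtod() would stop parsing at the '.'
    # rather than fail; raising makes the mismatch visible here.
    if decimal_point != "." and "." in text:
        raise ValueError("'.' is not the decimal separator in this locale")
    return float(text.replace(decimal_point, "."))


def ascii_format(value, decimal_point):
    """Format value with '.' regardless of the active separator."""
    localized = localized_format(value, decimal_point)
    return localized.replace(decimal_point, ".")


def ascii_parse(text, decimal_point):
    """Parse '.'-separated text regardless of the active separator."""
    # Translate into what the locale-dependent parser expects.
    return localized_parse(text.replace(".", decimal_point), decimal_point)


if __name__ == "__main__":
    # ',' plays the role of the pt_BR decimal separator.
    print(localized_format(1.5, ","))
    print(ascii_format(1.5, ","))
    print(ascii_parse("1.5", ","))
```

Code that must not vary with the user's locale (such as the parser)
would call the agnostic wrappers, while the C library itself remains
free to honor LC_NUMERIC.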
Potential Code Contributions
This problem was initially reported as a problem in the GTK+
libraries; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library implements a number of
LC_NUMERIC-agnostic functions (for an example, see the g_ascii_strtod
code submission listed in the references) for reasons similar to
those presented in this document. In the same GTK+ problem
report, Havoc Pennington has suggested that the glib authors would
be willing to contribute this code to the PSF, which would simplify
implementation of this PEP considerably.
[I'm checking if Alex Larsson is willing to sign the PSF
contributor agreement to make sure the code is safe to
integrate; XXX: what would be necessary to sign here?]
Risks
There may be cross-platform issues with the provided locale-agnostic
functions. This needs to be tested further.
Martin has pointed out potential copyright problems with the
contributed code. I believe we will have no problems in this area as
members of the GTK+ and glib teams have said they are fine with
relicensing the code.
Implementation
An implementation is being developed by Gustavo Carneiro
<gjc at inescporto.pt>. It is currently attached to Sourceforge.net
bug 774665.
[XXX: The SF.net tracker is horrible 8(]
References
- PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
- Python locale documentation for embedding
- PyGTK homepage, http://www.daa.com.au/~james/pygtk/
- GtkSpinButton screenshot (demonstrating problem)
- GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
- Code submission of g_ascii_strtod and g_ascii_dtostr (later
  renamed g_ascii_formatd) by Alex Larsson
- PSF Contributor Agreement
- Python bug report, http://www.python.org/sf/774665
Copyright
This document has been placed in the public domain.
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL