Be Honest about LC_NUMERIC (PEP proposal), was Re: [Python-Dev] LC_NUMERIC and C libraries
Christian Reis
kiko@async.com.br
Mon, 21 Jul 2003 11:10:05 -0300
On Sun, Jul 20, 2003 at 02:29:26PM -0400, Barry Warsaw wrote:
>
> Can we perhaps have a PEP for the 2.4 timeframe?
Sure. Reviews would be really appreciated.
PEP: XXX
Title: Be Honest about LC_NUMERIC (to the C library)
Version: $Revision: 1.9 $
Last-Modified: $Date: 2002/08/26 16:29:31 $
Author: Christian R. Reis <kiko at async.com.br>
Status: Draft
Type: Standards Track
Content-Type: text/plain <pep-xxxx.html>
Created: 19-July-2003
Post-History:
------------------------------------------------------------------------
Abstract
Support in Python for the LC_NUMERIC locale category is currently
implemented only in Python-space, which causes inconsistent behavior
and thread-safety issues for applications that use extension modules
and libraries implemented in C. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.
Introduction
Python currently provides generic localization services through the
locale module, which among other things allows localizing the
display and conversion process of numeric types. Locale categories,
such as LC_TIME and LC_COLLATE, allow configuring precisely what
aspects of the application are to be localized.
The LC_NUMERIC category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the LC_NUMERIC category is
currently implemented only in Python-space; the C libraries are
unaware of the application's LC_NUMERIC setting. This is done to
avoid changing the behavior of certain low-level functions that are
used by the Python parser and related code [2].
However, this presents a problem for extension modules that wrap C
libraries; applications that use these extension modules will
inconsistently display and convert numeric values.
James Henstridge, the author of PyGTK [3], has additionally pointed
out that the setlocale() function also presents thread-safety
issues, since a thread may call the C library setlocale() outside of
the GIL, and cause Python to function incorrectly.
Rationale
The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending on
the application, but it will most likely occur when parsing or
formatting a numeric value.
Example Problem
The initial problem that motivated this PEP is related to the
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by PyGTK.
The widget can be set to numeric mode, and when this occurs,
characters typed into it are evaluated as a number.
Because LC_NUMERIC is not set in libc, float values are displayed
incorrectly, and it is impossible to enter values using the
localized decimal separator (for instance, `,' for the Brazilian
locale pt_BR). This small example demonstrates reduced usability
for localized applications using this toolkit when coded in Python.
Proposal
On python-dev, Martin V. Löwis outlined the initial constraints for an
acceptable solution to the problem:
- LC_NUMERIC can be set at the C library level without breaking
the parser.
- float() and str() stay locale-unaware.
The following seems to be the current practice:
- locale-aware str() and float() [XXX: atof(), currently?]
stay in the locale module.
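The split between the locale-unaware builtins and the locale-aware
helpers in the locale module can be illustrated with a minimal sketch.
It assumes a current Python 3 interpreter; try_comma_locale() and the
candidate locale names are hypothetical, chosen only for illustration,
and none of those locales may be installed on a given system, in which
case the sketch falls back to the C locale.

```python
import locale


def try_comma_locale(candidates=("pt_BR.UTF-8", "pt_BR", "de_DE.UTF-8", "de_DE")):
    """Hypothetical helper: switch LC_NUMERIC to a locale that uses ','.

    Returns the locale name that worked, or None if none is installed
    (in which case LC_NUMERIC is left set to the C locale).
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # locale not installed, try the next one
        if locale.localeconv()["decimal_point"] == ",":
            return name
    locale.setlocale(locale.LC_NUMERIC, "C")
    return None


saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
active = try_comma_locale()

# The builtins stay locale-unaware: they always use '.'.
builtin_text = str(1.5)
builtin_value = float("1.5")

# The locale module helpers follow the LC_NUMERIC setting.
localized_text = locale.str(1.5)
localized_value = locale.atof(localized_text)

locale.setlocale(locale.LC_NUMERIC, saved)  # restore the original setting

print(active, builtin_text, localized_text, localized_value)
```

When a comma-based locale is available, localized_text uses `,' while
builtin_text keeps `.'; both parse back to the same float.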
An analysis of the Python source suggests that the following
functions currently depend on LC_NUMERIC being set to the C locale:
- Python/compile.c:parsenumber()
- Python/marshal.c:r_object()
- Objects/complexobject.c:complex_to_buf()
- Objects/complexobject.c:complex_subtype_from_string()
- Objects/floatobject.c:PyFloat_FromString()
- Objects/floatobject.c:format_float()
- Modules/stropmodule.c:strop_atof()
- Modules/cPickle.c:load_float()
[XXX: still need to check if any other occurrences exist]
The proposed approach is to implement LC_NUMERIC-agnostic functions
for converting from (strtod()/atof()) and to (snprintf()) float
formats, using these functions where the formatting should not vary
according to the user-specified locale.
This change should also solve the aforementioned thread-safety
problems.
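One possible shape for such LC_NUMERIC-agnostic functions is sketched
below in Python rather than C. This is an illustration under stated
assumptions, not the implementation attached to this PEP and not the
glib code: locale.format_string() and locale.atof() stand in for the
locale-sensitive C routines snprintf() and strtod(), and the names
agnostic_format(), agnostic_parse() and with_numeric_locale() are
hypothetical. The idea is to let the locale-sensitive routine do the
work and translate the decimal separator at the boundary.

```python
import locale


def agnostic_format(value, fmt="%.12g"):
    """Format a float so that the result always uses '.' as separator."""
    text = locale.format_string(fmt, value)  # locale-sensitive step
    point = locale.localeconv()["decimal_point"]
    return text.replace(point, ".", 1)


def agnostic_parse(text):
    """Parse text written with '.' regardless of the LC_NUMERIC setting."""
    point = locale.localeconv()["decimal_point"]
    localized = text.replace(".", point, 1)
    return locale.atof(localized)  # locale-sensitive step


def with_numeric_locale(name, func, *args):
    """Call func(*args) with LC_NUMERIC temporarily set to name."""
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return func(*args)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# Exercise both helpers under several locales; the names other than
# "C" are assumptions and are skipped when not installed.
results = {}
for name in ("C", "pt_BR.UTF-8", "de_DE.UTF-8"):
    try:
        results[name] = (
            with_numeric_locale(name, agnostic_format, 1234.5),
            with_numeric_locale(name, agnostic_parse, "1234.5"),
        )
    except locale.Error:
        pass  # locale not available on this system

print(results)
```

Every locale that could be activated should yield the same pair, which
is the property the parser, marshal and pickle code paths listed above
rely on. Note that with_numeric_locale() changes process-wide state and
is itself not thread-safe; it is used here only to exercise the helpers.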
Potential Code Contributions
This problem was initially reported as a problem in the GTK+
libraries [5]; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library implements a number of
LC_NUMERIC-agnostic functions (for an example, see [6]) for reasons
similar to those presented in this document. In the same GTK+ problem
report, Havoc Pennington has suggested that the glib authors would
be willing to contribute this code to the PSF, which would simplify
implementation of this PEP considerably.
[XXX: I believe the code is cross-platform, since glib in part was
devised to be cross-platform. Needs checking.]
[XXX: I will check if Alex Larsson is willing to sign the PSF
contributor agreement [7] to make sure the code is safe to
integrate.]
Risks
There may be cross-platform issues with the provided locale-agnostic
functions. This needs to be tested further.
Martin has pointed out potential copyright problems with the
contributed code. I believe we will have no problems in this area as
members of the GTK+ and glib teams have said they are fine with
relicensing the code.
Code
An implementation is being developed by Gustavo Carneiro
<gjc at inescporto.pt>. It is currently attached to Sourceforge.net
bug 774665 [8].
[XXX: The SF.net tracker is horrible 8(]
References
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/peps/pep-0001.html
[2] Python locale documentation for embedding,
http://www.python.org/doc/current/lib/embedding-locale.html
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
[4] GtkSpinButton screenshot (demonstrating problem),
http://www.async.com.br/~kiko/spin.png
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
renamed g_ascii_formatd) by Alex Larsson,
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
[7] PSF Contributor Agreement,
http://www.python.org/psf/psf-contributor-agreement.html
[8] Python bug report, http://www.python.org/sf/774665
Copyright
This document has been placed in the public domain.
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL