[Tony Meyer]
... (Glad you posted this - I was wading through the progress of marshalling (PyOS_snprintf etc) and getting rapidly lost).
It's the unmarshalling code that's relevant -- that just passes a string to atof().
1. When LC_NUMERIC is "german", MS C's atof() stops at the first period it sees.
This is the case:
"""
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main()
{
    float f;
    setlocale(LC_NUMERIC, "german");
    f = atof("0.1");
    printf("%f\n", f);
}
"""
Gives me with gcc version 3.2 20020927 (prerelease): 0.100000
It's possible that glibc doesn't recognize "german" as a legitimate locale name (so that the setlocale() call had no effect).
Gives me with Microsoft C++ Builder (I don't have Visual C++ handy, but I suppose it would be the same): 0,00000
The help file for Builder does say that this is the correct behaviour - it will stop when it finds an unrecognised character - here '.' is unrecognised (because we are in German), so it stops.
atof does have to stop at the first unrecognized character, but atof is locale-dependent, so which characters are and aren't recognized depends on the locale. After I set locale to "german" on Win2K:
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "german")
'German_Germany.1252'
MS tells me that the decimal_point character is ',' and the thousands_sep character is '.':
>>> import pprint
>>> pprint.pprint(locale.localeconv())
{'currency_symbol': '',
 'decimal_point': ',',       HERE
 'frac_digits': 127,
 'grouping': [3, 0],
 'int_curr_symbol': '',
 'int_frac_digits': 127,
 'mon_decimal_point': '',
 'mon_grouping': [],
 'mon_thousands_sep': '',
 'n_cs_precedes': 127,
 'n_sep_by_space': 127,
 'n_sign_posn': 127,
 'negative_sign': '',
 'p_cs_precedes': 127,
 'p_sep_by_space': 127,
 'p_sign_posn': 127,
 'positive_sign': '',
 'thousands_sep': '.'}       AND HERE
Python believes that the locale-specified thousands_sep character should be ignored, and that's what locale.atof() does. It may well be a bug in MS's atof() that it doesn't ignore the current thousands_sep character -- I don't have time now to look up the rules in the C standard, and it doesn't matter to spambayes either way (whether we load .001 as 0.0 or as 1.0 is a disaster either way).
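To make the two outcomes concrete, here is a rough Python sketch of both readings of the "german" conventions. Both functions are hypothetical models written for this message, not the real MS atof() or the real locale.atof(); the separators are passed in explicitly so nothing depends on which locales are installed, and signs and exponents are left out on purpose.

```python
DIGITS = "0123456789"


def strict_atof(text, decimal_point):
    """Hypothetical model of the 'stop at the first unrecognized
    character' reading: only digits and one decimal_point are accepted."""
    accepted = []
    seen_point = False
    for ch in text:
        if ch in DIGITS:
            accepted.append(ch)
        elif ch == decimal_point and not seen_point:
            accepted.append(".")  # normalize so float() can read it
            seen_point = True
        else:
            break  # anything else ends the number
    prefix = "".join(accepted)
    # No digits consumed at all means the result is zero.
    return float(prefix) if any(c in DIGITS for c in prefix) else 0.0


def lenient_atof(text, decimal_point, thousands_sep):
    """Hypothetical model of the 'ignore thousands_sep' reading:
    drop the grouping character, then treat decimal_point as '.'."""
    cleaned = text.replace(thousands_sep, "") if thousands_sep else text
    return float(cleaned.replace(decimal_point, "."))


# With decimal_point=',' and thousands_sep='.', as reported above:
print(strict_atof(".001", ","))        # 0.0
print(lenient_atof(".001", ",", "."))  # 1.0
```

Neither answer is the 0.001 that was originally written out, which is the whole problem.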
Does this then mean that this is a Python bug?
That Microsoft's atof() doesn't ignore the thousands_sep character is certainly not Python's bug <wink>.
Or because Python tells us not to change the C locale and we (Outlook) are, it's our fault/problem?
The way we're using Python with Outlook doesn't meet the documented requirements for using Python, so for now everything that goes wrong here is our problem. It would be better if Python didn't use locale-dependent string<->float conversions internally, but that's just not the case (yet).
Presumably what we'll have to do for a solution is just what Mark is doing now - find the correct place to put a call that (re)sets the C locale to English.
Python requires that the (true -- from the C library's POV) LC_NUMERIC category be "C" locale. That isn't English (although it looks a lot like it to Germans <wink>), and we don't care about any category other than LC_NUMERIC here.
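For what it's worth, the save/set/restore pattern around LC_NUMERIC can be sketched from the Python side as below. The helper name c_numeric_locale is made up for illustration, and it is only an assumption that a call made from Python is the right place for the Outlook fix; the real fix may well have to be a C-level setlocale() call wherever the host changes the locale.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_numeric_locale():
    """Hypothetical helper: force LC_NUMERIC to "C" for the duration of
    a block, then put back whatever setting was in effect before."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    locale.setlocale(locale.LC_NUMERIC, "C")     # "C", not "English"
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


# Only LC_NUMERIC is touched; every other category is left alone.
with c_numeric_locale():
    value = float("0.1")
```

Restoring the saved setting afterwards is a design choice for the sketch: it keeps the host's view of the locale intact outside the block, at the cost of having to wrap every place where string<->float conversions happen.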