RE: [Spambayes] Question (or possibly a bug report)
[copying to Python-Dev because it relates to a recent thread there too]

[Meyer, Tony]
No doubt I should ask this on python-list not here, but since Tim would probably be the one to answer it anyway... :) Why does the setlocale() function give that impression? If LC_NUMERIC should always be "C", shouldn't setlocale(LC_NUMERIC, x) raise some sort of exception?
Reading the locale module docs should make it clearer. If it's still unclear after reading the docs, ask again <wink>. For a concrete example of why it's still useful to "pretend" to change LC_NUMERIC, see below (the locale module functions are sensitive to the change, and the code below couldn't be written otherwise).
Maybe Outlook is at fault here? I've certainly seen that some of the Outlook/COM/MAPI calls make changes to the locale. In particular, mapi.MAPILogonEx() does - it changes the locale to whatever Outlook (i.e. Windows) thinks it is. Could this then be screwing things up for us?
Oh yes. A .pyc file contains a compiled form of Python code objects. Part of what's in a .pyc file is the "marshal" form of numeric literals referenced by the source code. It so happens that marshal stores float literals as strings (repr-style). The unmarshaling code (executed when a .pyc file is loaded) is written in C, and uses C's atof() to convert these strings back to C doubles. atof() is locale-sensitive, so can screw up royally if LC_NUMERIC isn't "C" at the time a module is loaded.

Here's a little program we can use to predict this kind of damage:

"""
import marshal, locale

def damage(afloat, lcnumeric):
    s = marshal.dumps(afloat)
    print repr(afloat), "->", repr(s)
    # Now emulate unmarshaling that under a given locale.
    # Strip the type code and byte count.
    assert s[0] == 'f'
    raw = s[2:]
    print "Under locale %r that loads as" % lcnumeric,
    locale.setlocale(locale.LC_NUMERIC, lcnumeric)
    print repr(locale.atof(raw))
    locale.setlocale(locale.LC_NUMERIC, "C")
"""

For example, running

    damage(0.001, "German")

displays:

"""
0.001 -> 'f\x050.001'
Under locale 'German' that loads as 1.0
"""

while

    damage(0.001, "C")

displays what Python needs to happen instead:

"""
0.001 -> 'f\x050.001'
Under locale 'C' that loads as 0.001
"""

So all kinds of bad things *can* happen. I'm still baffled by the spambayes logfile, though, because the failing assert is here:

    def set_stages(self, stages):
        self.stages = []
        start_pos = 0.0
        for name, prop in stages:
            stage = name, start_pos, prop
            start_pos += prop
            self.stages.append(stage)
        assert (abs(start_pos-1.0)) < 0.001, \
           "Proportions must add to 1.0 (%g,%r)" % (start_pos, stages)

and the failing call is here:

    self.set_stages( (("", 1.0),) )

Under a locale that ignores periods,

    string -> double
    0.0    -> 0.0
    1.0    -> 10.0
    0.001  -> 1.0

So the assert above would act like

        assert (abs(start_pos-10.0)) < 1.0, \

and the call would act like

    self.set_stages( (("", 10.0),) )

start_pos - 10.0 would still be 0 then, and the assert should not fail.
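
For anyone who wants to check that arithmetic without changing a real locale, here is a minimal sketch. It assumes that "a locale that ignores periods" can be modelled by deleting every '.' before conversion. The names load_ignoring_periods and stages_assert_passes are hypothetical, made up for this illustration; they are not part of spambayes or of the marshal code, and the literals are passed as strings to stand in for what a .pyc file would hold.

```python
def load_ignoring_periods(literal):
    # Hypothetical stand-in for unmarshaling a float literal under a
    # locale that ignores periods: drop every '.' and then convert.
    return float(literal.replace(".", ""))


def stages_assert_passes(props, load=float):
    # Mirrors the arithmetic in set_stages(), with every float literal
    # from the source ("0.0", "1.0", "0.001") routed through `load`.
    start_pos = load("0.0")
    for prop in props:
        start_pos += load(prop)
    return abs(start_pos - load("1.0")) < load("0.001")


# The table of conversions under the period-ignoring assumption.
for text in ("0.0", "1.0", "0.001"):
    print(text, "->", load_ignoring_periods(text))

# The failing call passed a single stage with proportion 1.0.
print(stages_assert_passes(["1.0"]))                         # normal loading
print(stages_assert_passes(["1.0"], load_ignoring_periods))  # damaged loading
```

Both calls report that the assert condition holds, which agrees with the reasoning above: this particular kind of damage scales everything consistently, so by itself it does not explain the failure in the logfile.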
participants (1)
-
Tim Peters