[spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (or possibly a bug report)

Tim Peters tim.one@comcast.net
Sun, 27 Jul 2003 16:54:42 -0400


[Mark Hammond]
> This seems to be coming to a conclusion.  Not a completely
> satisfactory one, but one nonetheless.
>
> Short story for the python-dev crew:
>
> * Some Windows programs are known to run with the CRT locale set to
>   other than "C" - specifically, set to the locale of the user.
> * If this happens, the marshal module will load floating point
>   literals incorrectly.

Well, it depends on the locale, and on the fp literals in question, but it's
often the case that damage occurs.

> * Thus, once this happens, if a .pyc file is imported, the floating
>   point literals in that .pyc are wrong.  Confusion reigns.

Yup -- and it's an excellent to-the-point summary!
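
To make the failure concrete, here's a minimal sketch of the mechanism,
assuming the unmarshalling path hands the literal's text form to a
locale-sensitive C routine such as atof().  The atof_like function and its
decimal_point parameter are made up for illustration; they only mimic the
scanning rule (unsigned, no exponent) and are not the marshal code itself.

```python
def atof_like(text, decimal_point="."):
    """Mimic how a locale-aware atof() scans a simple decimal literal.

    Hypothetical helper: scanning stops at the first character that is
    neither a digit nor the locale's decimal point, and whatever was
    read up to that point is returned.
    """
    n = len(text)
    i = 0
    while i < n and text[i].isdigit():
        i += 1
    whole = text[:i] or "0"
    if i < n and text[i] == decimal_point:
        j = i + 1
        while j < n and text[j].isdigit():
            j += 1
        frac = text[i + 1:j] or "0"
        return float(whole + "." + frac)
    # No decimal point recognized for this locale: the fraction is lost.
    return float(whole)


print(atof_like("3.14"))        # "C"-style locale: 3.14
print(atof_like("3.14", ","))   # comma-decimal locale: 3.0
print(atof_like("0.001", ","))  # comma-decimal locale: 0.0
```

Whether damage shows up depends on both the locale and the literal: 2.0 comes
through unharmed, while 0.001 silently turns into 0.0.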

> The "best" solution to this probably involves removing Python being
> dependent on the locale - there is even an existing patch for that.

Kinda.

> To the SpamBayes specifics:
> ...
> I have a version working for the original bug reporter.  While on our
> machines, we can reproduce the locale being switched at MAPILogon
> time, my instrumented version also shows that for some people at
> least, Outlook itself will also change it back some time before
> delivering UI events to us.

There's potentially another dark side to this story:  if MS code is going
out of its way to switch locale, it's presumably because some of MS's code
wants to change the behavior of CRT routines to work "as expected" for the
current user.  So if we switch LC_NUMERIC back to "C", we *may* be creating
problems for Outlook.  I'll never stumble into this, since "C" locale and my
normal locale are so similar (and have identical behavior in the LC_NUMERIC
category).  At least Win32's *native* notions of locale are settable on a
per-thread basis; C's notion is a global hammer.  It's unclear to me why MS's
code is even touching C's notion.
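
One way to limit the collateral damage is to force LC_NUMERIC to "C" only
around the code that needs it and then put back whatever was there.  A rough
sketch using Python's locale module follows; the c_numeric_locale name is an
invention for this example, and since the setting is process-wide, other
threads still see the change while the block runs.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_numeric_locale():
    """Temporarily switch LC_NUMERIC to "C", then restore the old value.

    Hypothetical helper.  The C locale is a process-wide setting, so this
    is not thread-safe; it only narrows the window of the change.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)  # no second argument: query only
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


with c_numeric_locale():
    # Inside the block the decimal point is guaranteed to be ".".
    point = locale.localeconv()["decimal_point"]
```

This doesn't settle whether the host application misbehaves during that
window; it just keeps the window short.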

> ...
> We *do* still have the "social" problem of what locale conventions to
> use for Config files, but that has nothing to do with our tools...

To the extent that Config files use Python syntax, they're least surprising
if they stick to Python syntax.  The locale pit is deep.  For example,
Finnish uses a non-breaking space to separate thousands "although fullstop
may be used in monetary context".  We'll end up with more code to cater to
gratuitous locale differences than to identify spam <0.7 wink>.
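
For the Config side, "stick to Python syntax" can be as simple as parsing
option values with float() rather than a locale-aware routine.  A small
sketch, where parse_config_float is a made-up name and not anything in the
SpamBayes source:

```python
def parse_config_float(text):
    """Parse a config value using Python float syntax, ignoring the locale.

    Hypothetical helper: float() accepts "." as the decimal point whatever
    the user's locale is, and raises ValueError for "0,7".
    """
    return float(text.strip())


cutoff = parse_config_float(" 0.7 ")
```

A value written as 0,7 fails loudly with ValueError instead of being
quietly misread.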