--with-fpectl changes the CPython ABI

Hi all,

Well, we finally got that ucs2/ucs4 stuff all sorted out (yay), but I just learned that there's another CPython build flag that also changes the ABI: --with-fpectl

Specifically, it seems that if you build CPython with --with-fpectl, and then use that CPython to build an extension module, and that extension module uses PyFPE_{START,END}_PROTECT (like e.g. Cython modules do), and you then try to import that extension module on a CPython that *wasn't* built with --with-fpectl, then it will crash. This bug report has more gory details:

https://github.com/numpy/numpy/issues/8415

The reverse is OK -- extensions built in a no-fpectl CPython can be imported by both no-fpectl and yes-fpectl CPythons.

So one consequence is easy -- we need to make a note in the manylinux1 definition saying that you have to use a no-fpectl CPython build to make manylinux1 wheels, because that's the only way to guarantee compatibility. I just submitted a PR for this:

https://github.com/python/peps/pull/166

(Fortunately the manylinux1 docker images are already set up this way, so in practice I think everyone is already doing this.)

But... in general this is kind of an unfortunate issue, and it's not restricted to Linux. Should we do something? Some options:

- Add another ABI flag -- e.g. cp35dmf vs. cp35dm? Though AFAICT the offending macros are actually part of the allegedly stable ABI (!!), so I guess this isn't really a solution. (I'm not 100% confident about how to tell whether something is part of the stable ABI, but: Python.h unconditionally imports pyfpe.h, and pyfpe.h doesn't have any Py_LIMITED_API checks.)

- Drop support for fpectl entirely in 3.7 on the grounds that it's not worth the trouble? (The docs have said "don't use this" at the top forever[1], and after clicking through every hit on github code search for language = Python and string "turnon_sigfpe" [2], I found exactly 4 non-documentation usages [3], all of which are already broken in no-fpectl builds.) We obviously can't do this in a point release though, because there are lots of extant extension modules referencing these symbols.

- Or maybe make it so that even no-fpectl builds still export the necessary symbols so that yes-fpectl extensions don't crash on import? (This has the advantage that it can be done in a point release...)

Thoughts?

-n

[1] https://docs.python.org/2/library/fpectl.html
[2] https://github.com/search?l=Python&p=1&q=turnon_sigfpe&type=Code&utf8=%E2%9C%93
[3] https://github.com/podhrmic/JSBSim/blob/36de9ac63c959cef5d7b2c56fb49c1a57fd4...
    https://github.com/tmbdev/iuprlab/blob/239918b5ec0f8deecbc7c2ec1283a837d11a7...
    https://github.com/wcs211/BEM-3D-Python/blob/874aaeffc3dac5f698f875478c3accf...
    https://github.com/neobonzi/SoundPlagiarism/blob/7cff7f0145217bffb3a3cebd59a...

--
Nathaniel J. Smith -- https://vorpus.org
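
As a rough illustration of the build-time check discussed above (not part of the original message), the sketch below guesses whether the running CPython was configured with --with-fpectl. It assumes a Unix-style build where `sysconfig` exposes the configure command line as `CONFIG_ARGS` and the pyconfig.h macro `WANT_SIGFPE_HANDLER`; the helper name `built_with_fpectl` is hypothetical, and on builds that do not record these values it simply reports False.

```python
import shlex
import sysconfig


def built_with_fpectl(config_vars=None):
    """Best-effort guess at whether CPython was configured --with-fpectl.

    `config_vars` defaults to the running interpreter's build settings;
    passing a plain dict makes the helper easy to try out.
    """
    if config_vars is None:
        config_vars = sysconfig.get_config_vars()

    # Assumption: fpectl builds define WANT_SIGFPE_HANDLER in pyconfig.h.
    if config_vars.get("WANT_SIGFPE_HANDLER"):
        return True

    # Assumption: CONFIG_ARGS holds the shell-quoted ./configure arguments.
    config_args = config_vars.get("CONFIG_ARGS") or ""
    enabled = {"--with-fpectl", "--with-fpectl=yes"}
    return any(arg in enabled for arg in shlex.split(config_args))


if __name__ == "__main__":
    print("fpectl build:", built_with_fpectl())
```

A wheel-building script could call something like this up front and refuse to build on a yes-fpectl interpreter, which is the direction the manylinux1 note takes.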

On 25 December 2016 at 09:48, Nathaniel Smith <njs@pobox.com> wrote:
This seems like a sensible thing to do in 3.6, 3.5 and 2.7 regardless of what happens in 3.7.

For 3.7, I don't understand the trade-offs well enough to have a strong opinion, but dropping the feature entirely does seem reasonable - folks that want fine-grained floating point exception control these days are likely to be much better served by the decimal module, or one of the third party computing libraries (numpy, gmpy, sympy, etc).

There was a thread back in 2012 [1] regarding the possibility of instead updating floats to offer flexibility similar to that offered by those other modules, but I think our discussions of the expected semantics of a decimal literal show that that would be a bad idea - context dependent behaviour in numeric literals creates all sorts of problems at the level of compiler and interpreter design.

Cheers,
Nick.

[1] https://mail.python.org/pipermail/python-ideas/2012-October/016768.html

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
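
To make the decimal suggestion concrete, here is a minimal sketch (added for illustration, not from the original message) of per-context exception control using the standard library's decimal module. The helper name `divide` is hypothetical; `localcontext`, `traps` and `DivisionByZero` are the module's own names.

```python
from decimal import Decimal, DivisionByZero, localcontext


def divide(a, b, trap):
    """Divide a by b as Decimals, with the DivisionByZero trap on or off."""
    with localcontext() as ctx:
        # The trap setting applies only inside this context block.
        ctx.traps[DivisionByZero] = trap
        return Decimal(a) / Decimal(b)


# Trap disabled: division by zero quietly yields a signed infinity.
quiet = divide(1, 0, trap=False)
print(quiet)

# Trap enabled: the same operation raises a Python exception instead.
try:
    divide(1, 0, trap=True)
    raised = False
except DivisionByZero:
    raised = True
print("raised:", raised)
```

Unlike the control-word approach, the choice here is scoped to a context object rather than to the whole thread's floating point hardware state.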

On Sun, Dec 25, 2016 at 5:55 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I looked into this a bit more. I think the way it's *supposed* to work is that normally, various operations in Python might return inf or nan, but if you call fpectl.turnon_sigfpe() then they switch to raising exceptions instead. But AFAICT the fpectl module:

1) is totally broken on major platforms: There doesn't seem to be any implementation at all for MacOS. On x86/x86-64 Linux it works by fiddling with the x87 control word directly... which is okay for traditional x86 with SSE disabled, but on x86-64, or x86 with SSE enabled, then there are two separate floating point units on the processor (the old x87 FPU, and the new SSE unit), and which one gets used for any given operation is up to the C compiler. So on Linux, whether fpectl actually affects any given floating point operation is dependent on basically the phase of the moon. This is pretty bad.

2) doesn't seem to actually accomplish anything even when it does work: Back in the old days, math.exp(1000) apparently returned inf (there's a REPL transcript showing this at the top of the fpectl documentation). But nowadays math.exp raises an exception in cases where it used to return inf, regardless of fpectl. I haven't been able to find any cases where fpectl actually... does anything?

3) ...except that it does break numpy and any other code that expects the default IEEE-754 semantics: The way fpectl works is that it twiddles with the FP control word, which is a thread-global variable. After you call turnon_sigfpe(), then *any* floating point code in that thread that happens to generate a nan or inf instead triggers a SIGFPE, and if the code isn't specifically written to use the PyFPE_* macros then this causes a process abort. For example:

    ~$ python
    Python 3.5.2+ (default, Dec 13 2016, 14:16:35)
    [GCC 6.2.1 20161124] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Current thread 0x00007fea57a9f700 (most recent call first):
      File "<stdin>", line 1 in <module>
    zsh: abort      python
    ~$

(I'm using np.longdouble to work around the Linux SSE bug -- using long double forces the calculations to be done on the x87 unit. On Windows I believe it would be sufficient to just do np.array(1.0) / np.array(0.0).)

So I guess that yeah, my suggestion would be to drop this feature entirely in 3.7, given that it's never been enabled by default and has been largely broken for years. Or do we still need a full deprecation cycle?

-n

--
Nathaniel J. Smith -- https://vorpus.org
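
For reference, point 2 can be checked on an ordinary CPython without fpectl. The short sketch below (added for illustration, using only the standard library) shows math.exp raising on overflow while plain float arithmetic keeps the default IEEE-754 behaviour of producing inf and nan. The variable names are arbitrary.

```python
import math

# math.exp reports overflow as a Python exception rather than returning inf.
overflow_error = None
try:
    math.exp(1000)
except OverflowError as exc:
    overflow_error = exc
print("math.exp(1000) raised:", type(overflow_error).__name__)

# Plain float arithmetic follows the default IEEE-754 semantics:
# overflow in multiplication gives inf, and inf - inf gives nan.
big = 1e308 * 10
not_a_number = big - big
print(big, not_a_number)
```

Code such as numpy relies on the second behaviour, which is why flipping the thread's FP control word underneath it is disruptive.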

On 2 January 2017 at 18:27, Nathaniel Smith <njs@pobox.com> wrote:
I think the existing warning in the docs and the fact it's apparently been fundamentally broken for years is sufficient justification for just dropping it entirely.

An explicit deprecation warning could be added in 3.6.1 and a Py3k warning in 2.7.x, though - those changes shouldn't be difficult, and it's a nice courtesy for anyone that *is* somehow currently getting it to work.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I am happy to see it go. It was contributed many, many years ago by people (scientists from the early numpy world IIRC) who had a very specific use for it, but weren't really into maintaining it long-term, and I wasn't strong enough to refuse a well-meaning but poorly executed contribution at the time -- so we compromised on having the whole thing enabled through `#ifdef`. Clearly it started rotting the day I committed the code...

On Mon, Jan 2, 2017 at 4:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

Great, sounds like we have a plan: https://bugs.python.org/issue29137

On Mon, Jan 2, 2017 at 8:21 AM, Guido van Rossum <guido@python.org> wrote:
-- Nathaniel J. Smith -- https://vorpus.org

participants (3): Guido van Rossum, Nathaniel Smith, Nick Coghlan