special compiler options for only one file

Hi all,

I have a Python extension which uses CPU-specific features, if available. This is done through a run-time check. If the hardware supports the POPCNT instruction then it selects one implementation of my inner loop, if SSSE3 is available then it selects another, otherwise it falls back to generic versions of my performance-critical kernel. (Some 95%+ of the time is spent in this kernel.)

Unfortunately, there's a failure mode I didn't expect. I use -mssse3 and -O3 to compile all of the C code, even though only one file needs that -mssse3 option.

As a result, the other files are compiled with the expectation that SSSE3 will exist. This causes a segfault for the line

  start_target_popcount = (int)(query_popcount * threshold);

because the compiler used fisttpl, which is an SSE3 instruction that -mssse3 also enables. After all, I told it to assume that SSSE3 exists.

The Debian packager for my package recently ran into this problem, because the test machine has a gcc which understands -mssse3 but the machine itself has an older CPU without those instructions.

I'm trying to come up with a solution that can be automated for the Debian distribution. I want a solution where the same binary can work on older machines and on newer ones.

Ideally I would like to say that only one file is compiled with the -mssse3 option. Since my selector code isn't part of this file, SSSE3 code will never be executed unless the CPU supports it.

However, I can't figure out any way to tell distutils that a set of compiler options is specific to a single file.

Is that even possible?

Cheers,

Andrew
dalke@dalkescientific.com
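As an illustration only, the selection order described in the message above might look like the following. The real selector presumably lives in the extension's C code; the function names and kernel names here are hypothetical, and the flag names assume the spelling used by Linux's /proc/cpuinfo.

```python
# Hypothetical sketch of the run-time selection order from the message above.
# Kernel names are invented; flag names assume Linux /proc/cpuinfo spelling.


def select_popcount_kernel(cpu_flags):
    """Return the name of the fastest kernel this CPU can run."""
    flags = set(cpu_flags)
    if "popcnt" in flags:
        return "popcount_POPCNT"
    if "ssse3" in flags:
        return "popcount_SSSE3"
    return "popcount_generic"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Best-effort read of the CPU feature flags; returns [] if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1].split()
    except OSError:
        pass
    return []
```

A selector like this only guards calls into the chosen kernel. It cannot protect against instructions the compiler emits in other files that were built with -mssse3, which is the failure described above.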

I guess you could use the build_clib command to create a static library with your settings just for that one file, then build your extension by linking against that library.

On Tue, Mar 19, 2013 at 4:54 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
However, I can't figure out any way to tell distutils that a set of compiler options are specific to a single file.
Is that even possible?
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
-- Thanks, Andrew Svetlov
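A rough sketch of what the build_clib suggestion could look like in setup.py follows. It assumes setuptools rather than plain distutils, because setuptools' build_clib passes a per-library "cflags" entry to the compiler while the distutils version takes no such entry. The library name, extension name, and selector.c are hypothetical; only src/popcount_SSSE3.c comes from this thread.

```python
# Sketch only: relies on setuptools' build_clib honoring a per-library
# "cflags" entry. Plain distutils' build_clib does not accept one.

SSSE3_LIB = (
    "popcount_ssse3",                    # hypothetical library name
    {
        "sources": ["src/popcount_SSSE3.c"],
        "cflags": ["-O3", "-mssse3"],    # applied to this library only
    },
)

EXTENSION_KWARGS = {
    "name": "_chemfp",                   # hypothetical extension name
    "sources": ["src/selector.c"],       # hypothetical: all files except the SSSE3 one
    "extra_compile_args": ["-O3"],       # note: no -mssse3 here
    "libraries": ["popcount_ssse3"],     # link against the static library above
}


def main():
    # Imported here so the definitions above can be inspected without setuptools.
    from setuptools import setup, Extension

    setup(
        name="example",                  # hypothetical project name
        libraries=[SSSE3_LIB],
        ext_modules=[Extension(**EXTENSION_KWARGS)],
    )
```

A real setup.py would call main() at the bottom. The point of the split is that -mssse3 appears only in the library's flags, so the remaining files are compiled without it.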

On Tue, Mar 19, 2013 at 7:54 PM, Andrew Dalke <dalke@dalkescientific.com> wrote:
However, I can't figure out any way to tell distutils that a set of compiler options are specific to a single file.
Is that even possible?
One possible solution, albeit more complex code-wise than compiling that single file as a static lib, would be to subclass the compiler class you want to use and override its _compile() method, which is the one responsible for compiling a single file. You can then override the arguments in there. Look, for example, at distutils.unixccompiler.UnixCCompiler._compile.

If you want to support different compiler implementations, you could even detect at runtime which compiler was selected with the --compiler option and subclass the appropriate implementation. Getting distutils to accept a compiler subclass also requires a little bit of hackery, but it's not undoable.

If that option sounds viable to you I can delve into more details about how to actually do it, as I've had to do this sort of thing before myself. So yes, it can be done.

Erik
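To make the _compile() idea a little more concrete, here is a minimal sketch that wraps an existing _compile method instead of subclassing the compiler. It is not Erik's recipe. The helper names flags_for and wrap_compile are hypothetical; the argument order mirrors UnixCCompiler._compile(obj, src, ext, cc_args, extra_postargs, pp_opts).

```python
# Sketch, not a tested recipe: wraps a compiler's bound _compile() so that
# selected flags are passed only for the source file they belong to.

import os


def flags_for(src, cc_args, only_for):
    """Drop each flag listed in `only_for` unless `src` is its designated file.

    `only_for` maps a flag to the base name of the one file that may use it.
    Flags not listed in `only_for` are always kept.
    """
    base = os.path.basename(src)
    return [a for a in cc_args if only_for.get(a, base) == base]


def wrap_compile(original_compile, only_for):
    """Return a replacement for a bound UnixCCompiler._compile method."""
    def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
        return original_compile(
            obj, src, ext,
            flags_for(src, cc_args, only_for),
            flags_for(src, extra_postargs or [], only_for),
            pp_opts,
        )
    return _compile
```

One way to hook this in, as an assumption rather than a verified setup, is a build_ext subclass whose build_extensions() sets self.compiler._compile = wrap_compile(self.compiler._compile, {"-mssse3": "popcount_SSSE3.c"}) before calling the parent method. Flags that arrive through the CFLAGS environment variable are baked into the compiler command itself and would not be filtered by this wrapper.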

Hi Erik, On Apr 11, 2013, at 7:43 PM, Erik Bray wrote:
One possible solution, albeit more complex code-wise than compiling that single file as a static lib, would be to subclass the compiler class you want to use and override its _compile() method, which is the one responsible for compiling a single file. You can then override the arguments in there. Look, for example, at distutils.unixccompiler.UnixCCompiler._compile.
I shudder every time I think about subclassing one of the compiler classes. I've done it before, based on suggestions from this list, but I had no sense of how it actually worked.

Instead, I wrote a wrapper script, at the end of this email, which can be used like this:

  env CC=$PWD/filter_gcc python setup.py build

It sniffs the command line and keeps "-mssse3" for the right file; otherwise it removes that option. It then calls gcc with the tweaked arguments.

Unlike subclassing, I understand how this one works. :)

Cheers,

Andrew
dalke@dalkescientific.com

#!/usr/bin/env python

# chemfp by default assumes the --with-ssse3 flag, which enables a
# compiler-specific option to enable the SSSE3 intrinsics.
#
# You can disable this using --without-ssse3.
#
# The code which requires the SSSE3-specific intrinsics will only be
# run when chemfp, at run-time, determines that the CPU supports the
# SSSE3 instruction set. This is as it should be. Unfortunately, the
# -mssse3 compiler flag also tells gcc that it's okay to use
# SSSE3-specific instructions in the rest of the code. This causes a
# bus error if chemfp is then used on a CPU which doesn't support the
# correct instruction set. Unfortunately, I don't have the ability to
# use a non-ssse3 code path for this case.
#
# This is only a problem if:
#   - you use the same binary on multiple platforms, where
#   - some machines do not have the SSSE3 instruction set AND
#   - some machines have the SSSE3 instruction set AND
#   - the machines with the SSSE3 instruction set do not support POPCNT.
#
# (If all of your SSSE3 machines also support POPCNT then it's okay to
# use --without-ssse3, because the POPCNT instruction is always faster
# than the SSSE3-based popcount implementation.)
#
# This script, filter_gcc, is a workaround for the problem. Only one
# file needs the -mssse3 option. The best solution is to tell Python's
# setup.py to compile src/popcount_SSSE3.c with -mssse3 enabled and to
# leave out that flag for the other files. Unfortunately, setup.py
# doesn't make that easy.
#
# filter_gcc acts as a filter layer between setup.py and gcc. It's
# used like this:
#
#   env CC=$PWD/filter_gcc python setup.py build
#
# This tells setup.py to use $PWD/filter_gcc (i.e., this script) as an
# alternate C compiler. This script runs and checks if setup.py is
# attempting to compile popcount_SSSE3.c. If so, it leaves the -mssse3
# flag in place (if it exists). Otherwise, it removes the flag (if it
# exists).

import sys
import subprocess

CC = "gcc"  # The real C compiler

args = sys.argv[1:]
# print("Called with", args)

# Check to see if I should remove the "-mssse3" flag from the args.
remove_mssse3 = True
for arg in args:
    if "popcount_SSSE3.c" in arg:
        remove_mssse3 = False
        break

if remove_mssse3:
    # Go ahead and remove the "-mssse3"
    try:
        args.remove("-mssse3")
    except ValueError:
        # This can happen if someone does:
        #   env CC=$PWD/filter_gcc python setup.py build --without-ssse3
        pass

assert args, "Missing exec args"

# Use the correct C compiler
args = [CC] + args
# print(" -->", " ".join(args))

# Run the new command, and report any errors.
try:
    retcode = subprocess.call(args)
except OSError as err:
    cmd = " ".join(args)
    raise SystemExit("Failed to execute %r: %s" % (cmd, err))

# Propagate the compiler's exit status so setup.py sees the failure.
if retcode:
    raise SystemExit(retcode)