[Distutils] A change to how platform name is generated for binary distributions?

Tue Mar 31 23:13:23 CEST 2009

We're starting to distribute some pre-built binaries on Solaris and
have come across an issue with how pkg_resources / distutils generates
the platform specification when the distribution includes Python
extensions.

In particular, we'd like to distribute both x86 and amd64 (or x86_64) 
binaries for Solaris on Intel since the OS can run in either mode, but 
all our egg distributions get the same file name no matter what 
architecture they're actually built for.  Diving into how the platform
part of a distribution name gets generated, I find that it relies on the
results of os.uname() via distutils.util.get_platform(), which describe
the OS and hardware rather than the Python build.  This seems like it
would cause problems on more than just Solaris, since the same thing
should happen any time someone is using a 32-bit Python on a 64-bit
OS/CPU, no?
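
To make the mismatch concrete, here is a minimal sketch that prints what
the OS reports next to the pointer width of the running interpreter.
The helper names interpreter_bits() and describe() are made up for this
illustration and are not part of distutils or pkg_resources.

```python
import platform
import struct


def interpreter_bits():
    """Return the pointer width, in bits, of the running interpreter."""
    # struct.calcsize("P") is the size in bytes of a C void pointer
    # in the build of Python that is executing this code.
    return struct.calcsize("P") * 8


def describe():
    """Return (machine name reported by the OS, interpreter pointer width)."""
    return platform.machine(), interpreter_bits()


if __name__ == "__main__":
    machine, bits = describe()
    print("OS reports machine %r; this Python is %d-bit" % (machine, bits))
```

On a 32-bit Python running on a 64-bit OS, the two values would
disagree, which is the case the platform string currently cannot
express.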

So it occurs to me that it might be nice to fix up 
distutils.util.get_platform() to try and rely on the information that 
can be obtained about the sys.executable itself.  After all, distutils / 
setuptools build extensions using the same compiler and linker flags 
that the Python environment was generated with so it seems logical to 
tie them together here too.  While it may be hard (impossible?) to cover
every case by reading compiler flags, many OSes provide easy-to-use
tools for gathering metadata about executable files.  In fact, on OSes
that use ELF binaries, I think there should be a lot of commonality in
how this is figured out.
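
As a rough sketch of the ELF case, the class byte in the file header
already says whether a binary is 32-bit or 64-bit, so no external tool
is strictly needed.  The function name elf_bits() is hypothetical, and
this assumes sys.executable points at the real ELF binary rather than a
wrapper script (in which case it simply returns None).

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values from the ELF specification: 1 = ELFCLASS32, 2 = ELFCLASS64
ELF_CLASS_BITS = {1: 32, 2: 64}


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if it cannot be determined."""
    try:
        with open(path, "rb") as f:
            header = f.read(5)
    except (IOError, OSError):
        return None
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None  # not an ELF file; do not guess
    # bytearray indexing yields an int on both Python 2 and Python 3.
    return ELF_CLASS_BITS.get(bytearray(header)[4])


if __name__ == "__main__":
    print(elf_bits(sys.executable))
```

Returning None for anything unrecognized keeps the "only change the
value when we are sure" behaviour.  Mapping the result to a name such as
x86 or amd64 would still need the machine type from the header or from
the OS, which this sketch does not attempt.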

So I'm proposing that a change be made to do the above by wrapping all
places where distutils.util.get_platform() returns a value with
something that replaces the last part of the platform spec with the
Python architecture *if* that architecture can safely be determined.
Something like the following pseudo-code wrapper:

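cached_arch = None

def wrapper(platform):
    global cached_arch
    if cached_arch is None:
        # Look the architecture up once; '' means "could not determine".
        cached_arch = _get_python_arch()
    if cached_arch:
        # Replace only the last component of the platform spec.
        parts = platform.split('-')
        parts[-1] = cached_arch
        platform = '-'.join(parts)
    return platform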

def _get_python_arch():
    if sys.platform == '<KNOWN OS>':
       return method_for_finding_python_arch_on_known_os()
    ...
    else:
       return ''
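
To show the intended effect on actual values, here is a small
self-contained sketch of just the substitution step.  The name
replace_arch() is hypothetical, and the platform string and the "amd64"
label below are illustrative values only, not output captured from a
real system.

```python
def replace_arch(platform, arch):
    """Swap the last '-'-separated field of platform for arch, if known."""
    if not arch:
        # Architecture unknown: keep today's value for compatibility.
        return platform
    parts = platform.split('-')
    parts[-1] = arch
    return '-'.join(parts)


if __name__ == "__main__":
    # Illustrative values only.
    print(replace_arch("solaris-2.10-i86pc", "amd64"))  # solaris-2.10-amd64
    print(replace_arch("solaris-2.10-i86pc", ""))       # solaris-2.10-i86pc
```

An empty result from the architecture lookup leaves the platform string
exactly as it is now, so only OSes with a known detection method would
see different file names.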

I'm much more familiar with setuptools / pkg_resources than I am with 
distutils, so I've actually tried this out with an implementation where 
pkg_resources.get_build_platform() is patched and the 
_get_python_arch()  above only returns something for Solaris.   Seems to 
work okay for me, but that's likely more because I don't see many 
Solaris eggs in the wild than anything I've done explicitly for 
backwards compatibility.  I understand backwards compatibility would be
a big sticking point here, so I've tried to structure things so that
values change only where the implementation is certain it knows what to
do and what values to return.

Thoughts?  If there is agreement, I can submit a patch.

-- Dave

P.S. It is my understanding that there is no equivalent to OS X's 'fat' 
dual-binary mode on Solaris for shipping multiple arch binaries.  Nor 
can I find any difference in filenames or directory names between a 
32-bit and 64-bit Python 2.5.4 built from source.  Is there some other
way of building that would solve the above problem without patching
distutils/pkg_resources?

P.P.S.  Even if you force the kernel to boot in 32-bit mode,
distutils.util.get_platform() returns the same values, because the OS's
uname does: whether 64-bit or 32-bit Intel, uname returns "i86pc" as the
machine.