[Python-3000] Binary compatibility
talex5 at gmail.com
Mon Aug 6 21:33:18 CEST 2007
I recently asked about the UCS2 / UCS4 binary compatibility issues
with Python on Guido's blog, and Guido suggested I continue the
discussion here.
The issue is that Python has a compile-time configuration setting
which changes its ABI. For example, on Ubuntu we have:
$ objdump -T /usr/bin/python|grep UCS
080ac3e0 g DF .text 00000206 Base PyUnicodeUCS4_EncodeUTF8
080b2810 g DF .text 000000ba Base PyUnicodeUCS4_DecodeLatin1
080b6c20 g DF .text 000002b3 Base PyUnicodeUCS4_RSplit
Whereas on some other systems, including compiled-from-source Python, you get:
$ objdump -T python|grep UCS
080abc80 g DF .text 00000201 Base PyUnicodeUCS2_EncodeUTF8
080b32e0 g DF .text 000000c7 Base PyUnicodeUCS2_DecodeLatin1
080b6740 g DF .text 000002b9 Base PyUnicodeUCS2_RSplit
(note "UCS2" vs "UCS4")
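The same setting can also be checked from inside the interpreter, without reading symbol tables. This is only a rough sketch: the helper name unicode_build is made up for illustration. It relies on sys.maxunicode, which is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build.

```python
import sys


def unicode_build(maxunicode=None):
    """Return "UCS2" or "UCS4" for a given sys.maxunicode value.

    Hypothetical helper, not part of Python itself. With no argument
    it inspects the running interpreter.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A UCS2 build cannot represent code points above U+FFFF directly.
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


if __name__ == "__main__":
    print(unicode_build())
```

A binary extension would need to have been built against an interpreter that reports the same value as the one it is loaded into.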
This means that I can't distribute Python extensions as binaries. Any
extension built on Ubuntu may fail on some other system. I confess I
haven't tried this recently, but it has caused me trouble in the past.
I'd like to be sure it won't happen with Python 3.
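To see which variant a given binary was built for, the objdump output can be scanned for the prefixed symbol names. The sketch below assumes the text of "objdump -T" has already been captured as a string; the function name ucs_variants and the sample data are illustrative only.

```python
import re

# Matches names such as PyUnicodeUCS4_EncodeUTF8 and captures "UCS4".
_UCS_SYMBOL = re.compile(r"\bPyUnicode(UCS[24])_\w+")


def ucs_variants(objdump_output):
    """Return the set of Unicode variants ("UCS2", "UCS4") named in the text.

    Hypothetical helper: it only looks at symbol names, so an empty
    result means no PyUnicodeUCS* symbols were listed at all.
    """
    return set(m.group(1) for m in _UCS_SYMBOL.finditer(objdump_output))


# Sample lines in the format shown in the objdump listings.
sample = """\
080ac3e0 g DF .text 00000206 Base PyUnicodeUCS4_EncodeUTF8
080b2810 g DF .text 000000ba Base PyUnicodeUCS4_DecodeLatin1
"""

if __name__ == "__main__":
    print(sorted(ucs_variants(sample)))
```

If the set found in an extension differs from the set found in the interpreter, the two were built with different settings and the extension's symbols will not resolve.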
I've hit this problem with both of the open source projects I work on;
the ROX desktop (http://rox.sf.net) and Zero Install.
ROX is a desktop environment. Most of our programs are written in
(pure) Python. Some, including ROX-Filer, are pure C. Sometimes it
would have been useful to combine the two: for example we could write
the pager applet in Python if it could use C to talk to the libwnck
library, or we could add Python scripting to the filer and gradually
migrate more of the code to Python.
Zero Install is a decentralised software installation system, itself
written entirely in Python, in which software authors publish
GPG-signed XML feed files on their websites. These feeds list versions
of their programs along with a cryptographic digest of each version's
contents (think Git tree IDs here). This allows installing software
without needing root access, while still sharing libraries and
programs automatically between (mutually suspicious) users. Although
we don't need to use C extensions for the system itself, distributing
Python/C hybrid programs with it has been problematic.
Another group having similar problems is the Autopackage project:
Finally, the issue has also been brought up before on the Python lists:
"Why don't you distribute a Python interpreter binary built with the
right options? Depending on users having installed the correct Python
version (especially if your users are not programmers) is asking for
There are several problems for us with this approach:
- We have to maintain our own version of Python, including pushing out
- We also have to maintain all the Python modules, in particular
python-gnome, in a similar way.
- Our users have to download Python twice whenever there's a new release.
- If some programs are using the distribution's Python and some are
using ours (libraries installed using Zero Install are only used by
software itself installed the same way; distribution packages aren't
affected), two copies of Python must be loaded into memory. This is
slow and wasteful of memory.
This is assuming all third-party code uses Zero Install for
distribution, so that only one extra version of Python is required.
People distributing programs by other means would also have to
include their own copies of Python, leading to even more waste.
From our point of view, it would be better if the format of strings
was an internal implementation detail. For most users, it doesn't
matter what the setting is, as long as the public interface doesn't
change! The cost of converting between formats is small, and in any
case most software outside of Python (the GNOME stack, for example)
uses UTF-8, so all strings have to be converted when going in or out
of Python anyway.
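As a small illustration of that last point, here is a sketch of the conversion that happens at the boundary with UTF-8 based libraries. The helper names to_utf8 and from_utf8 are made up for this example. The bytes produced are the same whichever internal string format the interpreter uses.

```python
def to_utf8(text):
    """Encode a Unicode string for a UTF-8 based library (hypothetical helper)."""
    return text.encode("utf-8")


def from_utf8(data):
    """Decode UTF-8 bytes coming back from such a library (hypothetical helper)."""
    return data.decode("utf-8")


sample = u"caf\u00e9 \u20ac"
wire = to_utf8(sample)

# The round trip is lossless, and nothing in it exposes the internal
# width of the interpreter's Unicode type.
assert from_utf8(wire) == sample
```

Code that only ever sees strings through an interface like this has no reason to care about the UCS2 / UCS4 setting.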
An alternative would be to default to UCS4, and give the option an
alarming name such as --with-unicode-for-space-limited-devices or
something so that packagers don't mess with it.
Dr Thomas Leonard http://rox.sourceforge.net
GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1