[Python-porting] [RELEASED] six 1.1

Wed Nov 23 21:59:15 CET 2011

On Nov 23, 2011, at 10:55 AM, Benjamin Peterson wrote:

>2011/11/23 Barry Warsaw <barry at python.org>:
>> both the Python and C levels.  I'll write up the details (e.g. __next__()
>> vs. next()) hopefully today, I've also found a few more traps and tricks for
>> extension modules.  I wonder if you have any interest in adding some C level
>> portability helpers.
>
>You mean like a header file with macros for PyInt -> PyLong/PyString
>-> PyUnicode etc?

There are a bunch of little things I've found helpful while porting
dbus-python.  I think some at least would be generally useful for extension
modules.  Here's a quick summary (so far :).  I should note first that I only
care about Python 2.6, 2.7, and 3.2.  I think there was only one case where
2.6 didn't have what I needed.

I tried to reduce the number of #ifdefs in the code by converting some things
that can be made common between the two versions.

 - In Python 2, I always #include <bytesobject.h> and unilaterally change all
   PyString names to PyBytes names.  That reduces a lot of the ugliness.

 - I changed all the reprs to return unicodes in both Python versions instead
   of conditionally continuing to return strings in Python 2.  That reduced
   another source of noise, but I had to use a little trick with
   PyUnicode_FromFormat().  The reprs in this package embed the repr of the
   parent class, but you don't know whether that will be a bytes (under Python
   2) or a unicode (under Python 3).  It was fairly ugly to ifdef around this,
   so instead of using either the %s or %U codes wrapped in macros, I use the
   %V code.  Now, I'm not sure if that was added for this purpose, but it sure
   is handy.  The call sites look something like this now:

   PyObject *parent_repr = (<baseclass>.tp_repr)(self);
   PyObject *my_repr = PyUnicode_FromFormat("...%V...", REPRV(parent_repr));

   and the macro looks like this:

   #define REPRV(obj) \
       (PyUnicode_Check(obj) ? (obj) : NULL), \
       (PyUnicode_Check(obj) ? NULL : PyBytes_AS_STRING(obj))

   I suppose technically this could crash if the parent repr (erroneously)
   returned a non-string, but in my case, that won't happen because the base
   classes are standard Python types, or otherwise well-controlled.

Additional compatibility macros and functions:

 - I really dislike writing "#if PY_MAJOR_VERSION >= 3" all over the place, so
   I define the following macro to make the version test easier:

   #if PY_MAJOR_VERSION >= 3
   #define PY3K
   #endif

   Now all I need are "#ifdef PY3K" sprinkles.  Okay, maybe it's a minor
   savings, but I've found it helpful.

 - dbus defines subclasses of PyInts and PyLongs.  When porting to Python 3,
   all of these have to become subclasses of PyLongs, however for some of
   them, the exact hierarchy doesn't matter so much, so I've switched them to
   use PyLongObjects.

   Python 3.0 had a <intobject.h> compatibility header which I think would
   have been nice, but that's gone in Python 3.2.

   In Python 2, PyLongObject isn't defined unless you also #include
   <longintrepr.h>.  <Python.h> isn't enough.

 - The extension module interns a couple of strings.  In Python 2 this is
   PyString_InternFromString while in Python 3 it's
   PyUnicode_InternFromString.  I have the following macro for this:

   #ifdef PY3K
   #define INTERN PyUnicode_InternFromString
   #else
   #define INTERN PyString_InternFromString
   #endif

 - There are several places where PyArg_Parse*() wants to get a char*.
   Under Python 2, these just provide "s" codes and get passed a PyString.
   Under Python 3, I decided to allow either a bytes object or a utf-8 encoded
   unicode, but I always want to coerce it to a bytes internally, making it
   easy to extract the char*.

   I decided to switch the "s" codes to O& codes and add the following
   converter function:

   #ifdef PY3K
   #define RETURN_CLEANUP Py_CLEANUP_SUPPORTED
   #else
   #define RETURN_CLEANUP 1
   #endif

   int
   dbus_parse_bytes(PyObject *object, void *address)
   {
       PyObject *bytes;
       Py_ssize_t size;
       void *data;

       if (!object) {
           /* This is Python having a parse error, so free our reference. */
           Py_CLEAR(*(PyObject **)address);
           return 1;
       }
       if (PyBytes_Check(object)) {
           bytes = object;
           Py_INCREF(bytes);
       }
       else {
           if (!(bytes = PyUnicode_AsUTF8String(object)))
               return 0;
       }
       /* Embedded NULs are not allowed in dbus. */
       size = PyBytes_GET_SIZE(bytes);
       data = PyBytes_AS_STRING(bytes);
       if (size != (Py_ssize_t)strlen(data)) {
           PyErr_SetString(PyExc_TypeError, "embedded NUL character");
           Py_DECREF(bytes);
           return 0;
       }
       *(PyObject**)address = bytes;
       return RETURN_CLEANUP;
   }

   I think there's a potential for leaking these args under Python 2 when
   subsequent parse codes fail, because Py_CLEANUP_SUPPORTED isn't defined.
   I'm not sure there's anything that can be done about it, so hopefully it's
   rare enough not to matter in practice.

Things I haven't macro'd around:

 - PyCapsule vs PyCObject; I just #ifdef around the whole block of code.

 - A number of places want to check if something's a PyInt or a PyLong.  The
   PyInt checks can't be performed under Python 3, so I have some rather ugly
   #ifdefs sprinkled in various conditionals.  (I suppose I could no-op
   PyInt_Check under Python 3).

 - Py_TPFLAGS_HAVE_WEAKREFS doesn't exist in Python 3 so I have to ifdef
   around setting the flags.  It might be nice if that was no-op'd in the
   compatibility header.

 - The changes to module inits are just a pain.  I'm not sure there's really
   anything you can do to make it nicer.  The C porting guides both on
   python.org and on python3porting.com provide some strategies, and I rolled
   my own slightly different approach based on those examples.

   I did define this:

   #ifdef PY3K
   #define RETURN_INITERROR return NULL
   #else
   #define RETURN_INITERROR return
   #endif

   just to make error condition returns a little easier to write.

 - Py_BuildValue() does not have a "y" code in Python 2, so you basically have
   to ifdef around that.

A few more things I ran across at the Python level:

 - For the Python code, I wanted to avoid 2to3, and was mainly successful with
   some liberal sprinkling of sys.version_info.major checks, and __future__
   imports (e.g. print_function, unicode_literals, and absolute_import).
   Many of these might be nicer with your six module.

 - Long literals (i.e. trailing 'L's) are a pain.

 - Metaclasses are a huge pain because the Python 3 syntax prevents
   compilation in Python 2, so you can't use sys.version_info.major checks
   alone.  Looks like six has a nice helper for this; I ended up using exec,
   but I think both cases would be rather painful if the derived class were
   anything more than a `pass` in the body.

 - iteritems() and friends are a pain.  In my case, I think they just weren't
   very useful, so I switched everything back to items() and such.

 - Similarly with xrange().

 - isSequenceType() is gone in Python 3.

 - Dealing with __next__() vs. next() methods.
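
For what it's worth, here is a minimal sketch of how the metaclass and
__next__() items above can be handled in one source file, without exec or
2to3.  The names Registry, RegistryBase, Signal, and Countdown are made up for
illustration, and this is only one possible approach, not necessarily what
dbus-python or six does.

```python
import sys

# Indexing works on 2.6 as well; the named .major attribute needs 2.7+.
PY3 = sys.version_info[0] >= 3


class Registry(type):
    """Hypothetical metaclass that records the name of every class it creates."""
    classes = []

    def __new__(mcs, name, bases, namespace):
        cls = super(Registry, mcs).__new__(mcs, name, bases, namespace)
        mcs.classes.append(name)
        return cls


# Calling the metaclass directly is valid syntax in both Python 2 and 3,
# unlike "class X(metaclass=Registry)" or "__metaclass__ = Registry".
RegistryBase = Registry('RegistryBase', (object,), {})


class Signal(RegistryBase):
    # The body can be arbitrarily large; the metaclass is inherited.
    def describe(self):
        return 'signal'


class Countdown(object):
    """Iterator that works under both spellings of the protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Python 2 looks for next(); alias it to the Python 3 method.
    next = __next__
```

Subclassing a throwaway base keeps the derived class body as ordinary code,
which sidesteps the pain of stuffing a large class body into an exec string.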

Anyway, that's everything I kept notes on.

Cheers,
-Barry