[Python-3000] Draft PEP: Dropping PyObject_HEAD

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 28 00:44:25 CEST 2007


I propose the PEP below for Py3k.

Regards,
Martin

PEP: 3122
Title: Dropping PyObject_HEAD
Version: $Revision: 54998 $
Last-Modified: $Date: 2007-04-27 10:31:58 +0200 (Fr, 27 Apr 2007) $
Author: Martin v. Löwis <martin at v.loewis.de>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2007
Python-Version: 3.0
Post-History:

Abstract
========

Python currently relies on undefined C behavior, with its
usage of PyObject_HEAD. This PEP proposes to change that
into standard C.

Rationale
=========

Standard C defines that an object must be accessed only through a
pointer of its type, and that all other accesses are undefined
behavior, with a few exceptions. In particular, the following
code has undefined behavior::

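  struct FooObject{
    PyObject_HEAD
    int data;
  };

  PyObject *foo(struct FooObject *f){
    return (PyObject*)f;
  }

  int bar(){
    struct FooObject *f = malloc(sizeof(struct FooObject));
    PyObject *o = foo(f);   /* same storage, different pointer type */
    f->ob_refcnt = 0;       /* store through struct FooObject* */
    o->ob_refcnt = 1;       /* store through PyObject* */
    return f->ob_refcnt;    /* compiler may assume this is still 0 */
  }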

The problem here is that the storage is accessed both as
if it were a PyObject and as a struct FooObject.

Historically, compilers did not cause any problems with this
code. However, modern compilers use that rule as an
optimization opportunity, concluding that f->ob_refcnt and
o->ob_refcnt cannot possibly refer to the same memory, and
that therefore the function should return 0, without having
to fetch the value of ob_refcnt at all in the return
statement. For GCC, Python now uses -fno-strict-aliasing
to work around that problem; with other compilers, Python
may simply exhibit undefined behavior. Even with GCC, using
-fno-strict-aliasing may pessimize the generated code
unnecessarily.

Specification
=============

Standard C has one specific exception to its aliasing rules precisely
designed to support the case of Python: a value of a struct type may
also be accessed through a pointer to the first field. E.g. if a
struct starts with an int, the struct\* may also be cast to an int\*,
which allows writing int values into the first field.
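
The first-field rule can be illustrated with a minimal sketch. The
struct and function names below are hypothetical and exist only for
this illustration; they are not part of Python.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration only. */
struct counter {
    int count;       /* first field */
    double weight;
};

/* Store an int through an int* obtained by casting the struct pointer.
   A pointer to a struct, suitably converted, points to its first
   field, so this writes c->count. */
static int write_via_first_field(struct counter *c, int value)
{
    int *first = (int *)c;
    assert((void *)first == (void *)&c->count);
    *first = value;
    return c->count;
}
```

Reading c->count after the store yields the value just written,
because both accesses use an lvalue of type int on the same object.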

For Python, PyObject_HEAD and PyObject_VAR_HEAD will be dropped, and
PyObject gets defined to contain all fields explicitly::

  typedef struct _object{
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
  }PyObject;

  typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
  } PyVarObject;

Types defined as fixed-size structures will then include PyObject
as their first field; variable-sized objects will include
PyVarObject. E.g.::

  typedef struct{
    PyObject ob_base;
    PyObject *start, *stop, *step;
  } PySliceObject;

  typedef struct{
    PyVarObject ob_base;
    PyObject **ob_item;
    Py_ssize_t allocated;
  } PyListObject;
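
The following sketch shows why the embedded-base layout avoids the
aliasing problem from the Rationale. It uses simplified stand-in
types (MyObject, FooObject, my_ssize_t) that are assumptions made for
this example, not the real CPython definitions, so that it compiles
without Python.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the real CPython definitions. */
typedef ptrdiff_t my_ssize_t;          /* stands in for Py_ssize_t */

typedef struct my_object {
    my_ssize_t ob_refcnt;
    const char *ob_type_name;          /* stands in for ob_type */
} MyObject;

typedef struct {
    MyObject ob_base;                  /* base object is the first field */
    int data;
} FooObject;

/* Upcast without a cast expression: take the address of the base. */
static MyObject *foo_as_object(FooObject *f)
{
    return &f->ob_base;
}

/* Same store/store/load sequence as bar() in the Rationale. */
static my_ssize_t refcount_roundtrip(void)
{
    FooObject f = { {0, "foo"}, 7 };
    MyObject *o = foo_as_object(&f);
    f.ob_base.ob_refcnt = 0;
    o->ob_refcnt = 1;
    return f.ob_base.ob_refcnt;        /* must observe the store via o */
}
```

Both stores here go through the ob_refcnt member of a MyObject, so
the compiler has to treat them as possibly referring to the same
memory, and the final load sees the second store.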

As a convention, the base field SHOULD be called ob_base. However,
all accesses to ob_refcnt and ob_type MUST cast the object pointer
to PyObject* (unless the pointer is already known to have that
type), and SHOULD use the respective accessor macros. To simplify
access to ob_type, a macro::

  #define Py_Type(o) (((PyObject*)(o))->ob_type)

is added.
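
As a rough sketch of how such an accessor macro behaves, the example
below defines a stand-in macro My_Type over hypothetical stand-in
types (MyType, MyObject, FooObject); none of these names come from
CPython. The macro argument is parenthesized in this sketch so that
an expression such as items + 1 is cast as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the real CPython definitions. */
typedef struct my_type { const char *name; } MyType;

typedef struct my_object {
    ptrdiff_t ob_refcnt;
    MyType *ob_type;
} MyObject;

typedef struct {
    MyObject ob_base;
    int data;
} FooObject;

/* Stand-in for the proposed accessor macro. */
#define My_Type(o) (((MyObject *)(o))->ob_type)

static MyType foo_type = { "foo" };
static MyType bar_type = { "bar" };

/* Returns the type of the second element of a small array. */
static MyType *type_of_second(void)
{
    FooObject items[2] = {
        { {1, &foo_type}, 10 },
        { {1, &bar_type}, 20 },
    };
    /* items + 1 is computed in units of FooObject, then cast. */
    return My_Type(items + 1);
}
```

Callers never name ob_base directly; the cast to the base type is
confined to the macro, which matches the MUST/SHOULD wording above.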

Copyright
=========

This document has been placed in the public domain.


