[Python-checkins] CVS: python/dist/src/Doc/ext embedding.tex,NONE,1.1.2.1 extending.tex,NONE,1.1.2.1 newtypes.tex,NONE,1.1.2.1 unix.tex,NONE,1.1.2.1 windows.tex,NONE,1.1.2.1 ext.tex,1.103,1.103.2.1

Guido van Rossum gvanrossum@users.sourceforge.net
Mon, 20 Aug 2001 20:05:19 -0700


Update of /cvsroot/python/python/dist/src/Doc/ext
In directory usw-pr-cvs1:/tmp/cvs-serv30284/ext

Modified Files:
      Tag: r22a2-branch
	ext.tex 
Added Files:
      Tag: r22a2-branch
	embedding.tex extending.tex newtypes.tex unix.tex windows.tex 
Log Message:
Merge the Docs from the trunk for the last time.

--- NEW FILE: embedding.tex ---
\chapter{Embedding Python in Another Application
     \label{embedding}}

The previous chapters discussed how to extend Python, that is, how to
extend the functionality of Python by attaching a library of C
functions to it.  It is also possible to do it the other way around:
enrich your C/\Cpp{} application by embedding Python in it.  Embedding
provides your application with the ability to implement some of the
functionality of your application in Python rather than C or \Cpp.
This can be used for many purposes; one example would be to allow
users to tailor the application to their needs by writing some scripts
in Python.  You can also use it yourself if some of the functionality
can be written in Python more easily.

Embedding Python is similar to extending it, but not quite.  The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python ---
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program.  One of the things this main program has to do is initialize
the Python interpreter.  At the very least, you have to call the
function \cfunction{Py_Initialize()} (on Mac OS, call
\cfunction{PyMac_Initialize()} instead).  There are optional calls to
pass command line arguments to Python.  Then later you can call the
interpreter from any part of the application.
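
For example, a minimal start-up sequence might look like this sketch
(\cfunction{PySys_SetArgv()} is the optional call that passes the
command line on to Python):

\begin{verbatim}
    Py_Initialize();
    PySys_SetArgv(argc, argv);  /* optional: makes sys.argv available */
\end{verbatim}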

There are several different ways to call the interpreter: you can pass
a string containing Python statements to
\cfunction{PyRun_SimpleString()}, or you can pass a stdio file pointer
and a file name (for identification in error messages only) to
\cfunction{PyRun_SimpleFile()}.  You can also call the lower-level
operations described in the previous chapters to construct and use
Python objects.

A simple demo of embedding Python can be found in the directory
\file{Demo/embed/} of the source distribution.


\begin{seealso}
  \seetitle[../api/api.html]{Python/C API Reference Manual}{The
            details of Python's C interface are given in this manual.
            A great deal of necessary information can be found here.}
\end{seealso}


\section{Very High Level Embedding
         \label{high-level-embedding}}

The simplest form of embedding Python is the use of the very
high level interface. This interface is intended to execute a
Python script without needing to interact with the application
directly.  This can, for example, be used to perform some operation
on a file.

\begin{verbatim}
#include <Python.h>

int main()
{
  Py_Initialize();
  PyRun_SimpleString("from time import time,ctime\n"
                     "print 'Today is',ctime(time())\n");
  Py_Finalize();
  return 0;
}
\end{verbatim}

The above code first initializes the Python interpreter with
\cfunction{Py_Initialize()}, followed by the execution of a hard-coded
Python script that prints the date and time.  Afterwards, the
\cfunction{Py_Finalize()} call shuts the interpreter down, followed by
the end of the program.  In a real program, you may want to get the
Python script from another source, perhaps a text-editor routine, a
file, or a database.  Getting the Python code from a file is better
done using the \cfunction{PyRun_SimpleFile()} function, which
saves you the trouble of allocating memory space and loading the file
contents.
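
As an illustration, a sketch of the same program reading its script
from a file might look like this (the script name \file{script.py} is
made up for the example, and error handling is kept minimal):

\begin{verbatim}
#include <Python.h>
#include <stdio.h>

int main()
{
    FILE *fp = fopen("script.py", "r");

    if (fp == NULL)
        return 1;
    Py_Initialize();
    PyRun_SimpleFile(fp, "script.py");
    Py_Finalize();
    fclose(fp);
    return 0;
}
\end{verbatim}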


\section{Beyond Very High Level Embedding: An overview
         \label{lower-level-embedding}}

The high level interface gives you the ability to execute
arbitrary pieces of Python code from your application, but
exchanging data values is quite cumbersome, to say the least.  If
you want that, you should use lower level calls. At the cost of
having to write more C code, you can achieve almost anything.

It should be noted that extending Python and embedding Python
amount to much the same activity, despite the different intent.  Most
topics discussed in the previous chapters are still valid.  To
show this, consider what extension code going from Python to C
really does:

\begin{enumerate}
    \item Convert data values from Python to C,
    \item Perform a function call to a C routine using the
        converted values, and
    \item Convert the data values from the call from C to Python.
\end{enumerate}

When embedding Python, the interface code does:

\begin{enumerate}
    \item Convert data values from C to Python,
    \item Perform a function call to a Python interface routine
        using the converted values, and
    \item Convert the data values from the call from Python to C.
\end{enumerate}

As you can see, the data conversion steps are simply swapped to
accommodate the different direction of the cross-language transfer.
The only difference is the routine that you call between both
data conversions.  When extending, you call a C routine; when
embedding, you call a Python routine.
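
In terms of actual calls, the three embedding steps might look
something like the following sketch, where \code{pFunc} is assumed to
hold a Python callable (obtained, for example, as shown in the next
section):

\begin{verbatim}
    pArgs = Py_BuildValue("(i)", 42);           /* C -> Python */
    pValue = PyObject_CallObject(pFunc, pArgs); /* the Python call */
    Py_DECREF(pArgs);
    if (pValue != NULL) {
        result = PyInt_AsLong(pValue);          /* Python -> C */
        Py_DECREF(pValue);
    }
\end{verbatim}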

This chapter will not discuss how to convert data from Python
to C and vice versa.  Also, proper use of references and dealing
with errors is assumed to be understood.  Since these aspects do not
differ from extending the interpreter, you can refer to earlier
chapters for the required information.


\section{Pure Embedding
         \label{pure-embedding}}

The first program aims to execute a function in a Python
script.  As in the section about the very high level interface,
the Python interpreter does not directly interact with the
application (but that will change in the next section).

The code to run a function defined in a Python script is:

\verbatiminput{run-func.c}

This code loads a Python script using \code{argv[1]}, and calls the
function named in \code{argv[2]}.  Its integer arguments are the other
values of the \code{argv} array.  If you compile and link this
program (let's call the finished executable \program{call}), and use
it to execute a Python script, such as:

\begin{verbatim}
def multiply(a,b):
    print "Thy shall add", a, "times", b
    c = 0
    for i in range(0, a):
        c = c + b
    return c
\end{verbatim}

then the result should be:

\begin{verbatim}
$ call multiply multiply 3 2
Thy shall add 3 times 2
Result of call: 6
\end{verbatim} % $

Although the program is quite large for its functionality, most of the
code is for data conversion between Python and C, and for error
reporting.  The interesting part with respect to embedding Python
starts with

\begin{verbatim}
    Py_Initialize();
    pName = PyString_FromString(argv[1]);
    /* Error checking of pName left out */
    pModule = PyImport_Import(pName);
\end{verbatim}

After initializing the interpreter, the script is loaded using
\cfunction{PyImport_Import()}.  This routine needs a Python string
as its argument, which is constructed using the
\cfunction{PyString_FromString()} data conversion routine.

\begin{verbatim}
    pDict = PyModule_GetDict(pModule);
    /* pDict is a borrowed reference */

    pFunc = PyDict_GetItemString(pDict, argv[2]);
    /* pFunc is a borrowed reference */

    if (pFunc && PyCallable_Check(pFunc)) {
        ...
    }
\end{verbatim}

Once the script is loaded, its dictionary is retrieved with
\cfunction{PyModule_GetDict()}.  The dictionary is then searched using
the normal dictionary access routines for the function name.  If the
name exists, and the object returned is callable, you can safely
assume that it is a function.  The program then proceeds by
constructing a tuple of arguments as normal.  The call to the Python
function is then made with:

\begin{verbatim}
    pValue = PyObject_CallObject(pFunc, pArgs);
\end{verbatim}

Upon return of the function, \code{pValue} is either \NULL{} or it
contains a reference to the return value of the function.  Be sure to
release the reference after examining the value.
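
A sketch of this final step (matching the output shown earlier):

\begin{verbatim}
    if (pValue != NULL) {
        printf("Result of call: %ld\n", PyInt_AsLong(pValue));
        Py_DECREF(pValue);
    }
    else {
        PyErr_Print();  /* report errors raised by the Python code */
    }
\end{verbatim}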


\section{Extending Embedded Python
         \label{extending-with-embedding}}

Until now, the embedded Python interpreter had no access to
functionality from the application itself.  The Python API allows this
by extending the embedded interpreter.  That is, the embedded
interpreter gets extended with routines provided by the application.
While it sounds complex, it is not so bad.  Simply forget for a while
that the application starts the Python interpreter.  Instead, consider
the application to be a set of subroutines, and write some glue code
that gives Python access to those routines, just like you would write
a normal Python extension.  For example:

\begin{verbatim}
static int numargs=0;

/* Return the number of arguments of the application command line */
static PyObject*
emb_numargs(PyObject *self, PyObject *args)
{
    if(!PyArg_ParseTuple(args, ":numargs"))
        return NULL;
    return Py_BuildValue("i", numargs);
}

static PyMethodDef EmbMethods[]={
    {"numargs", emb_numargs, METH_VARARGS},
    {NULL,      NULL}
};
\end{verbatim}

Insert the above code just above the \cfunction{main()} function.
Also, insert the following two statements directly after
\cfunction{Py_Initialize()}:

\begin{verbatim}
    numargs = argc;
    Py_InitModule("emb", EmbMethods);
\end{verbatim}

These two lines initialize the \code{numargs} variable, and make the
\function{emb.numargs()} function accessible to the embedded Python
interpreter.  With these extensions, the Python script can do things
like

\begin{verbatim}
import emb
print "Number of arguments", emb.numargs()
\end{verbatim}

In a real application, the methods will expose an API of the
application to Python.


%\section{For the future}
%
%You don't happen to have a nice library to get textual
%equivalents of numeric values do you :-) ?
%Callbacks here ? (I may be using information from that section
%?!)
%threads
%code examples do not really behave well if errors happen
% (what to watch out for)


\section{Embedding Python in \Cpp{}
     \label{embeddingInCplusplus}}

It is also possible to embed Python in a \Cpp{} program; precisely how this
is done will depend on the details of the \Cpp{} system used; in general you
will need to write the main program in \Cpp{}, and use the \Cpp{} compiler
to compile and link your program.  There is no need to recompile Python
itself using \Cpp{}.


\section{Linking Requirements
         \label{link-reqs}}

While the \program{configure} script shipped with the Python sources
will correctly build Python to export the symbols needed by
dynamically linked extensions, this is not automatically inherited by
applications which embed the Python library statically, at least on
\UNIX.  This is an issue when the application is linked to the static
runtime library (\file{libpython.a}) and needs to load dynamic
extensions (implemented as \file{.so} files).

The problem is that some entry points are defined by the Python
runtime solely for extension modules to use.  If the embedding
application does not use any of these entry points, some linkers will
not include those entries in the symbol table of the finished
executable.  Some additional options are needed to inform the linker
not to remove these symbols.

Determining the right options to use for any given platform can be
quite difficult, but fortunately the Python configuration already has
those values.  To retrieve them from an installed Python interpreter,
start an interactive interpreter and have a short session like this:

\begin{verbatim}
>>> import distutils.sysconfig
>>> distutils.sysconfig.get_config_var('LINKFORSHARED')
'-Xlinker -export-dynamic'
\end{verbatim}
\refstmodindex{distutils.sysconfig}

The contents of the string presented will be the options that should
be used.  If the string is empty, there's no need to add any
additional options.  The \constant{LINKFORSHARED} definition
corresponds to the variable of the same name in Python's top-level
\file{Makefile}.

--- NEW FILE: extending.tex ---
\chapter{Extending Python with C or \Cpp{} \label{intro}}


It is quite easy to add new built-in modules to Python, if you know
how to program in C.  Such \dfn{extension modules} can do two things
that can't be done directly in Python: they can implement new built-in
object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers
Interface) defines a set of functions, macros and variables that
provide access to most aspects of the Python run-time system.  The
Python API is incorporated in a C source file by including the header
\code{"Python.h"}.

The compilation of an extension module depends on its intended use as
well as on your system setup; details are given in later chapters.


\section{A Simple Example
[...1656 lines suppressed...]
{
    PyObject *m;

    Py_InitModule("client", ClientMethods);
    import_spam();
}
\end{verbatim}

The main disadvantage of this approach is that the file
\file{spammodule.h} is rather complicated. However, the
basic structure is the same for each function that is
exported, so it has to be learned only once.

Finally it should be mentioned that CObjects offer additional
functionality, which is especially useful for memory allocation and
deallocation of the pointer stored in a CObject. The details
are described in the \citetitle[../api/api.html]{Python/C API
Reference Manual} in the section ``CObjects'' and in the
implementation of CObjects (files \file{Include/cobject.h} and
\file{Objects/cobject.c} in the Python source code distribution).
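
As an illustration, here is a small sketch (the names are made up for
the example) of wrapping a buffer allocated with \cfunction{malloc()}
in a CObject, so that the buffer is freed automatically when the
CObject is deallocated:

\begin{verbatim}
static void
destroy_buffer(void *ptr)
{
    free(ptr);  /* called when the CObject goes away */
}

static PyObject *
make_buffer_cobject(void)
{
    void *buf = malloc(1024);

    if (buf == NULL)
        return PyErr_NoMemory();
    return PyCObject_FromVoidPtr(buf, destroy_buffer);
}
\end{verbatim}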

--- NEW FILE: newtypes.tex ---
\chapter{Defining New Types
        \label{defining-new-types}}
\sectionauthor{Michael Hudson}{mwh21@cam.ac.uk}
\sectionauthor{Dave Kuhlman}{dkuhlman@rexx.com}

As mentioned in the last chapter, Python allows the writer of an
extension module to define new types that can be manipulated from
Python code, much like strings and lists in core Python.

This is not hard; the code for all extension types follows a pattern,
but there are some details that you need to understand before you can
get started.

\section{The Basics
    \label{dnt-basics}}

The Python runtime sees all Python objects as variables of type
\ctype{PyObject*}.  A \ctype{PyObject} is not a very magnificent
object - it just contains the refcount and a pointer to the object's
``type object''.  This is where the action is; the type object
determines which (C) functions get called when, for instance, an
attribute gets looked up on an object or it is multiplied by another
object.  I call these C functions ``type methods'' to distinguish them
from things like \code{[].append} (which I will call ``object
methods'' when I get around to them).

So, if you want to define a new object type, you need to create a new
type object.

This sort of thing can only be explained by example, so here's a
minimal, but complete, module that defines a new type:

\begin{verbatim}
#include <Python.h>

staticforward PyTypeObject noddy_NoddyType;

typedef struct {
    PyObject_HEAD
} noddy_NoddyObject;

static PyObject*
noddy_new_noddy(PyObject* self, PyObject* args)
{
    noddy_NoddyObject* noddy;

    if (!PyArg_ParseTuple(args,":new_noddy")) 
        return NULL;

    noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);

    return (PyObject*)noddy;
}

static void
noddy_noddy_dealloc(PyObject* self)
{
    PyObject_Del(self);
}

static PyTypeObject noddy_NoddyType = {
    PyObject_HEAD_INIT(NULL)
    0,
    "Noddy",
    sizeof(noddy_NoddyObject),
    0,
    noddy_noddy_dealloc, /*tp_dealloc*/
    0,          /*tp_print*/
    0,          /*tp_getattr*/
    0,          /*tp_setattr*/
    0,          /*tp_compare*/
    0,          /*tp_repr*/
    0,          /*tp_as_number*/
    0,          /*tp_as_sequence*/
    0,          /*tp_as_mapping*/
    0,          /*tp_hash */
};

static PyMethodDef noddy_methods[] = {
    { "new_noddy", noddy_new_noddy, METH_VARARGS },
    {NULL, NULL}
};

DL_EXPORT(void)
initnoddy(void) 
{
    noddy_NoddyType.ob_type = &PyType_Type;

    Py_InitModule("noddy", noddy_methods);
}
\end{verbatim}

Now that's quite a bit to take in at once, but hopefully bits will
seem familiar from the last chapter.

The first bit that will be new is:

\begin{verbatim}
staticforward PyTypeObject noddy_NoddyType;
\end{verbatim}

This names the type object that we will be defining further down in
the file.  It can't be defined here because its definition has to
refer to functions that have not yet been defined, but we need to be
able to refer to it, hence the declaration.

The \code{staticforward} is required to placate various brain dead
compilers.

\begin{verbatim}
typedef struct {
    PyObject_HEAD
} noddy_NoddyObject;
\end{verbatim}

This is what a Noddy object will contain.  In this case nothing more
than every Python object contains - a refcount and a pointer to a type
object.  These are the fields the \code{PyObject_HEAD} macro brings
in.  The reason for the macro is to standardize the layout and to
enable special debugging fields to be included in debug builds.

For contrast

\begin{verbatim}
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
\end{verbatim}

is the corresponding definition for standard Python integers.

Next up is:

\begin{verbatim}
static PyObject*
noddy_new_noddy(PyObject* self, PyObject* args)
{
    noddy_NoddyObject* noddy;

    if (!PyArg_ParseTuple(args,":new_noddy")) 
        return NULL;

    noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);

    return (PyObject*)noddy;
}
\end{verbatim}

This is in fact just a regular module function, as described in the
last chapter.  The reason it gets special mention is that this is
where we create our Noddy object.  Defining PyTypeObject structures is
all very well, but if there's no way to actually \emph{create} one
of the wretched things it is not going to do anyone much good.

Almost always, you create objects with a call of the form:

\begin{verbatim}
PyObject_New(<type>, &<type object>);
\end{verbatim}

This allocates the memory and then initializes the object (sets
the reference count to one, makes the \cdata{ob_type} pointer point at
the right place and maybe some other stuff, depending on build options).
You \emph{can} do these steps separately if you have some reason to
--- but at this level we don't bother.

We cast the return value to a \ctype{PyObject*} because that's what
the Python runtime expects.  This is safe because of guarantees about
the layout of structures in the C standard, and is a fairly common C
programming trick.  One could declare \cfunction{noddy_new_noddy} to
return a \ctype{noddy_NoddyObject*} and then put a cast in the
definition of \cdata{noddy_methods} further down the file --- it
doesn't make much difference.

Now a Noddy object doesn't do very much and so doesn't need to
implement many type methods.  One you can't avoid is handling
deallocation, so we find

\begin{verbatim}
static void
noddy_noddy_dealloc(PyObject* self)
{
    PyObject_Del(self);
}
\end{verbatim}

This is so short as to be self-explanatory.  This function will be
called when the reference count on a Noddy object reaches \code{0} (or
it is found as part of an unreachable cycle by the cyclic garbage
collector).  \cfunction{PyObject_Del()} is what you call when you want
an object to go away.  If a Noddy object held references to other
Python objects, one would decref them here.
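
For instance, if the Noddy structure had a hypothetical
\ctype{PyObject*} member called \cdata{data}, the deallocator might
read:

\begin{verbatim}
static void
noddy_noddy_dealloc(PyObject* self)
{
    /* data is the hypothetical extra member */
    Py_XDECREF(((noddy_NoddyObject*)self)->data);
    PyObject_Del(self);
}
\end{verbatim}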

Moving on, we come to the crunch --- the type object.

\begin{verbatim}
static PyTypeObject noddy_NoddyType = {
    PyObject_HEAD_INIT(NULL)
    0,
    "Noddy",
    sizeof(noddy_NoddyObject),
    0,
    noddy_noddy_dealloc, /*tp_dealloc*/
    0,                   /*tp_print*/
    0,                   /*tp_getattr*/
    0,                   /*tp_setattr*/
    0,                   /*tp_compare*/
    0,                   /*tp_repr*/
    0,                   /*tp_as_number*/
    0,                   /*tp_as_sequence*/
    0,                   /*tp_as_mapping*/
    0,                   /*tp_hash */
};
\end{verbatim}

Now if you go and look up the definition of \ctype{PyTypeObject} in
\file{object.h} you'll see that it has many, many more fields than the
definition above.  The remaining fields will be filled with zeros by
the C compiler, and it's common practice to not specify them
explicitly unless you need them.  

This is so important that I'm going to pick the top of it apart still
further:

\begin{verbatim}
    PyObject_HEAD_INIT(NULL)
\end{verbatim}

This line is a bit of a wart; what we'd like to write is:

\begin{verbatim}
    PyObject_HEAD_INIT(&PyType_Type)
\end{verbatim}

as the type of a type object is ``type'', but this isn't strictly
conforming C and some compilers complain.  So instead we fill in the
\cdata{ob_type} field of \cdata{noddy_NoddyType} at the earliest
opportunity --- in \cfunction{initnoddy()}.

\begin{verbatim}
    0,
\end{verbatim}

This is the \member{ob_size} field: since \ctype{PyTypeObject} is
declared with \code{PyObject_VAR_HEAD}, an \member{ob_size} field
follows the head, and for a type object it is simply set to \code{0}.

\begin{verbatim}
    "Noddy",
\end{verbatim}

The name of our type.  This will appear in the default textual
representation of our objects and in some error messages, for example:

\begin{verbatim}
>>> "" + noddy.new_noddy()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot add type "Noddy" to string
\end{verbatim}

\begin{verbatim}
    sizeof(noddy_NoddyObject),
\end{verbatim}

This is so that Python knows how much memory to allocate when you call
\cfunction{PyObject_New}.

\begin{verbatim}
    0,
\end{verbatim}

This has to do with variable length objects like lists and strings.
Ignore it for now...

Now we get into the type methods, the things that make your objects
different from the others.  Of course, the Noddy object doesn't
implement many of these, but as mentioned above you have to implement
the deallocation function.

\begin{verbatim}
    noddy_noddy_dealloc, /*tp_dealloc*/
\end{verbatim}

From here, all the type methods are nil so I won't go over them yet -
that's for the next section!

Everything else in the file should be familiar, except for this line
in \cfunction{initnoddy}:

\begin{verbatim}
    noddy_NoddyType.ob_type = &PyType_Type;
\end{verbatim}

This was alluded to above --- the \cdata{noddy_NoddyType} object should
have type ``type'', but \code{\&PyType_Type} is not constant and so
can't be used in its initializer.  To work around this, we patch it up
in the module initialization.

That's it!  All that remains is to build it; put the above code in a
file called \file{noddymodule.c} and

\begin{verbatim}
from distutils.core import setup, Extension
setup(name = "noddy", version = "1.0",
    ext_modules = [Extension("noddy", ["noddymodule.c"])])
\end{verbatim}

in a file called \file{setup.py}; then typing

\begin{verbatim}
$ python setup.py build
\end{verbatim} % $

at a shell should produce a file \file{noddy.so} in a subdirectory;
move to that directory and fire up Python --- you should be able to
\code{import noddy} and play around with Noddy objects.
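
A quick interactive test might look something like this (the address
will of course differ):

\begin{verbatim}
>>> import noddy
>>> n = noddy.new_noddy()
>>> n
<Noddy object at 0x81b0a58>
\end{verbatim}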

That wasn't so hard, was it?


\section{Type Methods
         \label{dnt-type-methods}}

This section aims to give a quick fly-by on the various type methods
you can implement and what they do.

Here is the definition of \ctype{PyTypeObject}, with some fields only
used in debug builds omitted:

\begin{verbatim}
typedef struct _typeobject {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    long tp_weaklistoffset;

    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct memberlist *tp_members;
    struct getsetlist *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    long tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    destructor tp_free; /* Low-level free-memory routine */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_defined;

} PyTypeObject;
\end{verbatim}

Now that's a \emph{lot} of methods.  Don't worry too much though - if
you have a type you want to define, the chances are very good that you
will only implement a handful of these.

As you probably expect by now, we're going to go over this and give
more information about the various handlers.  We won't go in the order
they are defined in the structure, because there is a lot of
historical baggage that impacts the ordering of the fields; be sure
your type initialization keeps the fields in the right order!  It's
often easiest to find an example that includes all the fields you need
(even if they're initialized to \code{0}) and then change the values
to suit your new type.

\begin{verbatim}
    char *tp_name; /* For printing */
\end{verbatim}

The name of the type - as mentioned in the last section, this will
appear in various places, almost entirely for diagnostic purposes.
Try to choose something that will be helpful in such a situation!

\begin{verbatim}
    int tp_basicsize, tp_itemsize; /* For allocation */
\end{verbatim}

These fields tell the runtime how much memory to allocate when new
objects of this type are created.  Python has some builtin support
for variable length structures (think: strings, lists) which is where
the \cdata{tp_itemsize} field comes in.  This will be dealt with
later.

\begin{verbatim}
    char *tp_doc;
\end{verbatim}

Here you can put a string (or its address) that you want returned when
the Python script references \code{obj.__doc__} to retrieve the
docstring.
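
For example (a sketch; the string is made up):

\begin{verbatim}
static char newdatatype_doc[] =
    "newdatatype objects wrap a hypothetical C data structure.";
\end{verbatim}

with \code{newdatatype_doc} placed in the \member{tp_doc} slot of the
type object.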
   
Now we come to the basic type methods---the ones most extension types
will implement.


\subsection{Finalization and De-allocation}

\begin{verbatim}
    destructor tp_dealloc;
\end{verbatim}

This function is called when the reference count of the instance of
your type is reduced to zero and the Python interpreter wants to
reclaim it.  If your type has memory to free or other clean-up to
perform, put it here.  The object itself needs to be freed here as
well.  Here is an example of this function:

\begin{verbatim}
static void
newdatatype_dealloc(newdatatypeobject * obj)
{
    free(obj->obj_UnderlyingDatatypePtr);
    PyObject_DEL(obj);
}
\end{verbatim}


\subsection{Object Representation}

In Python, there are three ways to generate a textual representation
of an object: the \function{repr()}\bifuncindex{repr} function (or
equivalent backtick syntax), the \function{str()}\bifuncindex{str}
function, and the \keyword{print} statement.  For most objects, the
\keyword{print} statement is equivalent to the \function{str()}
function, but it is possible to special-case printing to a
\ctype{FILE*} if necessary; this should only be done if efficiency is
identified as a problem and profiling suggests that creating a
temporary string object to be written to a file is too expensive.

These handlers are all optional, and most types need to implement at
most the \member{tp_str} and \member{tp_repr} handlers.

\begin{verbatim}
    reprfunc tp_repr;
    reprfunc tp_str;
    printfunc tp_print;
\end{verbatim}

The \member{tp_repr} handler should return a string object containing
a representation of the instance for which it is called.  Here is a
simple example:

\begin{verbatim}
static PyObject *
newdatatype_repr(newdatatypeobject * obj)
{
    char buf[4096];
    sprintf(buf, "Repr-ified_newdatatype{{size:%d}}",
            obj->obj_UnderlyingDatatypePtr->size);
    return PyString_FromString(buf);
}
\end{verbatim}

If no \member{tp_repr} handler is specified, the interpreter will
supply a representation that uses the type's \member{tp_name} and a
uniquely-identifying value for the object.

The \member{tp_str} handler is to \function{str()} what the
\member{tp_repr} handler described above is to \function{repr()}; that
is, it is called when Python code calls \function{str()} on an
instance of your object.  Its implementation is very similar to the
\member{tp_repr} function, but the resulting string is intended for
human consumption.  If \member{tp_str} is not specified, the
\member{tp_repr} handler is used instead.

Here is a simple example:

\begin{verbatim}
static PyObject *
newdatatype_str(newdatatypeobject * obj)
{
    PyObject *pyString;
    char buf[4096];
    sprintf(buf, "Stringified_newdatatype{{size:%d}}",
        obj->obj_UnderlyingDatatypePtr->size
        );
    pyString = PyString_FromString(buf);
    return pyString;
}
\end{verbatim}

The print function will be called whenever Python needs to ``print''
an instance of the type.  For example, if \code{node} is an instance
of type \class{TreeNode}, then the print function is called when
Python code calls:

\begin{verbatim}
print node
\end{verbatim}

There is a flags argument with one flag defined,
\constant{Py_PRINT_RAW}; it suggests that you print without string
quotes and possibly without interpreting escape sequences.

The print function receives a file object as an argument. You will
likely want to write to that file object.

Here is a sample print function:

\begin{verbatim}
static int
newdatatype_print(newdatatypeobject *obj, FILE *fp, int flags)
{
    if (flags & Py_PRINT_RAW) {
        fprintf(fp, "<{newdatatype object--size: %d}>",
                obj->obj_UnderlyingDatatypePtr->size);
    }
    else {
        fprintf(fp, "\"<{newdatatype object--size: %d}>\"",
                obj->obj_UnderlyingDatatypePtr->size);
    }
    return 0;
}
\end{verbatim}


\subsection{Attribute Management Functions}

\begin{verbatim}
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
\end{verbatim}

The \member{tp_getattr} handler is called when the object requires an
attribute look-up.  It is called in the same situations where the
\method{__getattr__()} method of a class would be called.

A likely way to handle this is (1) to implement a set of functions
(such as \cfunction{newdatatype_getSize()} and
\cfunction{newdatatype_setSize()} in the example below), (2) provide a
method table listing these functions, and (3) provide a getattr
function that returns the result of a lookup in that table.

Here is an example:

\begin{verbatim}
static PyMethodDef newdatatype_methods[] = {
    {"getSize", (PyCFunction)newdatatype_getSize, METH_VARARGS},
    {"setSize", (PyCFunction)newdatatype_setSize, METH_VARARGS},
    {NULL,      NULL}           /* sentinel */
};

static PyObject *
newdatatype_getattr(newdatatypeobject *obj, char *name)
{
    return Py_FindMethod(newdatatype_methods, (PyObject *)obj, name);
}
\end{verbatim}

The \member{tp_setattr} handler is called when the
\method{__setattr__()} or \method{__delattr__()} method of a class
instance would be called.  When an attribute should be deleted, the
third parameter will be \NULL.  Here is an example that simply raises
an exception; if this were really all you wanted, the
\member{tp_setattr} handler should be set to \NULL.
   
\begin{verbatim}
static int
newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
{
    char buf[1024];
    sprintf(buf, "Set attribute not supported for attribute %s", name);
    PyErr_SetString(PyExc_RuntimeError, buf);
    return -1;
}
\end{verbatim}


\subsection{Object Comparison}

\begin{verbatim}
    cmpfunc tp_compare;
\end{verbatim}

The \member{tp_compare} handler is called when comparisons are needed
and the object does not implement the specific rich comparison method
which matches the requested comparison.  (It is always used if defined
and the \cfunction{PyObject_Compare()} or \cfunction{PyObject_Cmp()}
functions are used, or if \function{cmp()} is used from Python.)
It is analogous to the \method{__cmp__()} method.  This function
should return a negative integer if \var{obj1} is less than
\var{obj2}, \code{0} if they are equal, and a positive integer if
\var{obj1} is greater than
\var{obj2}.

Here is a sample implementation:

\begin{verbatim}
static int
newdatatype_compare(newdatatypeobject * obj1, newdatatypeobject * obj2)
{
    long result;
 
    if (obj1->obj_UnderlyingDatatypePtr->size <
        obj2->obj_UnderlyingDatatypePtr->size) {
        result = -1;
    }
    else if (obj1->obj_UnderlyingDatatypePtr->size >
             obj2->obj_UnderlyingDatatypePtr->size) {
        result = 1;
    }
    else {
        result = 0;
    }
    return result;
}
\end{verbatim}


\subsection{Abstract Protocol Support}

\begin{verbatim}
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;
\end{verbatim}

If you wish your object to be able to act like a number, a sequence,
or a mapping object, then you place in the corresponding slot the
address of a structure that implements the C type
\ctype{PyNumberMethods}, \ctype{PySequenceMethods}, or
\ctype{PyMappingMethods}, respectively.
It is up to you to fill in this structure with appropriate values. You
can find examples of the use of each of these in the \file{Objects}
directory of the Python source distribution.
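
As a sketch, a type that only wants to support \function{len()}
through the sequence protocol might fill in just the first slot of
\ctype{PySequenceMethods} (using the hypothetical
\ctype{newdatatypeobject} from the earlier examples):

\begin{verbatim}
static int
newdatatype_length(newdatatypeobject *obj)
{
    return obj->obj_UnderlyingDatatypePtr->size;
}

static PySequenceMethods newdatatype_as_sequence = {
    (inquiry)newdatatype_length,  /* sq_length */
    /* the remaining slots are left 0 */
};
\end{verbatim}

The address of this structure would then go in the
\member{tp_as_sequence} slot of the type object.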


\begin{verbatim}
    hashfunc tp_hash;
\end{verbatim}

This function, if you choose to provide it, should return a hash
number for an instance of your datatype. Here is a moderately
pointless example:

\begin{verbatim}
static long
newdatatype_hash(newdatatypeobject *obj)
{
    long result;
    result = obj->obj_UnderlyingDatatypePtr->size;
    result = result * 3;
    return result;
}
\end{verbatim}

\begin{verbatim}
    ternaryfunc tp_call;
\end{verbatim}

This function is called when an instance of your datatype is ``called'',
for example, if \code{obj1} is an instance of your datatype and the Python
script contains \code{obj1('hello')}, the \member{tp_call} handler is
invoked.

This function takes three arguments:

\begin{enumerate}
  \item
    \var{arg1} is the instance of the datatype which is the subject of
    the call. If the call is \code{obj1('hello')}, then \var{arg1} is
    \code{obj1}.

  \item
    \var{arg2} is a tuple containing the arguments to the call.  You
    can use \cfunction{PyArg_ParseTuple()} to extract the arguments.

  \item
    \var{arg3} is a dictionary of keyword arguments that were passed.
    If this is non-\NULL{} and you support keyword arguments, use
    \cfunction{PyArg_ParseTupleAndKeywords()} to extract the
    arguments.  If you do not want to support keyword arguments and
    this is non-\NULL, raise a \exception{TypeError} with a message
    saying that keyword arguments are not supported.
\end{enumerate}
       
Here is a desultory example of the implementation of a call function.

\begin{verbatim}
/* Implement the call function.
 *    obj is the instance receiving the call.
 *    args is a tuple containing the arguments to the call, in this
 *         case 3 strings.
 */
static PyObject *
newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *other)
{
    PyObject *result;
    char *arg1;
    char *arg2;
    char *arg3;
    char buf[4096];
    if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
        return NULL;
    }
    sprintf(buf,
            "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
            obj->obj_UnderlyingDatatypePtr->size,
            arg1, arg2, arg3);
    printf("%s", buf);
    return PyString_FromString(buf);
}
\end{verbatim}


\subsection{More Suggestions}

Remember that you can omit most of these functions, in which case you
provide \code{0} as a value.

In the \file{Objects} directory of the Python source distribution,
there is a file \file{xxobject.c}, which is intended to be used as a
template for the implementation of new types.  One useful strategy
for implementing a new type is to copy and rename this file, then
read the instructions at the top of it.

There are type definitions for each of the functions you must
provide.  They are in \file{object.h} in the Python include
directory that comes with the source distribution of Python.

In order to learn how to implement any specific method for your new
datatype, do the following: Download and unpack the Python source
distribution.  Go to the \file{Objects} directory, then search the
C source files for \code{tp_} plus the function you want (for
example, \code{tp_print} or \code{tp_compare}).  You will find
examples of the function you want to implement.

When you need to verify that the type of an object is indeed the one
you are implementing, and you use \file{xxobject.c} as a starting
template for your implementation, there is a macro defined for
this purpose.  The macro definition will look something like this:

\begin{verbatim}
#define is_newdatatypeobject(v)  ((v)->ob_type == &Newdatatypetype)
\end{verbatim}

And, a sample of its use might be something like the following:

\begin{verbatim}
    if (!is_newdatatypeobject(objp1)) {
        PyErr_SetString(PyExc_TypeError, "arg #1 not a newdatatype");
        return NULL;
    }
\end{verbatim}

%For a reasonably extensive example, from which most of the snippits
%above were taken, see \file{newdatatype.c} and \file{newdatatype.h}.

--- NEW FILE: unix.tex ---
\chapter{Building C and \Cpp{} Extensions on \UNIX{}
     \label{building-on-unix}}

\sectionauthor{Jim Fulton}{jim@zope.com}


%The make file make file, building C extensions on Unix


Starting in Python 1.4, Python provides a special make file for
building make files for building dynamically-linked extensions and
custom interpreters.  The make file make file builds a make file
that reflects various system variables determined by
\program{configure} when the Python interpreter was built, so people
building modules don't have to resupply these settings.  This vastly
simplifies the process of building extensions and custom interpreters
on \UNIX{} systems.

The make file make file is distributed as the file
\file{Misc/Makefile.pre.in} in the Python source distribution.  The
first step in building extensions or custom interpreters is to copy
this make file to a development directory containing extension module
source.

The make file make file, \file{Makefile.pre.in}, uses metadata
provided in a file named \file{Setup}.  The format of the \file{Setup}
file is the same as the \file{Setup} (or \file{Setup.dist}) file
provided in the \file{Modules/} directory of the Python source
distribution.  The \file{Setup} file contains variable definitions:

\begin{verbatim}
EC=/projects/ExtensionClass
\end{verbatim}

and module description lines.  It can also contain blank lines and
comment lines that start with \character{\#}.

A module description line includes a module name, source files,
options, variable references, and other input files, such
as libraries or object files.  Consider a simple example:

\begin{verbatim}
ExtensionClass ExtensionClass.c
\end{verbatim}

This is the simplest form of a module definition line.  It defines a
module, \module{ExtensionClass}, which has a single source file,
\file{ExtensionClass.c}.

This slightly more complex example uses an \strong{-I} option to
specify an include directory:

\begin{verbatim}
EC=/projects/ExtensionClass
cPersistence cPersistence.c -I$(EC)
\end{verbatim} % $ <-- bow to font lock

This example also illustrates the format for variable references.

For systems that support dynamic linking, the \file{Setup} file should 
begin:

\begin{verbatim}
*shared*
\end{verbatim}

to indicate that the modules defined in \file{Setup} are to be built
as dynamically linked modules.  A line containing only \samp{*static*}
can be used to indicate the subsequently listed modules should be
statically linked.

Here is a complete \file{Setup} file for building a
\module{cPersistent} module:

\begin{verbatim}
# Set-up file to build the cPersistence module. 
# Note that the text should begin in the first column.
*shared*

# We need the path to the directory containing the ExtensionClass
# include file.
EC=/projects/ExtensionClass
cPersistence cPersistence.c -I$(EC)
\end{verbatim} % $ <-- bow to font lock

After the \file{Setup} file has been created, \file{Makefile.pre.in}
is run with the \samp{boot} target to create a make file:

\begin{verbatim}
make -f Makefile.pre.in boot
\end{verbatim}

This creates a file named \file{Makefile}.  To build the extensions,
simply run the created make file:

\begin{verbatim}
make
\end{verbatim}

It's not necessary to re-run \file{Makefile.pre.in} if the
\file{Setup} file is changed.  The make file automatically rebuilds
itself if the \file{Setup} file changes.


\section{Building Custom Interpreters \label{custom-interps}}

The make file built by \file{Makefile.pre.in} can be run with the
\samp{static} target to build an interpreter:

\begin{verbatim}
make static
\end{verbatim}

Any modules defined in the \file{Setup} file before the
\samp{*shared*} line will be statically linked into the interpreter.
Typically, a \samp{*shared*} line is omitted from the
\file{Setup} file when a custom interpreter is desired.


\section{Module Definition Options \label{module-defn-options}}

Several compiler options are supported:

\begin{tableii}{l|l}{programopt}{Option}{Meaning}
  \lineii{-C}{Tell the C pre-processor not to discard comments}
  \lineii{-D\var{name}=\var{value}}{Define a macro}
  \lineii{-I\var{dir}}{Specify an include directory, \var{dir}}
  \lineii{-L\var{dir}}{Specify a link-time library directory, \var{dir}}
  \lineii{-R\var{dir}}{Specify a run-time library directory, \var{dir}}
  \lineii{-l\var{lib}}{Link a library, \var{lib}}
  \lineii{-U\var{name}}{Undefine a macro}
\end{tableii}

Other compiler options can be included (snuck in) by putting them
in variables.

Source files can include files with \file{.c}, \file{.C}, \file{.cc},
\file{.cpp}, \file{.cxx}, and \file{.c++} extensions. 

Other input files include files with \file{.a}, \file{.o}, \file{.sl}, 
and \file{.so} extensions.


\section{Example \label{module-defn-example}}

Here is a more complicated example from \file{Modules/Setup.dist}:

\begin{verbatim}
GMP=/ufs/guido/src/gmp
mpz mpzmodule.c -I$(GMP) $(GMP)/libgmp.a
\end{verbatim}

which could also be written as:

\begin{verbatim}
mpz mpzmodule.c -I$(GMP) -L$(GMP) -lgmp
\end{verbatim}


\section{Distributing your extension modules
     \label{distributing}}

There are two ways to distribute extension modules for others to use.
The way that allows the easiest cross-platform support is to use the
\module{distutils}\refstmodindex{distutils} package.  The manual
\citetitle[../dist/dist.html]{Distributing Python Modules} contains
information on this approach.  It is recommended that all new
extensions be distributed using this approach to allow easy building
and installation across platforms.  Older extensions should migrate to
this approach as well.

What follows describes the older approach; there are still many
extensions which use this.

When distributing your extension modules in source form, make sure to
include a \file{Setup} file.  The \file{Setup} file should be named
\file{Setup.in} in the distribution.  The make file make file,
\file{Makefile.pre.in}, will copy \file{Setup.in} to \file{Setup} if
the person installing the extension doesn't do so manually.
Distributing a \file{Setup.in} file makes it easy for people to
customize the \file{Setup} file while keeping the original in
\file{Setup.in}.

It is a good idea to include a copy of \file{Makefile.pre.in} for
people who do not have a source distribution of Python.

Do not distribute a make file.  People building your modules
should use \file{Makefile.pre.in} to build their own make file.  A
\file{README} file included in the package should provide simple
instructions to perform the build.

--- NEW FILE: windows.tex ---
\chapter{Building C and \Cpp{} Extensions on Windows
     \label{building-on-windows}}


This chapter briefly explains how to create a Windows extension module
for Python using Microsoft Visual \Cpp{}, and follows with more
detailed background information on how it works.  The explanatory
material is useful for both the Windows programmer learning to build
Python extensions and the \UNIX{} programmer interested in producing
software which can be successfully built on both \UNIX{} and Windows.


\section{A Cookbook Approach \label{win-cookbook}}

\sectionauthor{Neil Schemenauer}{neil_schemenauer@transcanada.com}

This section provides a recipe for building a Python extension on
Windows.

Grab the binary installer from \url{http://www.python.org/} and
install Python.  The binary installer has all of the required header
files except for \file{pyconfig.h}.

Get the source distribution and extract it into a convenient location.
Copy the \file{pyconfig.h} from the \file{PC/} directory into the
\file{include/} directory created by the installer.

Create a \file{Setup} file for your extension module, as described in
chapter \ref{building-on-unix}.

Get David Ascher's \file{compile.py} script from
\url{http://starship.python.net/crew/da/compile/}.  Run the script to
create Microsoft Visual \Cpp{} project files.

Open the DSW file in Visual \Cpp{} and select \strong{Build}.

If your module creates a new type, you may have trouble with this line:

\begin{verbatim}
    PyObject_HEAD_INIT(&PyType_Type)
\end{verbatim}

Change it to:

\begin{verbatim}
    PyObject_HEAD_INIT(NULL)
\end{verbatim}

and add the following to the module initialization function:

\begin{verbatim}
    MyObject_Type.ob_type = &PyType_Type;
\end{verbatim}

Refer to section 3 of the
\citetitle[http://www.python.org/doc/FAQ.html]{Python FAQ} for details
on why you must do this.


\section{Differences Between \UNIX{} and Windows
     \label{dynamic-linking}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}


\UNIX{} and Windows use completely different paradigms for run-time
loading of code.  Before you try to build a module that can be
dynamically loaded, be aware of how your system works.

In \UNIX{}, a shared object (\file{.so}) file contains code to be used by the
program, and also the names of functions and data that it expects to
find in the program.  When the file is joined to the program, all
references to those functions and data in the file's code are changed
to point to the actual locations in the program where the functions
and data are placed in memory.  This is basically a link operation.

In Windows, a dynamic-link library (\file{.dll}) file has no dangling
references.  Instead, an access to functions or data goes through a
lookup table.  So the DLL code does not have to be fixed up at runtime
to refer to the program's memory; instead, the code already uses the
DLL's lookup table, and the lookup table is modified at runtime to
point to the functions and data.

In \UNIX{}, there is only one type of library file (\file{.a}) which
contains code from several object files (\file{.o}).  During the link
step to create a shared object file (\file{.so}), the linker may find
that it doesn't know where an identifier is defined.  The linker will
look for it in the object files in the libraries; if it finds it, it
will include all the code from that object file.

In Windows, there are two types of library, a static library and an
import library (both called \file{.lib}).  A static library is like a
\UNIX{} \file{.a} file; it contains code to be included as necessary.
An import library is basically used only to reassure the linker that a
certain identifier is legal, and will be present in the program when
the DLL is loaded.  So the linker uses the information from the
import library to build the lookup table for using identifiers that
are not included in the DLL.  When an application or a DLL is linked,
an import library may be generated, which will need to be used for all
future DLLs that depend on the symbols in the application or DLL.

Suppose you are building two dynamic-load modules, B and C, which should
share another block of code A.  On \UNIX{}, you would \emph{not} pass
\file{A.a} to the linker for \file{B.so} and \file{C.so}; that would
cause it to be included twice, so that B and C would each have their
own copy.  In Windows, building \file{A.dll} will also build
\file{A.lib}.  You \emph{do} pass \file{A.lib} to the linker for B and
C.  \file{A.lib} does not contain code; it just contains information
which will be used at runtime to access A's code.  

In Windows, using an import library is sort of like using \samp{import
spam}; it gives you access to spam's names, but does not create a
separate copy.  On \UNIX{}, linking with a library is more like
\samp{from spam import *}; it does create a separate copy.


\section{Using DLLs in Practice \label{win-dlls}}
\sectionauthor{Chris Phoenix}{cphoenix@best.com}

Windows Python is built in Microsoft Visual \Cpp{}; using other
compilers may or may not work (though Borland seems to).  The rest of
this section is MSV\Cpp{} specific.

When creating DLLs in Windows, you must pass \file{python15.lib} to
the linker.  To build two DLLs, spam and ni (which uses C functions
found in spam), you could use these commands:

\begin{verbatim}
cl /LD /I/python/include spam.c ../libs/python15.lib
cl /LD /I/python/include ni.c spam.lib ../libs/python15.lib
\end{verbatim}

The first command created three files: \file{spam.obj},
\file{spam.dll} and \file{spam.lib}.  \file{Spam.dll} does not contain
any Python functions (such as \cfunction{PyArg_ParseTuple()}), but it
does know how to find the Python code thanks to \file{python15.lib}.

The second command created \file{ni.dll} (and \file{.obj} and
\file{.lib}), which knows how to find the necessary functions from
spam, and also from the Python executable.

Not every identifier is exported to the lookup table.  If you want any
other modules (including Python) to be able to see your identifiers,
you have to say \samp{__declspec(dllexport)}, as in \samp{void
__declspec(dllexport) initspam(void)} or \samp{PyObject
__declspec(dllexport) *NiGetSpamData(void)}.
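
For example, the initialization function of the spam module might be
declared like this sketch (assuming a method table named
\code{SpamMethods}):

\begin{verbatim}
void __declspec(dllexport)
initspam(void)
{
    Py_InitModule("spam", SpamMethods);
}
\end{verbatim}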

Developer Studio will throw in a lot of import libraries that you do
not really need, adding about 100K to your executable.  To get rid of
them, use the Project Settings dialog, Link tab, to specify
\emph{ignore default libraries}.  Add the correct
\file{msvcrt\var{xx}.lib} to the list of libraries.

Index: ext.tex
===================================================================
RCS file: /cvsroot/python/python/dist/src/Doc/ext/ext.tex,v
retrieving revision 1.103
retrieving revision 1.103.2.1
diff -C2 -d -r1.103 -r1.103.2.1
*** ext.tex	2001/08/15 19:07:18	1.103
--- ext.tex	2001/08/21 03:05:17	1.103.2.1
***************
*** 51,3212 ****
  
  
! \chapter{Extending Python with C or \Cpp{} \label{intro}}
! 
! 
! It is quite easy to add new built-in modules to Python, if you know
! how to program in C.  Such \dfn{extension modules} can do two things
! that can't be done directly in Python: they can implement new built-in
! object types, and they can call C library functions and system calls.
! 
[...3142 lines suppressed...]
! \end{verbatim}
! \refstmodindex{distutils.sysconfig}
! 
! The contents of the string presented will be the options that should
! be used.  If the string is empty, there's no need to add any
! additional options.  The \constant{LINKFORSHARED} definition
! corresponds to the variable of the same name in Python's top-level
! \file{Makefile}.
  
  
--- 51,59 ----
  
  
! \input{extending}
! \input{newtypes}
! \input{unix}
! \input{windows}
! \input{embedding}