PY_ARRAY_UNIQUE_SYMBOL is too far reaching?
Hi everyone,

I've recently been developing a python module and C++ library in parallel, with core functionality in python and C++ largely just layered on top of the python (with boost.python.) In some cases, however, for performance reasons, the C++ API "reaches into" the python code via the C API, and this tends to happen very often with numpy-related code. As a result, I'm using the numpy C API a lot in my C++ library.

To make a long story short, I'm finding that there are many places where I need to include numpy headers in my own headers (e.g. when a template class uses part of the numpy API.) If the symbol I want in my header is in ndarrayobject.h, it seems that I'm obligated to define PY_ARRAY_UNIQUE_SYMBOL because that file includes __multiarray_api.h. However, defining that macro in a header file seems like a bad idea because of potential conflicts with headers from other libraries.

As a motivating example, I have a header which implements a type-mapping template for numpy types. It maps at compile time between the NPY_TYPES enum and actual C++ types. To get the NPY_TYPES definitions, I have to include arrayobject.h. Even though my header doesn't actually use the symbols that PY_ARRAY_UNIQUE_SYMBOL influences, it seems that I need to define it; even without it being defined, __multiarray_api.h leaves a PyArray_API definition in my header.

So, am I missing something obvious? Or is there no way to access symbols like NPY_TYPES without pulling in the PyArray_API functions? If there isn't, would it be possible to restructure the headers so that the symbols affected by PY_ARRAY_UNIQUE_SYMBOL are separated from the others?

Thanks.
Austin
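[A minimal sketch of the kind of compile-time type map described above, just to make the example concrete; the struct name and the particular specializations are hypothetical, not the actual library code:]

#include <Python.h>
#include <numpy/arrayobject.h>  // needed here only for the NPY_TYPES enum

// Map a C++ type to its numpy type code at compile time.
template <typename T>
struct numpy_type_map;          // no generic mapping; specializations only

template <>
struct numpy_type_map<double> {
    static const NPY_TYPES type_code = NPY_DOUBLE;
};

template <>
struct numpy_type_map<int> {
    static const NPY_TYPES type_code = NPY_INT;
};

// usage, e.g.: PyArray_SimpleNew(1, &len, numpy_type_map<double>::type_code)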
On Mon, May 3, 2010 at 7:23 PM, Austin Bingham <austin.bingham@gmail.com> wrote:
Hi everyone,
I've recently been developing a python module and C++ library in parallel, with core functionality in python and C++ largely just layered on top of the python (with boost.python.) In some cases, however, for performance reasons, the C++ API "reaches into" the python code via the C API, and this tends to happen very often with numpy-related code.
As a result, I'm using the numpy C API a lot in my C++ library. To make a long story short, I'm finding that there are many places where I need to include numpy headers in my own headers (e.g. when a template class uses part of the numpy API.) If the symbol I want in my header is in ndarrayobject.h, it seems that I'm obligated to define PY_ARRAY_UNIQUE_SYMBOL because that file includes __multiarray_api.h. However, defining that macro in a header file seems like a bad idea because of potential conflicts with headers from other libraries.
You don't need to define PY_ARRAY_UNIQUE_SYMBOL to include any public numpy header - it seems that you are trying to avoid getting PyArray_API defined, but I don't understand why. PY_ARRAY_UNIQUE_SYMBOL should only be used when you want to split your extension into separate compilation units (object files). David
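[For readers who haven't used that feature, the documented multi-file pattern David is describing looks roughly like this; the file names and the MYLIB_ARRAY_API symbol are made up:]

/* module.cpp -- the one translation unit that imports the function table */
#define PY_ARRAY_UNIQUE_SYMBOL MYLIB_ARRAY_API
#include <Python.h>
#include <numpy/arrayobject.h>
/* ... the module init function calls import_array() here ... */

/* helpers.cpp -- every other translation unit of the same extension */
#define PY_ARRAY_UNIQUE_SYMBOL MYLIB_ARRAY_API
#define NO_IMPORT_ARRAY            /* NO_IMPORT has the same effect here */
#include <Python.h>
#include <numpy/arrayobject.h>
/* ... PyArray_* calls now go through the shared MYLIB_ARRAY_API table
   that module.cpp imported ... */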
On Tue, May 4, 2010 at 7:05 AM, David Cournapeau <cournape@gmail.com> wrote:
On Mon, May 3, 2010 at 7:23 PM, Austin Bingham <austin.bingham@gmail.com> wrote:
Hi everyone,
I've recently been developing a python module and C++ library in parallel, with core functionality in python and C++ largely just layered on top of the python (with boost.python.) In some cases, however, for performance reasons, the C++ API "reaches into" the python code via the C API, and this tends to happen very often with numpy-related code.
As a result, I'm using the numpy C API a lot in my C++ library. To make a long story short, I'm finding that there are many places where I need to include numpy headers in my own headers (e.g. when a template class uses part of the numpy API.) If the symbol I want in my header is in ndarrayobject.h, it seems that I'm obligated to define PY_ARRAY_UNIQUE_SYMBOL because that file includes __multiarray_api.h. However, defining that macro in a header file seems like a bad idea because of potential conflicts with headers from other libraries.
You don't need to define PY_ARRAY_UNIQUE_SYMBOL to include any public numpy header - it seems that you are trying to avoid getting PyArray_API defined, but I don't understand why.
PY_ARRAY_UNIQUE_SYMBOL should only be used when you want to split your extension into separate compilation units (object files).
I admit I'm having trouble formulating questions to address my problems, so please bear with me.

Say I've got a shared library of utilities for working with numpy arrays. It's intended to be used in multiple extension modules and in some places that are not modules at all (e.g. C++ programs that embed python and want to manipulate arrays directly.)

One of the headers in this library (call it 'util.h') includes arrayobject.h because, for example, it needs NPY_TYPES in some template definitions. Should this 'util.h' define PY_ARRAY_UNIQUE_SYMBOL? Or NO_IMPORT? It seems like the correct answers are 'no' and 'yes', but that means that any user of this header needs to be very aware of header inclusion order. For example, if they want to include 'arrayobject.h' for their own reasons *and* they want NO_IMPORT undefined, then they need to be sure to include 'util.h' after 'arrayobject.h'.
From what I can see, the problem seems to be a conflation of two sets of symbols: those influenced by the PY_ARRAY_UNIQUE_SYMBOL and NO_IMPORT macros (broadly, the API functions), and those that aren't (types, enums, and so forth). As things stand, there's no way for 'util.h' to use NPY_TYPES (part of the latter set) without affecting users of the former set. It seems easy enough to make the latter set available in its own header to help avoid this kind of problem.
Austin
On 05/04/2010 04:38 PM, Austin Bingham wrote:
I admit I'm having trouble formulating questions to address my problems, so please bear with me.
Say I've got a shared library of utilities for working with numpy arrays. It's intended to be used in multiple extension modules and in some places that are not modules at all (e.g. C++ programs that embed python and want to manipulate arrays directly.)
One of the headers in this library (call it 'util.h') includes arrayobject.h because, for example, it needs NPY_TYPES in some template definitions. Should this 'util.h' define PY_ARRAY_UNIQUE_SYMBOL? Or NO_IMPORT? It seems like the correct answers are 'no' and 'yes', but that means that any user of this header needs to be very aware of header inclusion order. For example, if they want to include 'arrayobject.h' for their own reasons *and* they want NO_IMPORT undefined, then they need to be sure to include 'util.h' after 'arrayobject.h'.
I still don't understand why you cannot just include the header file as is (without defining any of NO_IMPORT/PY_ARRAY_UNIQUE_SYMBOL).
From what I can see, the problem seems to be a conflation of two sets of symbols: those influenced by the PY_ARRAY_UNIQUE_SYMBOL and NO_IMPORT macros (broadly, the API functions), and those that aren't (types, enums, and so forth).
numpy headers are really messy - way too many macros, etc. Fixing it without breaking API compatibility is a lot of work, though. Cheers, David
I still don't understand why you cannot just include the header file as is (without defining any of NO_IMPORT/PY_ARRAY_UNIQUE_SYMBOL).
I guess the real point is that no matter what definition (or lack thereof) I have for these macros, I still introduce header-order dependencies for users of my library if I include arrayobject.h in one of my headers.

Suppose I defined neither macro in my 'util.h', and that I included 'arrayobject.h'. If a user of my library did this:

#include <mylib/util.h> // <-- my library's header

#define PY_ARRAY_UNIQUE_SYMBOL MY_UNIQUE_SYMBOL
#define NO_IMPORT
#include <numpy/arrayobject.h>

...

they'd likely crash. The inclusion of arrayobject.h in util.h activates the include guards in arrayobject.h, so the second inclusion has no real effect; unless my library's user is keenly aware of what's going on, the API function pointers will not get set properly and their calls to numpy API functions would go through garbage pointers. In this case, of course, reordering the includes will probably fix the issue. But it's a classic example of an unhygienic header, and I think we can avoid this very easily (see below).
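[To make the "reordering will probably fix it" aside concrete, this is the order that avoids the problem under the assumptions above; mylib/util.h and MY_UNIQUE_SYMBOL are the hypothetical names from the example:]

// The user includes arrayobject.h first, with the macro settings they
// actually want; the copy pulled in later by util.h is then skipped by
// the include guards, so the user's settings win.
#define PY_ARRAY_UNIQUE_SYMBOL MY_UNIQUE_SYMBOL
#define NO_IMPORT
#include <Python.h>
#include <numpy/arrayobject.h>

#include <mylib/util.h>  // its internal include of arrayobject.h is now a no-op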
numpy headers are really messy - way too many macros, etc... Fixing it without breaking API compatibility is a lot of work, though,
That may be true in general, but it looks like there might be a simple solution in this case. In my copy of numpy (1.3.0), I've moved everything in ndarrayobject.h between the "CONFUSE_EMACS" stuff and the inclusion of "__multiarray_api.h" into a new header, nonfunc_api.h (though this is clearly a temporary name at best!). ndarrayobject.h now includes nonfunc_api.h in place of all of the removed code, and my util.h includes nonfunc_api.h instead of arrayobject.h.

The result is that existing users of the numpy API are (I believe) completely unaffected. However, the new header makes it possible to include a lot of type definitions, enumerations, and all sorts of other stuff...in my thinking, everything from ndarrayobject.h that *doesn't* depend on the macros...without adding the burden of actually needing to consider the macros.

FWIW, this arrangement seems to work for my projects. I haven't yet applied this as a proper patch, rebuilt numpy, or run the unit tests, though I'd like to when I get a chance. Austin
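[For readers who haven't opened the header, a rough sketch of the proposed layering; nonfunc_api.h is the provisional name above, and the comments describe the intent rather than the literal contents:]

/* numpy/nonfunc_api.h (proposed, name provisional):
   everything from ndarrayobject.h that does not depend on
   PY_ARRAY_UNIQUE_SYMBOL or NO_IMPORT -- the NPY_TYPES enum, type and
   struct definitions, flags, and so on. Headers that only need those
   symbols can include this file without thinking about the macros. */

/* numpy/ndarrayobject.h, after the change -- unchanged for existing users: */
#include "nonfunc_api.h"       /* the macro-independent part, factored out */
#include "__multiarray_api.h"  /* the PyArray_API table, still governed by
                                  PY_ARRAY_UNIQUE_SYMBOL / NO_IMPORT */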
On Thursday 06 May 2010 08:10:35, Austin Bingham wrote:
Suppose I defined neither macro in my 'util.h', and that I included 'arrayobject.h'. If a user of my library did this:
#include <mylib/util.h> // <-- my library's header
#define PY_ARRAY_UNIQUE_SYMBOL MY_UNIQUE_SYMBOL
#define NO_IMPORT
#include <numpy/arrayobject.h>
...
they'd likely crash.
Really? Wouldn't it be really easy to check for this situation, i.e. augment the inclusion guards by some "if included before, but PY_ARRAY_UNIQUE_SYMBOL/NO_IMPORT settings are different than the last time, fail and tell the user about it"? At least that would give a compile error at an earlier point in time. HTH, Hans
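[A minimal sketch of what such a check might look like, assuming it sits *before* the normal include guard so that it is evaluated on every inclusion; the _NPY_CHECK_* names are made up, and as far as I know nothing like this exists in the headers:]

/* Hypothetical addition at the very top of numpy/arrayobject.h, before
   its include guard, so it runs on every inclusion. */
#if !defined(_NPY_CHECK_SEEN)
  /* first inclusion: remember how NO_IMPORT was set */
  #define _NPY_CHECK_SEEN
  #if defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)
    #define _NPY_CHECK_FIRST_WAS_NO_IMPORT
  #endif
#else
  /* later inclusions are skipped by the include guards, so at least
     complain if the macro settings differ from the first time */
  #if (defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)) && \
      !defined(_NPY_CHECK_FIRST_WAS_NO_IMPORT)
    #error "arrayobject.h was first included without NO_IMPORT; check include order"
  #elif !(defined(NO_IMPORT) || defined(NO_IMPORT_ARRAY)) && \
      defined(_NPY_CHECK_FIRST_WAS_NO_IMPORT)
    #error "arrayobject.h was first included with NO_IMPORT; check include order"
  #endif
#endif

[A similar check could be imagined for the value of PY_ARRAY_UNIQUE_SYMBOL itself, though comparing macro values in the preprocessor is fiddlier.]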
they'd likely crash.
Really?
I base that on the assumption that they'd not know to call import_array() in that translation unit. This seems like a reasonable assumption because, by defining the macros as such, they are strongly implying that they expect the API functions to be imported for their definition of PY_ARRAY_UNIQUE_SYMBOL in some other place. Of course, their powers of inference and patience might be very strong, in which case they'd make sure to define those pointers, but that seems like a lot to ask of users.
Wouldn't it be really easy to check for this situation, i.e. augment the inclusion guards by some "if included before, but PY_ARRAY_UNIQUE_SYMBOL/NO_IMPORT settings are different than the last time, fail and tell the user about it"?
At least that would give a compile error at an earlier point in time.
Yes, that might be easy to do, and it's probably a good idea, but it's not an argument against normalizing (to abuse a term) the headers where possible. All the complication revolves around the API function pointers; as a user of numpy, I find it a bit frustrating that I have to concern myself with those complications when what I *really* want has nothing to do with those functions. Austin
On Thu, May 6, 2010 at 3:08 AM, Austin Bingham <austin.bingham@gmail.com> wrote:
they'd likely crash.
Really?
I base that on the assumption that they'd not know to call import_array() in that translation unit. This seems like a reasonable assumption because, by defining the macros as such, they are strongly implying that they expect the API functions to be imported for their definition of PY_ARRAY_UNIQUE_SYMBOL in some other place. Of course, their powers of inference and patience might be very strong, in which case they'd make sure to define those pointers, but that seems like a lot to ask of users.
Wouldn't it be really easy to check for this situation, i.e. augment the inclusion guards by some "if included before, but PY_ARRAY_UNIQUE_SYMBOL/NO_IMPORT settings are different than the last time, fail and tell the user about it"?
At least that would give a compile error at an earlier point in time.
Yes, that might be easy to do, and it's probably a good idea, but it's not an argument against normalizing (to abuse a term) the headers where possible. All the complication revolves around the API function pointers; as a user of numpy, I find it a bit frustrating that I have to concern myself with those complications when what I *really* want has nothing to do with those functions.
Welcome to open source and the joys of backward compatibility ;) I like your idea for breaking the header up; we really do need to try working on the header situation, and I think your suggestion could be helpful without breaking current usage. Chuck
On Thu, May 6, 2010 at 8:21 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, May 6, 2010 at 3:08 AM, Austin Bingham <austin.bingham@gmail.com> wrote:
they'd likely crash.
Really?
I base that on the assumption that they'd not know to call import_array() in that translation unit. This seems like a reasonable assumption because, by defining the macros as such, they are strongly implying that they expect the API functions to be imported for their definition of PY_ARRAY_UNIQUE_SYMBOL in some other place. Of course, their powers of inference and patience might be very strong, in which case they'd make sure to define those pointers, but that seems like a lot to ask of users.
Wouldn't it be really easy to check for this situation, i.e. augment the inclusion guards by some "if included before, but PY_ARRAY_UNIQUE_SYMBOL/NO_IMPORT settings are different than the last time, fail and tell the user about it"?
At least that would give a compile error at an earlier point in time.
Yes, that might be easy to do, and it's probably a good idea, but it's not an argument against normalizing (to abuse a term) the headers where possible. All the complication revolves around the API function pointers; as a user of numpy, I find it a bit frustrating that I have to concern myself with those complications when what I *really* want has nothing to do with those functions.
Welcome to open source and the joys of backward compatibility ;) I like your idea for breaking the header up; we really do need to try working on the header situation, and I think your suggestion could be helpful without breaking current usage.
Go ahead and open a ticket and provide a patch. Mark it as needs review. Chuck
participants (5)
- Austin Bingham
- Charles R Harris
- David
- David Cournapeau
- Hans Meine