[Numpy-discussion] Request for review: dynamic_cpu_branch

Charles R Harris charlesr.harris at gmail.com
Fri Dec 26 12:38:37 EST 2008


On Fri, Dec 26, 2008 at 2:20 AM, David Cournapeau <cournape at gmail.com> wrote:

> On Fri, Dec 26, 2008 at 4:05 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Thu, Dec 25, 2008 at 10:47 PM, David Cournapeau <cournape at gmail.com>
> > wrote:
> >>
> >> On Tue, Dec 23, 2008 at 6:07 PM, Charles R Harris
> >> <charlesr.harris at gmail.com> wrote:
> >> >
> >> >
> >> > On Mon, Dec 22, 2008 at 11:23 PM, David Cournapeau
> >> > <david at ar.media.kyoto-u.ac.jp> wrote:
> >> >>
> >> >> David Cournapeau wrote:
> >> >> > Charles R Harris wrote:
> >> >> >
> >> >> >> On Mon, Dec 22, 2008 at 10:40 PM, David Cournapeau
> >> >> >> <cournape at gmail.com
> >> >> >> <mailto:cournape at gmail.com>> wrote:
> >> >> >>
> >> >> >>     On Tue, Dec 23, 2008 at 2:35 PM, Robert Kern
> >> >> >>     <robert.kern at gmail.com <mailto:robert.kern at gmail.com>> wrote:
> >> >> >>
> >> >> >>     >
> >> >> >>     > I think he meant that it can be discovered at runtime in
> >> >> >>     > general, not at numpy-run-time, so we can write a small C
> >> >> >>     > program that can be run at numpy-build-time to add another
> >> >> >>     > entry to config.h.
> >> >> >>
> >> >> >>     But then we only move the problem: people who want to build
> >> >> >>     universal numpy extensions will have the wrong value, no? The
> >> >> >>     fundamental point of my patch is that the value is set whenever
> >> >> >>     ndarrayobject.h is included. So even if I build numpy on PPC,
> >> >> >>     NPY_BIGENDIAN will not be defined when the header is included
> >> >> >>     for a file built with gcc -arch i386.
> >> >> >>
> >> >> >>
> >> >> >> We can probably set things up so the determination is at run time --
> >> >> >> but we need to be sure that the ABI isn't affected. I did that once
> >> >> >> for an old project that needed data portability. In any case, it
> >> >> >> sounds like a project for a later release.
> >> >> >>
> >> >> >
> >> >> > It cannot work for numpy without breaking backward compatibility,
> >> >> > because of the following lines:
> >> >> >
> >> >>
> >> >> Actually, you could, by making the macro point to actual functions, but
> >> >> that would add function call cost. I don't know if the function call
> >> >> cost is significant or not in the cases where those macros are used.
> >> >
> >> > Exactly. Function calls are pretty cheap on modern hardware with good
> >> > compilers, and I wouldn't expect the calls to be the bottleneck in most
> >> > applications. The functions would need to be visible to third-party
> >> > applications, however...
> >>
> >> Would it be a problem? Adding "true" functions to the array API, while
> >> keeping the macros for backward compatibility, should be OK, no?
> >
> > I don't think it's a problem, just that the macros generate code that is
> > compiled, so they need to call an API function. A decent compiler will
> > probably load the function pointer somewhere fast if it is called in a
> > loop, and a const keyword somewhere will help with that. We might want
> > something more convenient for our own code.
> >
> >>
> >> I also updated my patch, with another function, PyArray_GetEndianness,
> >> which detects the runtime endianness (using a union of int and char[4]).
> >> The point is to detect any mismatch between the configuration endianness
> >> and the "true" one, and I put the detection in import_array. The
> >> function is in the numpy array API, but it does not really need to be
> >> there.
> >
> > That sounds like a good start. It might be a good idea to use something
> > like npy_int32 instead of a plain old integer. Likewise, it would probably
> > be good to define the union as an anonymous constant. Hmm... something
> > like:
>
> What I did was a bit more heavyweight, because I added it as a function,
> but the idea is similar:
>
> static int compute_endianness()
> {
>        union {
>                char c[4];
>                npy_uint32 i;
>        } bint;
>        int st;
>
>        bint.i = 'ABCD';
>
>        st = bintstrcmp(bint.c, "ABCD");
>        if (st == 0) {
>                return NPY_CPU_BIG;
>        }
>        st = bintstrcmp(bint.c, "DCBA");
>        if (st == 0) {
>                return NPY_CPU_LITTLE;
>        }
>        return NPY_CPU_UNKNOWN_ENDIAN;
> }
>
> Now that I think about it, I don't know if setting an integer to
> 'ABCD' is legal C. I think it is, but I don't claim to be any kind of
> C expert.


Try

    const union {
        char c[4];
        npy_uint32 i;
    } bint = {1, 2, 3, 4};

And just compare the resulting value of bint.i to predefined values, i.e.,
16909060 (0x01020304) for big endian or 67305985 (0x04030201) for little
endian. The initializer is needed for the const union and fills its first
member, the char array. The result will be more efficient as the union is
initialized at compile time.
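
For concreteness, here is a minimal, self-contained sketch of that approach.
It is only an illustration: the NPY_CPU_* names are taken from the earlier
snippet, their values here are made up, and uint32_t from <stdint.h> stands
in for npy_uint32.

    #include <stdint.h>
    #include <stdio.h>

    #define NPY_CPU_BIG             1
    #define NPY_CPU_LITTLE          0
    #define NPY_CPU_UNKNOWN_ENDIAN -1

    static int
    compute_endianness(void)
    {
        /* The initializer fills the first member (the char array), so the
           byte pattern 01 02 03 04 is fixed at compile time. */
        static const union {
            unsigned char c[4];
            uint32_t i;
        } bint = {{1, 2, 3, 4}};

        if (bint.i == 0x01020304u) {        /* 16909060 */
            return NPY_CPU_BIG;
        }
        if (bint.i == 0x04030201u) {        /* 67305985 */
            return NPY_CPU_LITTLE;
        }
        return NPY_CPU_UNKNOWN_ENDIAN;      /* e.g. a middle-endian layout */
    }

    int
    main(void)
    {
        printf("endianness: %d\n", compute_endianness());
        return 0;
    }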


>
> >
> >
> > I've done it two ways here. They both require the -fno-strict-aliasing
> > flag in gcc, but numpy is compiled with that flag. Both methods generate
> > the same assembly with -O2 on my Intel Core 2.
>
> I don't need this flag in my case, and I don't think we should require
> it if we can avoid it.
>

You can't avoid it: there will be a compile-time error if you are lucky, or
buggy code if you aren't.
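
To make that concrete, here is the sort of cast-based check that runs afoul
of strict aliasing -- purely illustrative, and not necessarily either of the
two versions mentioned above:

    #include <stdint.h>

    /* Reading a uint32_t object through a uint16_t lvalue breaks the
       strict-aliasing rules, so a check written like this is only safe
       when the whole build uses -fno-strict-aliasing, as numpy's does. */
    static int
    runtime_is_little_endian(void)
    {
        uint32_t word = 0x01020304u;
        const uint16_t *half = (const uint16_t *)&word;

        /* On a little-endian CPU the low-order half (0x0304) sits at the
           lower address; a big-endian CPU would read 0x0102 here. */
        return half[0] == 0x0304u;
    }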


> I also use string comparison, not just the first character, because in
> theory there can be middle-endian machines, but I doubt this has any real
> use. In that case, the function bintstrcmp can be dropped.
>

Yeah, the VAX used to be middle endian for floats, but I think you are safe
these days.
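
As an aside, the byte-pattern comparison can be done with plain memcmp, so
bintstrcmp isn't strictly needed. A sketch along the lines of the earlier
snippet (hypothetical names and values, with the multicharacter constant
spelled out as 0x41424344 to sidestep the "is 'ABCD' legal C" question):

    #include <stdint.h>
    #include <string.h>

    #define NPY_CPU_BIG             1
    #define NPY_CPU_LITTLE          0
    #define NPY_CPU_UNKNOWN_ENDIAN -1

    static int
    compute_endianness(void)
    {
        union {
            char c[4];
            uint32_t i;
        } bint;

        bint.i = 0x41424344u;   /* the bytes 'A', 'B', 'C', 'D' */

        if (memcmp(bint.c, "ABCD", 4) == 0) {
            return NPY_CPU_BIG;      /* most significant byte stored first */
        }
        if (memcmp(bint.c, "DCBA", 4) == 0) {
            return NPY_CPU_LITTLE;   /* least significant byte stored first */
        }
        /* Anything else, e.g. the PDP-11 style "BADC", is middle endian. */
        return NPY_CPU_UNKNOWN_ENDIAN;
    }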

Chuck

