[Python-Dev] Unclear on the way forward with unsigned integers

Sun, 06 Oct 2002 21:43:21 -0400

[Mark Hammond]
> I'm a little confused by the new world order for working with integers
> in extension modules.
>
> At the end of the day, my question is this:
>
> Assume my extension module has an unsigned integer it wishes to return
> to Python.  Further, assume that this unsigned integer is not really
> an integer as such, but more a set of bits, or some other value
> "encoded" in 32 bits, such as an enum. (To put it another way, the
> "signedness" of this value seems more random than chosen)

Well, it matters:  a bitset is most naturally thought of as unsigned, so
that shifting doesn't introduce "mystery bits".  OTOH, the constants of a C
enum have, by definition, type int, which is signed.
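
To make the "mystery bits" concrete, here is a small sketch.  It assumes a
modern Python 3 (where int and long are one type) and uses ctypes only to
reinterpret a 32-bit pattern as signed; the helper names and the sample value
are made up for illustration.

```python
import ctypes

def as_signed32(bits):
    """Reinterpret a 32-bit pattern as a signed 32-bit value."""
    return ctypes.c_int32(bits).value

def low32_hex(value):
    """Show the low 32 bits of value as zero-padded hex."""
    return "0x%08x" % (value & 0xFFFFFFFF)

flags = 0x80000001                        # sample bitset with the high bit set

unsigned_shifted = flags >> 4             # zeros come in from the left
signed_shifted = as_signed32(flags) >> 4  # copies of the sign bit come in

print(low32_hex(unsigned_shifted))
print(low32_hex(signed_shifted))
```

The signed shift fills the top four bits with ones, which is exactly the
surprise an unsigned view of the bitset avoids.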

> How should I create the object to return to Python?

I'd create a Python long.

> For a concrete example.  Let's say I want to return the value of the
> Win32 function GetVersion().
>
> The documentation for this function declares it is an unsigned 32
> bit value.  The documentation then explains that to decode this value,
> specific bits in the value should be examined.  It then expounds on
> this with C sample code that relies on this unsigned behaviour by
> using a simple "> 0x80000000" comparison to check the high bit!

Yup.  The docs also say <wink>:

   This function has been superseded by GetVersionEx, which is the
   preferred method for obtaining system version number information.
   New applications should use GetVersionEx.  The GetVersionEx function
   was developed because many existing applications err when examining
   the DWORD return value of a GetVersion function call, transposing
   the major and minor version numbers packed into that DWORD.

IOW, the bag-of-bits mixed-with bag-of-bytes model was too confusing to work
with.

> I see 2 choices for returning this value:
>
> * Use PyInt_FromLong() - this will give me a *signed* Python integer,
>   but with an identical bit pattern.
>
> * Use PyLong_FromUnsignedLong() - this function will correctly be
>   signed, but may no longer fit in 32 bits.

Python ints don't fit in 32 bits either:  they've got object headers like
all objects have.  The space difference here is trivial.

A third choice is to pick the values apart in C, delivering a

   (NT_or_later_bool, major_version_int, minor_version_int, build_int)

tuple back to the Python user.  Making people pick apart the bits in Python
code seems too low-level here:

   dwVersion = GetVersion();

   // Get major and minor version numbers of Windows
   dwWindowsMajorVersion = (DWORD)(LOBYTE(LOWORD(dwVersion)));
   dwWindowsMinorVersion = (DWORD)(HIBYTE(LOWORD(dwVersion)));
   // Get build numbers for Windows NT or Win32s
   if (dwVersion < 0x80000000)                // Windows NT
       dwBuild = (DWORD)(HIWORD(dwVersion));
   else if (dwWindowsMajorVersion < 4)        // Win32s
       dwBuild = (DWORD)(HIWORD(dwVersion) & ~0x8000);
   else         // Windows 95 -- No build numbers provided
       dwBuild = 0;
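
For reference, the same decoding can be mirrored in Python to show the shape
of the tuple suggested above.  This is only a sketch, assuming a modern
Python 3; in an extension module the decoding would be done in C and the tuple
built there (for example with Py_BuildValue).  The name decode_version and the
sample DWORD values are hypothetical.

```python
def decode_version(dw_version):
    """Split a GetVersion()-style DWORD into
    (nt_or_later, major, minor, build), following the C excerpt."""
    dw_version &= 0xFFFFFFFF              # accept a signed 32-bit input too
    major = dw_version & 0xFF             # LOBYTE(LOWORD(dwVersion))
    minor = (dw_version >> 8) & 0xFF      # HIBYTE(LOWORD(dwVersion))
    high_word = (dw_version >> 16) & 0xFFFF
    nt_or_later = dw_version < 0x80000000
    if nt_or_later:                       # Windows NT
        build = high_word
    elif major < 4:                       # Win32s
        build = high_word & ~0x8000
    else:                                 # Windows 95: no build number
        build = 0
    return (nt_or_later, major, minor, build)

# Made-up sample values, not real GetVersion() output.
print(decode_version(0x08930005))
print(decode_version(0xC0000004))
```

The initial mask means the function gives the same answer whether the raw
value reached Python as a negative int or as a positive long.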

> Now, I think I am trying to stay too close to the hardware for a
> language like Python, but something just seems wrong with promoting
> my nice 32 bit value to a Python long, simply for the sake of
> retaining the sign for a value that the whole concept of "signed"
> doesn't make much sense (as it doesn't in this case, or in the case
> of enums etc).

Except Python doesn't *have* unsigned ints, so the only faithful way to
return one is to make a Python long.  In this specific case, though, I think
it would be better to pick the bits apart *for* the user -- there's really
no use for the raw int, signed or unsigned, except after picking it apart.
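
If the raw value is exposed anyway, the difference between the two views shows
up as soon as someone tests the high bit.  A sketch, again assuming a modern
Python 3 and using ctypes to imitate what a signed 32-bit return would look
like; the names and the sample value are made up.

```python
import ctypes

HIGH_BIT = 0x80000000

def as_signed32(bits):
    """Reinterpret a 32-bit pattern as a signed 32-bit value."""
    return ctypes.c_int32(bits).value

def high_bit_set(value):
    """Test bit 31 whether value arrived signed or unsigned."""
    return (value & 0xFFFFFFFF) >= HIGH_BIT

raw = 0xC0000004                          # sample DWORD with the high bit set
signed_view = as_signed32(raw)            # same bits, seen as a signed int
unsigned_view = signed_view & 0xFFFFFFFF  # same bits, seen as unsigned

print(signed_view, unsigned_view)
print(signed_view >= HIGH_BIT)            # the C-style comparison misfires
print(high_bit_set(signed_view))          # masking first gives the right answer
```

With the signed view, every caller has to remember the mask; with the unsigned
(long) view, the comparison from the C sample carries over unchanged.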

> Any suggestions or general advice?  While this case seems quite
> trivial, I am starting to face this issue more and more, especially
> as I am seeing these lovely "FutureWarnings" from all my lovely 32
> bit hexadecimal constants <wink/frown>

Sticking "L" at the end of the literal (e.g. 0x80000000L) is usually all it
takes.