[Python-Dev] Integer representation (Was: ssize_t question: longs in header files)

Guido van Rossum guido at python.org
Tue May 30 06:00:04 CEST 2006


[Adding the py3k list; please remove python-dev in followups.]

On 5/29/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I thought Py3k will have a single integer type whose representation
> varies depending on the value being represented.

That's one proposal. Another is to have an abstract 'int' type with
two concrete subtypes, e.g. 'short' and 'long', corresponding to
today's int and long. At the C level the API should be unified so C
programmers are isolated from the difference (they aren't today).

> I haven't seen an actual proposal for such a type,

I'm not sure that my proposal above has ever been said out loud. I'm
also not partial; I think we may have to do an experiment to decide.

> so let me make one:
>
> struct PyInt{
>   struct PyObject ob;
>   Py_ssize_t value_or_size;
>   char is_long;
>   digit ob_digit[1];
> };
>
> If is_long is false, then value_or_size is the value (represented
> as Py_ssize_t), else the value is in ob_digit, and value_or_size
> is the size.

Nice. I guess if we store the long value in big-endian order we could
drop is_long, since the first digit of the long would always be
nonzero. This would save a byte (on average) for the longs, but it
would do nothing for the wasted space for short ints.
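
To make the layout concrete, here is a minimal standalone sketch of Martin's struct. It does not use Python.h: the sketch_* names, the ptrdiff_t stand-in for Py_ssize_t and the unsigned short stand-in for digit are all assumptions for illustration, and the object header is left out. It also assumes longs are normalized, so anything stored in long form never fits in a machine word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t sketch_ssize_t;      /* assumed stand-in for Py_ssize_t */
typedef unsigned short sketch_digit;   /* assumed stand-in for digit */

typedef struct {
    /* the real object header (struct PyObject ob) is omitted here */
    sketch_ssize_t value_or_size;      /* the value, or the digit count */
    char is_long;                      /* selects the meaning of the field above */
    sketch_digit ob_digit[1];          /* digits, used only when is_long is set */
} SketchInt;

/* Build the short form: the value lives directly in value_or_size. */
static SketchInt *sketch_from_ssize(sketch_ssize_t v)
{
    SketchInt *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->value_or_size = v;
    o->is_long = 0;
    o->ob_digit[0] = 0;
    return o;
}

/* Build the long form with room for ndigits zeroed digits. */
static SketchInt *sketch_new_long(sketch_ssize_t ndigits)
{
    assert(ndigits >= 1);
    SketchInt *o = malloc(sizeof *o
                          + (size_t)(ndigits - 1) * sizeof(sketch_digit));
    if (o == NULL)
        return NULL;
    o->value_or_size = ndigits;
    o->is_long = 1;
    for (sketch_ssize_t i = 0; i < ndigits; i++)
        o->ob_digit[i] = 0;
    return o;
}

/* Store the value in *out and return 0 for the short form.
   Return -1 for the long form, which this sketch treats as overflow. */
static int sketch_as_ssize(const SketchInt *o, sketch_ssize_t *out)
{
    if (o->is_long)
        return -1;
    *out = o->value_or_size;
    return 0;
}
```

Under the big-endian variant, the o->is_long test would presumably become o->ob_digit[0] != 0, since a short int would leave that slot zero and a long would always have a nonzero first digit.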

> PyLong_* will be synonyms for PyInt_*.

Why do we need to keep the PyLong_* APIs at all? Even at the Python
level we're not planning any backward compatibility features; at the C
level I'd like even more freedom to break things.

> PyInt_FromLong/AsLong will
> continue to exist; PyInt_AsLong will indicate an overflow with -1.
> Likewise, PyArg_ParseTuple "i" will continue to produce int, and
> raise an exception (OverflowError?) when the value is out of range.
>
> C code can then decide whether to parse a Python integer as
> C int, long, long long, or ssize_t.

Nice. I like the unified API and I like using Py_ssize_t instead of
long for the value; this ensures that an int can hold a pointer (if we
allow for signed pointers) and matches the native word size better on
Windows (I guess it makes no difference for any other platform, where
ssize_t and long already have the same size).

I worry about all the wasted space for alignment caused by the extra
flag byte though. That would be 4 bytes per integer on 32-bit machines
(where they are currently 12 bytes) and 8 bytes on 64-bit machines
(where they are currently 24 bytes).
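
The padding cost is easy to check with sizeof. The sketch below assumes a two-word header (a reference count and a type pointer) and uses ptrdiff_t as a stand-in for Py_ssize_t; the struct names are made up for the comparison, and ob_digit is left out because it would fit into the padding anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed two-word object header: reference count plus type pointer. */
typedef struct {
    ptrdiff_t refcnt;
    void *type;
} SketchHeader;

/* Today's short int shape: header plus one word for the value. */
typedef struct {
    SketchHeader ob;
    ptrdiff_t value;
} PlainInt;

/* The same with a one-byte flag appended. */
typedef struct {
    SketchHeader ob;
    ptrdiff_t value;
    char is_long;
} FlaggedInt;

/* Extra bytes per object caused by the flag, padding included. */
static size_t flag_overhead(void)
{
    assert(sizeof(FlaggedInt) >= sizeof(PlainInt));
    return sizeof(FlaggedInt) - sizeof(PlainInt);
}
```

On the usual ABIs the struct is padded up to a multiple of the word size, so flag_overhead() should come out as one full word: the single flag byte plus the alignment padding after it.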

That's why I'd like my alternative proposal (int as ABC and two
subclasses that may remain anonymous to the Python user); it'll save
the alignment waste for short ints and will let us use a smaller int
type for the size for long ints (if we care about the latter).
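
A rough sketch of what the two-subtype layout could look like in C, with one accessor that hides the difference from callers. Everything here is an assumption for illustration: the Abc* names are invented, and the enum tag stands in for the type pointer that the real object header already carries (so in practice the tag would cost no extra space).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t abc_ssize_t;       /* assumed stand-in for Py_ssize_t */
typedef unsigned short abc_digit;    /* assumed stand-in for digit */

/* Stand-in for the type pointer in the object header. */
typedef enum { ABC_SHORT, ABC_LONG } AbcKind;

/* The abstract 'int': only the part common to both subtypes. */
typedef struct {
    AbcKind kind;
} AbcInt;

/* Concrete short int: no flag byte, so no extra alignment padding. */
typedef struct {
    AbcInt base;
    abc_ssize_t value;
} AbcShort;

/* Concrete long int: the size field may use a smaller type. */
typedef struct {
    AbcInt base;
    int size;
    abc_digit digits[1];
} AbcLong;

static AbcInt *abc_new_short(abc_ssize_t v)
{
    AbcShort *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->base.kind = ABC_SHORT;
    s->value = v;
    return &s->base;
}

static AbcInt *abc_new_long(int ndigits)
{
    assert(ndigits >= 1);
    AbcLong *l = malloc(sizeof *l
                        + (size_t)(ndigits - 1) * sizeof(abc_digit));
    if (l == NULL)
        return NULL;
    l->base.kind = ABC_LONG;
    l->size = ndigits;
    for (int i = 0; i < ndigits; i++)
        l->digits[i] = 0;
    return &l->base;
}

/* Unified accessor: callers never look at the concrete subtype.
   Returns 0 and stores the value for a short int; returns -1 for a
   long int, which this sketch treats as out of range. */
static int abc_as_ssize(const AbcInt *o, abc_ssize_t *out)
{
    if (o->kind == ABC_SHORT) {
        *out = ((const AbcShort *)o)->value;
        return 0;
    }
    return -1;
}
```

The cast in abc_as_ssize relies on base being the first member of each concrete struct, which is the same trick the existing object structs use.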

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

