[Python-3000] int-long unification

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sun Aug 20 23:06:28 CEST 2006


"Guido van Rossum" <guido at python.org> writes:

> The fatal error strikes me as unpleasant. Perhaps PyInt_Check[Exact]
> should return false if the value won't fit in a C long?

Maybe.

> Or perhaps we could just return -sys.maxint-1?

This would be a bad idea: some errors in user programs would yield
nonsensical results or be masked instead of being signalled with
exceptions.

I made C macros for the following patterns of extracting C integers
from my language:

1. If the object is an integer with its value in the given range,
   put the value into a C integer variable. Otherwise fail with an
   exception which tells that the value is out of range (it includes
   the value, the range, and a string explaining what this value
   represents), or that it is not an integer.

2. As above, but the range is the full range of the C type.

3. As above, but the low end is 0 or given explicitly and the high end
   is the range of the C type.
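
A minimal sketch of these three patterns, written in Python for
brevity rather than as the C macros themselves. All names here
(OutOfRange, c_type_range, extract_in_range, extract_full,
extract_from) are hypothetical and are neither Kogut's nor Python's
API; ctypes is used only to ask how wide a C type is on the current
platform.

```python
import ctypes


class OutOfRange(ValueError):
    """Hypothetical error: an integer lies outside the accepted range."""


def c_type_range(ctype):
    """Return (lowest, highest) value of a ctypes integer type."""
    bits = 8 * ctypes.sizeof(ctype)
    if ctype(-1).value < 0:
        # Signed type: -1 survives the round trip as a negative number.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Unsigned type: -1 wrapped around to the maximum value.
    return 0, (1 << bits) - 1


def extract_in_range(obj, low, high, what):
    """Pattern 1: accept only integers with low <= obj <= high."""
    if not isinstance(obj, int):
        raise TypeError(f"{what} must be an integer, but {obj!r} was given")
    if not low <= obj <= high:
        raise OutOfRange(
            f"{what} must be between {low} and {high}, but {obj} was given")
    return obj


def extract_full(obj, ctype, what):
    """Pattern 2: the range is the full range of the C type."""
    low, high = c_type_range(ctype)
    return extract_in_range(obj, low, high, what)


def extract_from(obj, ctype, what, low=0):
    """Pattern 3: explicit low end (default 0), high end from the C type."""
    _, high = c_type_range(ctype)
    return extract_in_range(obj, low, high, what)
```

With this, extract_in_range(2**32, 0, 0x10FFFF, "character code")
fails with the same kind of error, naming the value and the range,
whether the value is slightly or hugely too large.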

Only in rare cases did I need to separate checking whether the number
is in the given range from extracting the value under the assumption
that it has been checked earlier. Occasionally the action taken for an
out-of-range value is something other than throwing an exception, but
this is rare too.

The C type can be smaller or larger than the threshold which separates
the representations of small integers and big integers in my runtime
(which in my case is 1 bit smaller than some C type, so it never
matches exactly). This is handled transparently by these C macros.

I always try to find out the maximum sensible range of the given
parameter. For example:

- bzip2, compression parameters (verbosity 0..4, compression level 1..9,
  work factor 1..250), gzip similarly - case 1
- Python's unichr(): character code 0..0x10FFFF - case 1
- conversions int<->str, base 2..36 - case 1
- seeking into files - cases 2 and 3
- curses, color pair number 0..PAIR_NUMBER(A_COLOR) - case 1
- curses, screen coordinates and character counts - case 3
- curses, KEY_F(n) 0..63 - case 1
- sockets, address family code 0..AF_MAX or 0..255 - case 1
- sockets, port number 0..65535 - case 1
- sockets, socket type code and protocol number - case 3
- readline, function code in keymap 0..255 (or 0..KEYMAP_SIZE-2,
  but KEYMAP_SIZE is always 257) - case 1
- readline, repetition count of commands - case 2
- readline, rl_display_match_list, screen width 0..INT_MAX-2 - case 1
- readline, history entry positions - case 3
- readline, terminal width & height - case 3
- kill() and waitpid(), pid - case 3 (starting from 1 for an
  individual process or 2 for process group)
- kill(), signal number 0..NSIG-1 or 0.._NSIG-1 or 0..32 - case 1

The effect when writing a C extension is that the same C code works
no matter how the range of the target C type relates to the ranges of
int and size_t. Python had to code the extraction of the seek offset
specially because off_t may be larger, and it silently assumes that
the sensible ranges of pid_t, uid_t etc. are the same as that of C int.

The visible effect is that Python has inconsistent exceptions:
>>> unichr(0x123456)
ValueError: unichr() arg not in range(0x110000) (wide Python build)
>>> unichr(0x1234567890)
OverflowError: long int too large to convert to int

Kogut is consistent here:
> Char 0x123456
Value out of range: character code must be between 0 and 1114111, but 1193046 was given
> Char 0x1234567890
Value out of range: character code must be between 0 and 1114111, but 78187493520 was given
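
One way to see where the inconsistency comes from is the sketch
below. It is a model in Python, not the actual implementation of
unichr() or of Kogut's Char; the function names are hypothetical and
a 32-bit C int is assumed. Narrowing to a C int before the range
check gives two different exceptions, while checking the sensible
range directly on the unbounded integer gives one.

```python
INT_MIN = -2**31          # assumption: C int is 32 bits wide
INT_MAX = 2**31 - 1
MAX_CODE = 0x10FFFF       # highest Unicode code point


def two_stage(code):
    """Narrow to a C int first, then check the sensible range."""
    if not INT_MIN <= code <= INT_MAX:
        raise OverflowError("value too large to convert to C int")
    if not 0 <= code <= MAX_CODE:
        raise ValueError("character code not in range(0x110000)")
    return code


def single_stage(code):
    """Check the sensible range directly on the unbounded integer."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError(
            f"character code must be between 0 and {MAX_CODE}, "
            f"but {code} was given")
    return code


def error_kind(check, code):
    """Return the name of the exception raised by check(code), or None."""
    try:
        check(code)
    except (OverflowError, ValueError) as exc:
        return type(exc).__name__
    return None
```

Here error_kind(two_stage, 0x123456) and
error_kind(two_stage, 0x1234567890) differ, while single_stage reports
the same kind of error for both values.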

Python:
>>> posix.kill(0, 128)
OSError: [Errno 22] Invalid argument
>>> posix.kill(0, 2**32)
OverflowError: long int too large to convert to int

Kogut:
> SignalProcess #group (SystemSignal 128)
Value out of range: signal number must be between 0 and 64, but 128 was given
> SignalProcess #group (SystemSignal (2 %Power 32))
Value out of range: signal number must be between 0 and 64, but 4294967296 was given

The same applies in the other direction, converting from C.

C in Python:
#ifdef HAVE_LARGEFILE_SUPPORT
        PyStructSequence_SET_ITEM(v, 1,
                                  PyLong_FromLongLong((PY_LONG_LONG)st.st_ino));
#else
        PyStructSequence_SET_ITEM(v, 1, PyInt_FromLong((long)st.st_ino));
#endif

C in Kogut:
   KO_INT(ko_value_of_file_status(this)->st_ino)
This is a C expression returning the equivalent of PyObject *,
taking the sizeof of the argument into account.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

