I'm posting to this list to reopen discussion of a problem in current Python: a C int is used in both the Python sequence protocol and the Python buffer protocol.
The problem is that a C int is typically only 4 bytes long, while there are many applications (mmap, for example) that would like to access sequences much larger than can be addressed with 32 bits. There are two aspects to this problem:
1) Some 64-bit systems still define a C int as 4 bytes long (so even in-memory sequence objects could not be fully addressed using the sequence protocol).
2) Even 32-bit systems have occasion to sequence a more abstract object (perhaps it is not all in memory) which requires more than 32 bits to address.
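To make the truncation concrete, here is a small sketch using ctypes to model what happens when a length larger than 2**31 - 1 is squeezed through a slot typed as C int (as sq_length and the buffer-protocol getters are). The 3 GiB figure is just an illustrative value, not taken from any real application:

```python
import ctypes

# A hypothetical 3 GiB object length -- more than a signed 32-bit C int holds.
real_length = 3 * 1024**3            # 3221225472

# ctypes.c_int models the C int used in the sequence/buffer protocol slots;
# ctypes performs no range check, so the value silently wraps.
truncated = ctypes.c_int(real_length).value

print(real_length, truncated)        # 3221225472 -1073741824
```

The failure is silent: no OverflowError is raised, the caller just sees a bogus (here negative) length.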
These are the solutions I've seen:
1) Convert all C-ints to Py_LONG_LONG in the sequence and buffer protocols.
2) Add new C-API's that mirror the current ones which use Py_LONG_LONG instead of the current int.
3) Change Python to use the mapping protocol first (even for slicing) when both the mapping and sequence protocols are defined.
4) Tell writers of such large objects not to use the sequence and/or buffer protocols, and instead use the mapping protocol together with a different "bytes" object (which they would currently have to implement themselves, bypassing the buffer protocol C-API).
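The reason solutions 3) and 4) work at all is that the mapping protocol's mp_subscript slot receives a full PyObject* index rather than a C int, so indices are not limited to 32 bits. A minimal Python-level sketch of this approach (the class and its _fetch placeholder are hypothetical, standing in for e.g. an object backed by a huge mmap'd file):

```python
class BigSequence:
    """Sketch of the mapping-protocol approach: a huge virtual
    sequence indexed through __getitem__ (mp_subscript), which takes
    arbitrary-size Python ints rather than a C int."""

    def __init__(self, length):
        self._length = length        # may exceed 2**31 - 1

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(self._length)
            return [self._fetch(i) for i in range(start, stop, step)]
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._fetch(index)

    def _fetch(self, i):
        # placeholder for fetching one byte of the (possibly
        # not-in-memory) underlying data
        return i & 0xFF

big = BigSequence(2**40)
value = big[2**35 + 5]   # an index far beyond what a 32-bit C int can hold
print(value)
```

Note the cost of this workaround: the object gives up the sequence and buffer protocols entirely, so C code expecting those protocols cannot use it.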
What is the opinion of people on this list about how to fix the problem? I believe Martin was looking at it and had told Perry Greenfield he was "fixing it." Apparently at the recent PyCon, Perry and he talked and Martin said the problem is harder than he had initially thought. It would be good to document what some of these difficulties are so that the community can assist in fixing the problem.