[Python-Dev] 64-bit sequence and buffer protocol

Travis Oliphant oliphant at ee.byu.edu
Tue Mar 29 23:04:23 CEST 2005


I'm posting to this list to reopen discussion of a problem in current 
Python: a C int is used for all indices and lengths in both the Python 
sequence protocol and the Python buffer protocol. 

The problem is that a C int is typically only 4 bytes long, while there 
are many applications (mmap, for example) that would like to access 
sequences much larger than can be addressed with 32 bits.   There are 
two aspects to this problem:

1) Most 64-bit systems still define a C int as 4 bytes long, so even 
sequence objects that fit entirely in memory cannot be fully addressed 
through the sequence protocol.

2) Even 32-bit systems sometimes need to expose a more abstract object 
(one that is not entirely in memory, such as a large file) as a 
sequence, which can require more than 32 bits to address. 
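To make the issue concrete, here is roughly what the relevant slots 
look like in the Python 2.4 headers.  This is a from-memory sketch of 
Include/object.h, but the essential point is that every index, length, 
and byte count is declared as a C int:

    typedef int (*inquiry)(PyObject *);                /* sq_length */
    typedef PyObject *(*intargfunc)(PyObject *, int);  /* sq_item   */
    typedef PyObject *(*intintargfunc)(PyObject *, int, int);
                                                       /* sq_slice  */
    typedef int (*intobjargproc)(PyObject *, int, PyObject *);
    typedef int (*intintobjargproc)(PyObject *, int, int, PyObject *);

    /* Buffer protocol: offsets, segment lengths, and return values
       are all C ints as well. */
    typedef int (*getreadbufferproc)(PyObject *, int, void **);
    typedef int (*getwritebufferproc)(PyObject *, int, void **);
    typedef int (*getsegcountproc)(PyObject *, int *);
    typedef int (*getcharbufferproc)(PyObject *, int, const char **);

A type exporting, say, a 6 GB memory-mapped region simply has no way 
to report its true length, or to hand out data past byte 2**31 - 1, 
through these slots.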

These are the solutions I've seen:

1) Convert all C ints to Py_LONG_LONG in the sequence and buffer protocols.

2) Add new C APIs that mirror the current ones but use Py_LONG_LONG 
instead of int (a sketch of this option follows the list).

3) Change Python to use the mapping protocol first (even for slicing) 
when both the mapping and sequence protocols are defined.

4) Tell writers of such large objects not to use the sequence and/or 
buffer protocols, and instead to use the mapping protocol together 
with a different "bytes" object (which today they would have to 
implement themselves, ignoring the buffer protocol C-API).
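
To make option 2 concrete, a mirrored API might look something like 
the sketch below.  Every name containing "64" is hypothetical -- 
invented here for illustration, not an existing CPython API (the only 
real names used are PY_LONG_LONG, as the macro is spelled in pyport.h, 
and the current int-based PySequence_GetItem):

    #include "Python.h"
    #include <limits.h>

    /* Hypothetical 64-bit counterparts of the existing typedefs. */
    typedef PyObject *(*longlongargfunc)(PyObject *, PY_LONG_LONG);
    typedef PY_LONG_LONG (*getreadbufferproc64)(PyObject *,
                                                PY_LONG_LONG, void **);

    /* Hypothetical new entry point: fall back to the current
       int-based call whenever the index still fits in a C int. */
    PyObject *
    PySequence_GetItem64(PyObject *o, PY_LONG_LONG i)
    {
        if (i >= INT_MIN && i <= INT_MAX)
            return PySequence_GetItem(o, (int)i);
        /* ...otherwise dispatch through a (hypothetical) 64-bit
           slot on the type; with no such slot, all we can do is
           fail cleanly instead of silently truncating the index. */
        PyErr_SetString(PyExc_IndexError,
                        "index does not fit in a C int");
        return NULL;
    }

The obvious cost of this route is that every producer and consumer of 
the protocols would then have to know about two parallel sets of 
slots, which is presumably part of what makes the fix harder than it 
first appears.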


What is the opinion of people on this list about how to fix the 
problem?   I believe Martin was looking at the problem and had told 
Perry Greenfield he was "fixing it."  Apparently at the recent PyCon, 
Perry and he talked, and Martin said the problem is harder than he had 
initially thought.  It would be good to document what some of these 
difficulties are so that the community can assist in fixing the problem.

-Travis O.
