Performance: sets vs dicts.

Terry Reedy tjreedy at udel.edu
Wed Sep 1 16:54:09 EDT 2010


On 9/1/2010 2:42 AM, Paul Rubin wrote:
> Terry Reedy<tjreedy at udel.edu>  writes:
>> Does anyone seriously think that an implementation should be rejected
>> as an implementation if it intellegently did seq[n] lookups in
>> log2(n)/31 time units for all n (as humans would do), instead of
>> stupidly taking 1 time unit for all n<  2**31 and rejecting all larger
>> values (as 32-bit CPython does)?
>
> Er, how can one handle n>  2**31 at all, in 32-bit CPython?

I am not sure of what you mean by 'handle'. Ints (longs in 2.x) are not 
limited, but indexes are. 2**31 and bigger are summarily rejected as 
impossibly too large, even though they might not actually be so these days.

 >>> s=b''
 >>> s[1]
Traceback (most recent call last):
   File "<pyshell#7>", line 1, in <module>
     s[1]
IndexError: index out of range
 >>> s[2**32]
Traceback (most recent call last):
   File "<pyshell#8>", line 1, in <module>
     s[2**32]
IndexError: cannot fit 'int' into an index-sized integer

As far as I know, this is undocumented. In any case, this means that if 
it were possible to create a byte array longer than 2**31 on an 
otherwise loaded 32-bit linux machine with 2**32 memory, then indexing 
the end elements would not be possible, which is to say, O(1) would jump 
to O(INF). I do not have such a machine to test whether
    big = open('2.01.gigabytes', 'rb').read()
executes or raises an exception. Array size limits are also not documented.

-- 
Terry Jan Reedy




More information about the Python-list mailing list