[Python-3000] Should len() clip to sys.maxsize or raise OverflowError?

Guido van Rossum guido at python.org
Tue Sep 2 22:53:00 CEST 2008


On Tue, Sep 2, 2008 at 1:35 PM, Raymond Hettinger <python at rcn.com> wrote:
>>> That makes sense to me and there are probably plenty of examples.
>>> However, I worry more about other examples that will fail
>>> and do so in a way that is nearly impossible to find through
>>> code review (because the code IS correct as written).
>>>
>>>  n = len(log_entries)
>>>  if log_entries[n] in handled:
>>
>> This should raise an IndexError. I think you meant something else?
>>
>>>    log_entries.pop(n)
>
> Right. It should have been n-1 in my quick example.

And why not -1? That doesn't have the clipping problem.
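To see why -1 sidesteps the clipping problem, here is a minimal sketch. It assumes a hypothetical `VirtualSeq` class whose items are computed on the fly, and it simulates the proposed clipping len() with `min(size, sys.maxsize)`. Neither is an existing API; current Python raises OverflowError instead of clipping.

```python
import sys

class VirtualSeq:
    """Hypothetical sequence whose items are computed on the fly: item i is i * 2."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        # Support negative indices the way built-in sequences do.
        if index < 0:
            index += self.size
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index * 2

seq = VirtualSeq(2**70)

# Simulate a len() that clips to sys.maxsize (the proposal under discussion).
n = min(seq.size, sys.maxsize)

last_via_n = seq[n - 1]   # item at sys.maxsize - 1: not the real last item
last_via_neg = seq[-1]    # item at 2**70 - 1: the real last item
print(last_via_n == last_via_neg)  # False
```

The negative index is resolved against the object's own notion of its size, so it never passes through the clipped value.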

> The idea is that if the len() return value is actually being used for
> something (in this case indexing, but possibly also slicing, resource
> management, etc), then the app will silently start doing the wrong thing.
>
>  next_ticket_number = len(tickets)
>  create_new_ticket_form(time(), next_ticket_number)
>
> ISTM, there are many uses for len() when it is bad news if the result
> is less than the real length.  Those cases will be harder to detect and
> correct than if an overflow was raised.
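The two policies being weighed can be put side by side in a short sketch. `Huge`, `len_clip` and `len_raise` are hypothetical names used only for illustration; `len_clip` stands in for the proposed clipping behaviour, while `len_raise` is simply today's len(), which raises OverflowError when `__len__` returns a value above sys.maxsize.

```python
import sys

class Huge:
    """Hypothetical object whose __len__ exceeds sys.maxsize."""

    def __len__(self):
        return sys.maxsize + 1

def len_clip(obj):
    """Hypothetical clipping policy: cap the length at sys.maxsize."""
    try:
        return len(obj)
    except OverflowError:
        return sys.maxsize

def len_raise(obj):
    """Status-quo policy: plain len(), which raises OverflowError here."""
    return len(obj)

print(len_clip(Huge()))  # prints sys.maxsize, one less than the true length
try:
    len_raise(Huge())
except OverflowError as exc:
    print("OverflowError:", exc)
```

With clipping, the caller gets a plausible but wrong number and carries on; with raising, the failure surfaces at the len() call itself.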

I'm sorry, but toy examples like these don't convince me. Most of them
look like they are using real lists.

But for *real* lists, and for anything that actually stores an item (no
matter how small -- could be a reference to None) for each valid index
value, there is no possibility that __len__() will ever overflow, since
the clipping limit is half the memory size, while the theoretical
number of references that can be stored in memory is either 1/4th or
1/8th of the memory size depending on pointer size.
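The arithmetic can be checked on the running interpreter. This sketch assumes a flat address space of 2**(8 * pointer size) bytes; `struct.calcsize("P")` reports the pointer size of the current build.

```python
import struct
import sys

ptr_size = struct.calcsize("P")          # bytes per pointer: 4 or 8
address_space = 2 ** (8 * ptr_size)      # total addressable bytes

# sys.maxsize (the clipping limit) is roughly half the address space.
half_space = address_space // 2

# Upper bound on how many object references fit in memory at once:
# 1/4 of the address space with 4-byte pointers, 1/8 with 8-byte pointers.
max_refs = address_space // ptr_size

print(ptr_size, max_refs < sys.maxsize)  # "8 True" on a typical 64-bit build
```

Even a list that filled all of memory with pointers would hold at most `max_refs` items, which is well below sys.maxsize.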

The only time when __len__ can be larger than sys.maxsize is when the
class implements some kind of virtual space where the values are
computed on the fly. In such cases trying to walk over all values is
bound to take forever, and the length is likely not of all that much
interest to the caller -- but sometimes we may need to pass such an
object to some library code we didn't write that is making some
trivial use of len(), like the examples I gave before.
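The built-in `range` is one such virtual space, and it shows the status quo in CPython 3: indexing a huge range works, but len() raises OverflowError. `describe` is a hypothetical stand-in for library code that makes a trivial use of len().

```python
import sys

# range() is a built-in "virtual" sequence: its items are computed on demand.
r = range(2**64)

print(r[-1] == 2**64 - 1)    # True: indexing works on the huge range
print(r.stop > sys.maxsize)  # True: the length cannot fit in a Py_ssize_t

def describe(seq):
    """Hypothetical third-party helper making a trivial use of len()."""
    return f"{type(seq).__name__} with {len(seq)} items"

try:
    describe(r)
except OverflowError as exc:
    print("OverflowError:", exc)
```

The range object itself is perfectly usable here. Only the helper's incidental len() call fails.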

That said, I would actually be okay with the status quo (which does
raise an OverflowError) as long as we commit to fixing this properly
in 2.7 / 3.1, by removing the range restriction (like we've done for
other int operations a long time ago).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

