Overflow error (was Vol 67, Issue 192)

Scott David Daniels Scott.Daniels at Acm.Org
Mon Apr 13 22:13:40 CEST 2009

Dave Angel wrote:
> Ryniek90 wrote:
>>>> .... But I still haven't got an answer to the question: 
>>>> "What's the max. length of string bytes which Python can hold?"
>>> sys.maxsize
>>> The largest positive integer supported by the platform’s
>>> Py_ssize_t type, and thus the maximum size lists, strings, dicts, and
>>> many other containers can have.
>> Thanks. I wanted to check very carefully what's up, and I found 
>> this: "strings (currently restricted to 2GiB)".
>> It's here, in PEP 353 (PEP 0353 
>> <http://www.python.org/dev/peps/pep-0353/>). Besides this, I 
>> found this in the sys module's docstring:
>> maxint = 2147483647
>> maxunicode = 1114111
>> Which, when added, gives us 2148597758 bytes, which is equal to 
>> 2049.0624980926514 MiB.

This arithmetic makes very little sense.  You are adding the maximum
value for a Unicode code point and the maximum integer representable in
the underlying C compiler's long.  Were you to do some kind of arithmetic
on those two numbers, I'd do:
     sys.maxint / math.ceil(math.log(sys.maxunicode, 256))
That is "supposed to be" the number of Unicode characters in a
maximal-length sequence of bytes.  However, it doesn't even manage
that: (I believe) even for those Pythons with 32-bit Unicode
characters, sys.maxunicode is currently 0x10FFFF (the largest
code point defined by the Unicode Consortium), so the formula
assumes 3 bytes per character where such builds actually use 4.
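
For anyone who wants to try the arithmetic, here is a minimal sketch.
It assumes Python 3, where sys.maxint no longer exists, so sys.maxsize
is substituted for it; that substitution and the variable names are
assumptions of this sketch, not part of the original expression.

```python
import math
import sys

# Bytes needed to hold the largest code point if every character
# used the same fixed width (base-256 digits of sys.maxunicode).
bytes_per_code_point = math.ceil(math.log(sys.maxunicode, 256))

# Assumption: sys.maxsize stands in for sys.maxint, which was
# removed in Python 3. Integer division keeps the result an int.
max_code_points = sys.maxsize // bytes_per_code_point

print(bytes_per_code_point, max_code_points)
```

The result is only a rough upper bound on characters per byte string,
not a limit that Python itself enforces.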

> How much RAM is in your system? Unless it's at least 50 gb, in a 64bit 
> OS, I'd keep my max chunk size to much smaller than 2gb. For a typical 
> 32bit system with 2 to 4gb of RAM, I'd probably chunk the file a meg or 
> so at a time. Using large sizes is almost always a huge waste of resources.

Agreed.  If you must do arithmetic to determine the chunk length,
the magic constant for practically everything, "42", can give you
the chunk size to use:
     ord("4") * ord("2") * int("42") == 109200
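
More seriously, reading in modest chunks might look like the sketch
below.  The name iter_chunks, the 1 MiB default, and the in-memory
BytesIO stand-in for a real file are all assumptions made for
illustration.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB, in the spirit of "a meg or so at a time"


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks from a binary stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


# Hypothetical stand-in for a large file opened with open(path, "rb").
data = io.BytesIO(b"x" * (2 * CHUNK_SIZE + 5))
sizes = [len(chunk) for chunk in iter_chunks(data)]
print(sizes)
```

Only one chunk is held in memory at a time, so the peak footprint
stays near chunk_size regardless of how large the file is.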


--Scott David Daniels
Scott.Daniels at Acm.Org
