Memory Usage of Strings
Amit Dev
amitdev at gmail.com
Wed Mar 16 14:20:34 EDT 2011
sum(map(len, l)) gives 99999100 for the 1st case and 99998200 for the 2nd
case, so both are roughly 100MB, as I mentioned.
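
The totals can be cross-checked without building the strings at all. What
follows is only a sketch, written for Python 3 (so `range` and `//` stand in
for the `xrange` and integer `/` of the original Python 2 session);
`total_chars` is a hypothetical helper name, not something from the session.

```python
def total_chars(count, target):
    """Sum of len(str(i) * (target // len(str(i)))) for i in range(count)."""
    total = 0
    for i in range(count):
        digits = len(str(i))
        # Each string is str(i) repeated (target // digits) times,
        # so its length is digits * (target // digits).
        total += digits * (target // digits)
    return total

first = total_chars(100000, 1000)   # first loop: 100000 strings of about 1000 chars
second = total_chars(20000, 5000)   # second loop: 20000 strings of about 5000 chars
print(first, second)
```

Both results come out just under 100 million characters, so the two lists
hold nearly the same amount of string data.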
On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <gordon at panix.com> wrote:
> In <mailman.988.1300289897.1189.python-list at python.org> Amit Dev <amitdev at gmail.com> writes:
>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. The idea is to create a list of strings whose
>> cumulative character count is about 100MB.
>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>
>> If I reduce the string size, usage remains high until the size reaches
>> around 1000, at which point it is back to 100MB.
>
> I don't know anything about the internals of Python storage -- overhead,
> possible merging of like strings, etc. -- but some simple character
> counting shows that these two loops do not produce the same number of
> characters.
>
> The first loop produces:
>
> Ten single-digit values of i which are repeated 1000 times for a total of
> 10000 characters;
>
> Ninety two-digit values of i which are repeated 500 times for a total of
> 45000 characters;
>
> Nine hundred three-digit values of i which are repeated 333 times for a
> total of 299700 characters;
>
> Nine thousand four-digit values of i which are repeated 250 times for a
> total of 2250000 characters;
>
> Ninety thousand five-digit values of i which are repeated 200 times for
> a total of 18000000 characters.
>
> All that adds up to a grand total of 20604700 characters.
>
> Or, to condense the above long-winded text in table form:
>
> range         num    digits  1000/len(str(i))  total chars
> 0-9           10     1       1000                    10000
> 10-99         90     2       500                     45000
> 100-999       900    3       333                    299700
> 1000-9999     9000   4       250                   2250000
> 10000-99999   90000  5       200                  18000000
>                                                   ========
> grand total chars                                 20604700
>
> The second loop yields this table:
>
> range         num    digits  5000/len(str(i))  total chars
> 0-9           10     1       5000                    50000
> 10-99         90     2       2500                   225000
> 100-999       900    3       1666                  1499400
> 1000-9999     9000   4       1250                 11250000
> 10000-19999   10000  5       1000                 10000000
>                                                   ========
> grand total chars                                 23024400
>
> The two loops do not produce the same numbers of characters, so I'm not
> surprised they do not consume the same amount of storage.
>
> P.S.: Please forgive me if I've made some basic math error somewhere.
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> gordon at panix.com           B is for Basil, assaulted by bears
>                                 -- Edward Gorey, "The Gashlycrumb Tinies"
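
The per-range tallies in the quoted message can also be recomputed
mechanically. The sketch below is an illustration only, not part of either
message: `breakdown` is a hypothetical helper name, it is written for
Python 3 (`//` in place of Python 2's integer `/`), and it counts characters
as values x repeats x digits, on the reasoning that each repetition of an
n-digit number contributes n characters.

```python
def breakdown(count, target):
    """Rows of (digits, how_many, repeats, chars) for str(i) * (target // len(str(i)))."""
    rows = []
    digits = 1
    low = 0
    while low < count:
        high = min(10 ** digits, count)   # exclusive upper bound of this digit class
        how_many = high - low             # number of values of i with this many digits
        repeats = target // digits        # how many times str(i) is repeated
        rows.append((digits, how_many, repeats, how_many * repeats * digits))
        low = high
        digits += 1
    return rows

for label, count, target in (("first loop", 100000, 1000),
                             ("second loop", 20000, 5000)):
    rows = breakdown(count, target)
    print(label)
    for digits, how_many, repeats, chars in rows:
        print("  %d-digit: %6d values x %4d repeats x %d chars = %9d"
              % (digits, how_many, repeats, digits, chars))
    print("  total: %d" % sum(row[3] for row in rows))
```

Counted this way, each loop comes to roughly 100 million characters, in line
with the sum(map(len, l)) figures at the top of this message.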