Memory Usage of Strings
John Gordon
gordon at panix.com
Wed Mar 16 13:51:45 EDT 2011
In <mailman.988.1300289897.1189.python-list at python.org> Amit Dev <amitdev at gmail.com> writes:
> I'm observing a strange memory usage pattern with strings. Consider
> the following session. Idea is to create a list which holds some
> strings so that cumulative characters in the list is 100MB.
> >>> l = []
> >>> for i in xrange(100000):
> ...     l.append(str(i) * (1000/len(str(i))))
> This uses around 100MB of memory as expected and 'del l' will clear that.
> >>> for i in xrange(20000):
> ...     l.append(str(i) * (5000/len(str(i))))
> This is using 165MB of memory. I really don't understand where the
> additional memory usage is coming from.
> If I reduce the string size, it remains high till it reaches around
> 1000. In that case it is back to 100MB usage.
I don't know anything about the internals of Python storage (overhead,
possible merging of like strings, etc.), but some simple character counting
shows that these two loops do not produce the same number of characters.
The first loop produces:
Ten single-digit values of i which are repeated 1000 times for a total of
10000 characters;
Ninety two-digit values of i which are repeated 500 times for a total of
45000 characters;
Nine hundred three-digit values of i which are repeated 333 times for a
total of 299700 characters;
Nine thousand four-digit values of i which are repeated 250 times for a
total of 2250000 characters;
Ninety thousand five-digit values of i which are repeated 200 times for
a total of 18000000 characters.
All that adds up to a grand total of 20604700 characters.
Or, to condense the above long-winded text in table form:
range        num    digits  1000/len(str(i))  total chars
0-9          10     1       1000              10000
10-99        90     2       500               45000
100-999      900    3       333               299700
1000-9999    9000   4       250               2250000
10000-99999  90000  5       200               18000000
                                              ========
grand total chars                             20604700
The second loop yields this table:
range        num    digits  5000/len(str(i))  total chars
0-9          10     1       5000              50000
10-99        90     2       2500              225000
100-999      900    3       1666              1499400
1000-9999    9000   4       1250              11250000
10000-19999  10000  5       1000              10000000
                                              ========
grand total chars                             23024400
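The tallies above can be cross-checked with a short script. This is only a sketch, written for Python 3 (the quoted session is Python 2, where `/` on integers already truncates, so `//` is used here); the function name `totals` is made up for this example. It reports two figures per loop: num times repeats, which is what the tables add up, and that same figure multiplied by the digit count, which is the summed length of the strings each loop actually builds.

```python
def totals(limit, width):
    """Return (num * repeats summed, characters summed) for one loop."""
    copies = 0   # sum of repeats, i.e. the figure tabulated per range above
    chars = 0    # sum of len(str(i) * repeats)
    for i in range(limit):
        digits = len(str(i))
        repeats = width // digits      # Python 2's integer "/" in the session
        copies += repeats
        chars += repeats * digits      # each copy of str(i) is `digits` long
    return copies, chars

first = totals(100000, 1000)   # l.append(str(i) * (1000/len(str(i))))
second = totals(20000, 5000)   # l.append(str(i) * (5000/len(str(i))))

print(first)    # (20604700, 99999100)
print(second)   # (23024400, 99998200)
```

The first figure in each pair reproduces the grand totals in the tables. The second includes the digit multiplier, so it is the one to compare against the roughly 100MB the original poster expected.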
The two loops do not produce the same number of characters, so I'm not
surprised they do not consume the same amount of storage.
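Character counts are only part of the picture, since each string object also carries some overhead. As a rough sketch (Python 3 again, with the hypothetical helpers `build` and `footprint`), `sys.getsizeof` can be summed over the list to compare payload with per-object size. It reports object sizes only and says nothing about what the allocator or the operating system reserves for the process, so it may well not account for the figure seen in a process monitor. The sizes here are scaled down to keep the run cheap; the limits and widths from the quoted session can be substituted.

```python
import sys

def build(limit, width):
    # Same construction as the quoted session, with // for integer division.
    return [str(i) * (width // len(str(i))) for i in range(limit)]

def footprint(strings):
    chars = sum(len(s) for s in strings)               # payload characters
    objects = sum(sys.getsizeof(s) for s in strings)   # string objects, headers included
    container = sys.getsizeof(strings)                 # the list object itself
    return chars, objects, container

# Scaled-down run: 1000 strings of (at most) 1000 characters each.
chars, objects, container = footprint(build(1000, 1000))
print(chars, objects, container)
```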
P.S.: Please forgive me if I've made some basic math error somewhere.
--
John Gordon A is for Amy, who fell down the stairs
gordon at panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"