Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?
Thomas Jollans
thomas at jollans.com
Fri Aug 6 21:08:55 EDT 2010
On 08/07/2010 02:45 AM, dmtr wrote:
> I'm running into some performance / memory bottlenecks on large lists.
> Is there any easy way to minimize/optimize memory usage?
>
> Simple str() and unicode() objects [Python 2.6.4/Linux/x86]:
>>>> sys.getsizeof('')     # 24 bytes
>>>> sys.getsizeof('0')    # 25 bytes
>>>> sys.getsizeof(u'')    # 28 bytes
>>>> sys.getsizeof(u'0')   # 32 bytes
>
> Lists of str() and unicode() objects (see ref. code below):
>>>> [str(i) for i in xrange(0, 10000000)]      # 370 MB (37 bytes/item)
>>>> [unicode(i) for i in xrange(0, 10000000)]  # 613 MB (63 bytes/item)
>
> Well... 63 bytes per item for very short unicode strings... Is there
> any way to do better than that? Perhaps some compact unicode objects?
There is a certain price you pay for having full-featured Python objects.
What are you trying to accomplish, anyway? Maybe the array module can be
of some help. Or numpy?
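
If the strings are really just decimal representations of integers, as in
your benchmark, one option is to store the numbers themselves and only
build strings at the point of use. A minimal sketch with the array module
(assuming your data is integer-like; fmt_item is just an illustrative
helper, not anything standard):

import array

# A C array of native ints: about 4 bytes/item on a 32-bit build,
# versus ~37 bytes/item for a list of str objects.
nums = array.array('i', xrange(0, 10000000))

# Build the string form lazily, only when an item is actually used.
def fmt_item(i):
    return str(nums[i])

print fmt_item(42)   # -> '42'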
>
> -- Regards, Dmitry
>
> ----
> import os, time, re
> start = time.time()
> l = [unicode(i) for i in xrange(0, 10000000)]
> dt = time.time() - start
> # Read peak/current virtual memory of this process from /proc.
> vm = re.findall("(VmPeak.*|VmSize.*)",
>                 open('/proc/%d/status' % os.getpid()).read())
> print "%d keys, %s, %f seconds, %f keys per second" % (
>     len(l), vm, dt, len(l) / dt)
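
For what it's worth, numpy can also hold short strings far more compactly
than a list of unicode objects, since it stores them as fixed-width
records in one contiguous block. A rough sketch (assuming numpy is
available and a fixed maximum width of 8 bytes is acceptable for your
data):

import numpy

# 'S8' stores exactly 8 bytes per item, so 10 million items occupy
# ~80 MB of payload instead of ~613 MB of unicode objects.
a = numpy.arange(10000000).astype('S8')
print a[42]      # -> '42'
print a.nbytes   # -> 80000000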