Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Aug 6 21:02:01 EDT 2010


On Fri, 06 Aug 2010 17:45:31 -0700, dmtr wrote:

> I'm running into some performance / memory bottlenecks on large lists.
> Is there any easy way to minimize/optimize memory usage?

Yes, lots of ways. For example, do you *need* large lists? Often a better 
design is to use generators and iterators to lazily generate data when 
you need it, rather than creating a large list all at once.
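Something along these lines (an untested sketch, with made-up names) shows 
the difference:

    def all_at_once(n):
        # materialises a million strings before you can touch any of them
        return ["item %d" % i for i in xrange(n)]

    def lazily(n):
        # yields one string at a time; memory use stays roughly constant
        for i in xrange(n):
            yield "item %d" % i

    total = sum(len(s) for s in lazily(10**6))  # the full list is never built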

An optimization that may sometimes help is to intern strings, so that 
there's only a single copy of each common string rather than multiple 
copies of the same one.
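
In Python 2 the built-in intern() only accepts byte strings, so for unicode 
objects you can get the same effect with an ordinary dict used as a cache 
(again just a sketch, and the file name is invented):

    _cache = {}
    def my_intern(s):
        # return the one canonical copy of an equal string; duplicates
        # then all share a single object
        return _cache.setdefault(s, s)

    words = [my_intern(line.decode("utf-8").strip())
             for line in open("data.txt")]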

Can you compress the data and use that? Without knowing what you are 
trying to do, and why, it's really difficult to advise a better way to do 
it (other than vague suggestions like "use generators instead of lists").
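For instance, if you only need the strings occasionally, something like 
this keeps the bulk of the data compressed (zlib is in the standard 
library; big_list_of_strings is just a stand-in for your data):

    import zlib

    # unicode strings would need to be encoded (e.g. to UTF-8) first
    blob = zlib.compress("\n".join(big_list_of_strings))   # store this
    strings = zlib.decompress(blob).split("\n")            # expand on demand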

Very often, it is cheaper and faster to just put more memory in the 
machine than to try optimizing memory use. Memory is cheap; your time and 
effort are not.

[...]
> Well...  63 bytes per item for very short unicode strings... Is there
> any way to do better than that? Perhaps some compact unicode objects?

If you think that unicode objects are going to be *smaller* than byte 
strings, I think you're badly informed about the nature of unicode.
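
You can measure the per-object overhead yourself with sys.getsizeof(), 
available from Python 2.6. The exact numbers depend on your build (narrow 
vs. wide unicode, 32-bit vs. 64-bit), so take them as an illustration only:

    import sys
    print sys.getsizeof("abc")    # byte string: fixed overhead + 1 byte/char
    print sys.getsizeof(u"abc")   # unicode: fixed overhead + 2 or 4 bytes/char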

Python is not a low-level language, and it trades off memory compactness 
for ease of use. Python strings are high-level rich objects, not merely a 
contiguous series of bytes. If all else fails, you might have to use 
something like the array module, or even implement your own data type in 
C.
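
To give a rough idea of the array approach (it only helps for homogeneous 
numeric or character data, not for variable-length unicode strings):

    from array import array

    # a million C doubles packed into one contiguous block: 8 bytes per
    # item, instead of a separate float object per item in a plain list
    values = array('d', (i * 0.5 for i in xrange(10**6)))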

But as a general rule, as I mentioned above, the best way to minimize the 
memory used by a large list is to not use a large list. I can't emphasise 
that enough -- look into generators and iterators, and lazily handle your 
data whenever possible.


-- 
Steven


