Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?
dmtr
dchichkov at gmail.com
Sat Aug 7 03:30:35 EDT 2010
On Aug 6, 11:50 pm, Peter Otten <__pete... at web.de> wrote:
> I don't know to what extent it still applies but switching off cyclic garbage
> collection with
>
> import gc
> gc.disable()
Haven't tried it on the real dataset. On the synthetic test it (and
sys.setcheckinterval(100000)) gave ~2% speedup and no change in memory
usage. Not significant. I'll try it on the real dataset though.
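A minimal sketch of that pattern on the synthetic test (assuming the whole
dict gets built in one place, so collection can simply be re-enabled
afterwards):

import gc

gc.disable()            # suspend cyclic GC while building the large dict
try:
    d = {}
    for i in xrange(0, 1000000):
        d[unicode(i)] = (i, i+1, i+2, i+3, i+4, i+5, i+6)
finally:
    gc.enable()         # restore normal collection once the structure is built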
> while building large datastructures used to speed up things significantly.
> That's what I would try first with your real data.
>
> Encoding your unicode strings as UTF-8 could save some memory.
Yes... In fact that's what I'm trying now... .encode('utf-8')
definitely creates some clutter in the code, but I guess I can
subclass dict... And it does save memory! A lot of it. Seems to be a
bit faster too....
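A hypothetical sketch of such a subclass (the class name and the set of
overridden methods are my guesses; only the operations you actually use
would need wrapping):

class Utf8KeyDict(dict):
    # Stores unicode keys as UTF-8 encoded byte strings, transparently.

    @staticmethod
    def _enc(key):
        # encode unicode keys, pass byte strings (and other keys) through
        return key.encode('utf-8') if isinstance(key, unicode) else key

    def __setitem__(self, key, value):
        dict.__setitem__(self, self._enc(key), value)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._enc(key))

    def __contains__(self, key):
        return dict.__contains__(self, self._enc(key))

d = Utf8KeyDict()
d[unicode(42)] = (42, 43, 44)   # key is stored as its UTF-8 encoding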
> When your integers fit into two bytes, say, you can use an array.array()
> instead of the tuple.
Excellent idea. Thanks! And it seems to work too, at least for the
test code. Here are some benchmarks (x86 desktop):
Unicode key / tuple:
>>> for i in xrange(0, 1000000): d[unicode(i)] = (i, i+1, i+2, i+3, i+4, i+5, i+6)
1000000 keys, ['VmPeak:\t 224704 kB', 'VmSize:\t 224704 kB'],
4.079240 seconds, 245143.698209 keys per second
UTF-8 encoded key / array.array:
>>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))
1000000 keys, ['VmPeak:\t 201440 kB', 'VmSize:\t 201440 kB'],
4.985136 seconds, 200596.331486 keys per second
UTF-8 encoded key / tuple:
>>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = (i, i+1, i+2, i+3, i+4, i+5, i+6)
1000000 keys, ['VmPeak:\t 125652 kB', 'VmSize:\t 125652 kB'],
3.572301 seconds, 279931.625282 keys per second
Almost halved the memory usage. And faster too. Nice.
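For reference, the VmPeak/VmSize figures above look like they were read from
/proc/self/status; a minimal, Linux-only sketch of doing that:

def memory_usage():
    # grab the VmPeak/VmSize lines from /proc/self/status (Linux only)
    with open('/proc/self/status') as f:
        return [line.strip() for line in f
                if line.startswith(('VmPeak', 'VmSize'))]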
-- Dmitry