Is there any way to minimize str()/unicode() object memory usage? [Python 2.6.4]

dmtr dchichkov at gmail.com
Sat Aug 7 02:27:46 EDT 2010


On Aug 6, 10:56 pm, Michael Torrie <torr... at gmail.com> wrote:
> On 08/06/2010 07:56 PM, dmtr wrote:
>
> > Ultimately a dict that can store ~20,000,000 entries: (u'short
> > string' : (int, int, int, int, int, int, int)).
>
> I think you really need a real database engine.  With the proper
> indexes, MySQL could be very fast storing and retrieving this
> information for you.  And it will use your RAM to cache as it sees fit.
>  Don't try to reinvent the wheel here.

No, I've tried. DB solutions are not even close in terms of speed;
processing would take weeks :(  Memcached and Redis sort of work, but
they are still a bit too slow to be a pleasure to work with. The
standard dict() container is *a lot* faster, and it's hassle-free
(it takes unicode keys, etc.). I just wish there were a more compact
dict container, optimized for large datasets and for memory rather
than for speed.
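
To show what I mean by trading speed for memory, here is a rough
sketch (untested; compact_put/compact_get are made-up helper names):
keep the keys as UTF-8 byte strings instead of unicode objects, and
pack the fixed 7-int value tuple into a single struct string. The
'7i' format is an assumption; it only works if the counts fit in
signed 32 bits (use '7q' otherwise).

import struct

_PACKER = struct.Struct('7i')  # 28 packed bytes vs. a tuple of 7 int objects

def compact_put(d, key, values):
    # Store the unicode key as a UTF-8 str and the 7-int tuple packed.
    d[key.encode('utf-8')] = _PACKER.pack(*values)

def compact_get(d, key):
    # Unpack on the way out; returns the original 7-int tuple.
    return _PACKER.unpack(d[key.encode('utf-8')])

d = {}
compact_put(d, u'short string', (1, 2, 3, 4, 5, 6, 7))
print compact_get(d, u'short string')   # (1, 2, 3, 4, 5, 6, 7)

It's still a plain dict underneath, so lookups stay fast; the saving
is purely per-entry overhead, since a short UTF-8 str is smaller than
the equivalent unicode object, and 28 packed bytes usually cost less
than seven boxed ints plus a tuple.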
And with the default dict() I'm also running into some kind of
nonlinear performance degradation, apparently somewhere past
10,000,000-13,000,000 keys, but I can't reproduce it with a solid
test case (see http://bugs.python.org/issue9520 ) :(
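
For reference, the kind of micro-test I've been poking at looks
roughly like this (batch and total sizes are placeholders, not the
real workload): insert keys in fixed-size batches and time each
batch. If the degradation is real, the per-batch time should jump
somewhere past ~10M keys instead of staying roughly flat.

import time

def time_batches(total=20000000, batch=1000000):
    # Warning: the full 20M-key run needs several GB of RAM;
    # shrink `total` to taste.
    d = {}
    for start in xrange(0, total, batch):
        t0 = time.time()
        for i in xrange(start, start + batch):
            d[unicode(i)] = (i, i, i, i, i, i, i)
        # One line per batch: cumulative key count and batch wall time.
        print '%9d keys: %.2f s/batch' % (len(d), time.time() - t0)

time_batches()

With synthetic keys like these I haven't managed to trigger it, which
is exactly why the bug report above lacks a solid test case.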

-- Dmitry


