Is there any way to minimize str()/unicode() objects memory usage? [Python 2.6.4]

garabik-news-2005-05 at kassiopeia.juls.savba.sk
Sat Aug 7 02:36:29 EDT 2010


dmtr <dchichkov at gmail.com> wrote:
> 
> What I'm really looking for is a dict() that maps short unicode
> strings into tuples with integers. But just having a *compact* list
> container for unicode strings would help a lot (because I could add a
> __dict__ and go from it).
> 

At this point, I'd suggest using one of the dbm modules, and packing the
integers with struct.pack into short strings.
Depending on your usage pattern, there are marked performance
differences between the dbhash, gdbm, and dbm implementations, so it may
pay off to invest some time in benchmarking.
If your data are write-once, then cdb has excellent performance (but a
different API).
The file will usually be cached in RAM, so there is no need to worry
about I/O bottlenecks... and if it is small enough, you can always put
it on a ramdisk.
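A minimal sketch of the struct.pack-plus-dbm approach (written for a
modern Python's dbm module; in Python 2 the equivalents are anydbm,
dbhash, and gdbm, and the file path here is just an example):

```python
import dbm
import os
import struct
import tempfile

# Hypothetical data: short unicode strings mapped to tuples of integers.
data = {u"apple": (1, 42), u"pear": (2, 7)}

path = os.path.join(tempfile.mkdtemp(), "demo_db")
db = dbm.open(path, "n")  # 'n': always create a new, empty database
try:
    for key, ints in data.items():
        # Pack each tuple of unsigned 32-bit ints into a compact byte string.
        db[key.encode("utf-8")] = struct.pack("%dI" % len(ints), *ints)

    # Look one entry back up and unpack it.
    raw = db[u"apple".encode("utf-8")]
    values = struct.unpack("%dI" % (len(raw) // 4), raw)
finally:
    db.close()
```

The values live on disk (typically cached in RAM) instead of as Python
objects, which is where the memory savings come from.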

If your strings are long enough, you can improve memory usage with
zlib.compress (a dumb and suboptimal way of using compression, but
easy and present in the standard library) - but always verify that the
compressed strings are _shorter_ than the originals.
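A sketch of that shorter-than-the-original check (the function names and
the one-byte tag scheme are my own illustration, not a standard API):

```python
import zlib

def maybe_compress(s):
    """Compress a unicode string with zlib, falling back to the plain
    UTF-8 encoding when compression would not actually save space."""
    raw = s.encode("utf-8")
    packed = zlib.compress(raw)
    # zlib adds roughly a dozen bytes of overhead, so short strings
    # frequently come out *larger* after compression.
    if len(packed) < len(raw):
        return b"z" + packed   # tag byte: compressed
    return b"u" + raw          # tag byte: stored uncompressed

def restore(blob):
    """Invert maybe_compress, dispatching on the tag byte."""
    if blob[:1] == b"z":
        return zlib.decompress(blob[1:]).decode("utf-8")
    return blob[1:].decode("utf-8")
```

Short strings stay uncompressed, while long repetitive ones shrink;
either way the round trip restores the original.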

-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


