"".join(string_generator()) fails to be magic
dstromberglists at gmail.com
Thu Oct 11 19:36:06 CEST 2007
On Thu, 11 Oct 2007 01:26:04 -0500, Matt Mackal wrote:
> I have an application that is occasionally called upon to process
> strings that are a substantial portion of the size of memory. For
> various reasons, the resultant strings must fit completely in RAM.
> Occasionally, I need to join some large strings to build some even
> larger strings.
> Unfortunately, there's no good way of doing this without using 2x the
> amount of memory as the result. You can get most of the way there with
> things like cStringIO or mmap objects, but when you want to actually
> get the result as a Python string, you run into the copy again.
> Thus, it would be nice if there was a way to join the output of a
> string generator so that I didn't need to keep the partial strings in
> memory. "".join(string_generator()) would be the obvious way to do
> this, but it of course converts the generator output to a list first.
Some options you might evaluate (I'm -not- guaranteeing all of these'll
work "as advertised"):
1) Add some swap space to your machine and use standard python strings
2) Use mmap. I may be wrong, and I know you mentioned mmap, but I suspect
that mmap won't use up VM equal to the size of an mmap'd file; rather, it
just caches portions of the data in physical memory when it's convenient
to do so, with the primary copy of the data residing on disk.
3) Use ctypes, and stay in ctypes - don't convert the data to python str's.
Of course, then you're basically writing a C program in python.
4) Use temporary files via the usual file API
5) If you can live with alpha code, you might try the python 3 alpha and
use the mutable "bytes" type, and stay in the "bytes" type - don't convert
it to a str
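For option 5: in released Python 3, the mutable byte type ended up as bytearray, and the final bytes type is immutable. The sketch below assumes Python 3.5+ and a hypothetical string_generator() that yields byte chunks. It writes every chunk into a single buffer, so no list of partial strings is kept. The optional total_size argument is also an assumption: you would need to know the final length in advance. When it is given, the buffer is allocated once and filled in place.

```python
# Sketch only: accumulate a generator's chunks into one mutable bytearray.
# string_generator() is a hypothetical stand-in for the application's generator.

def string_generator():
    """Hypothetical generator yielding the pieces to be joined."""
    for i in range(5):
        yield b"chunk%d;" % i


def join_into_bytearray(chunks, total_size=None):
    """Join byte chunks without holding a list of the partial strings.

    With total_size known up front, the buffer is allocated once and filled
    by slice assignment, so peak memory is roughly the result plus the chunk
    being copied. Without it, the bytearray grows with +=, reallocating as needed.
    """
    if total_size is None:
        buf = bytearray()
        for chunk in chunks:
            buf += chunk  # extends the same bytearray object in place
        return buf

    buf = bytearray(total_size)  # zero-filled, allocated once
    pos = 0
    for chunk in chunks:
        end = pos + len(chunk)
        if end > total_size:
            raise ValueError("chunks exceed total_size")
        buf[pos:end] = chunk  # same-length slice assignment: no resize
        pos = end
    if pos != total_size:
        raise ValueError("chunks shorter than total_size")
    return buf


result = join_into_bytearray(string_generator())
view = memoryview(result)  # zero-copy window onto the result
print(len(result), bytes(view[:7]))  # bytes() copies only the small slice
```

A memoryview over the result gives zero-copy slices. Calling bytes() on the whole result would bring back the full copy the original post is trying to avoid.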
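Options 2 and 4 combine naturally. First, spool the generator's output to a temporary file with ordinary file writes. Then mmap that file to read it back. This is a sketch assuming Python 3 and a hypothetical string_generator(). While writing, only one chunk is in memory at a time. Afterwards, the operating system pages the mapped file in from disk as it is touched. Slicing the mapping (mm[a:b]) returns a new bytes object, so the mapping is best read in windows rather than converted whole.

```python
import mmap
import tempfile


def string_generator():
    """Hypothetical stand-in for the application's chunk generator."""
    for i in range(1000):
        yield b"line %04d\n" % i


def spool_to_mmap(chunks):
    """Write chunks to an anonymous temporary file and map it read-only.

    Returns (file, mapping); the caller is responsible for closing both.
    """
    f = tempfile.TemporaryFile()  # binary read/write mode by default
    for chunk in chunks:
        f.write(chunk)
    f.flush()  # make sure buffered data reaches the file before mapping
    if f.tell() == 0:
        f.close()
        raise ValueError("cannot mmap an empty file")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # 0 = whole file
    return f, mm


f, mm = spool_to_mmap(string_generator())
try:
    # Only the requested window is copied into a Python bytes object.
    print(len(mm), mm[:10], mm.find(b"line 0999"))
finally:
    mm.close()
    f.close()
```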
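For option 3, here is a sketch assuming Python 3 and a total length known beforehand. That requirement is an assumption of the sketch; the length might come from a first pass over the data. ctypes.create_string_buffer allocates a single C char array, and ctypes.memmove copies each chunk into place. Reading buf.raw converts the buffer back into a Python bytes object, which copies it again. That is exactly what the option says to avoid, so in real use the buffer would be handed to C code instead. string_generator() is hypothetical.

```python
import ctypes


def string_generator():
    """Hypothetical stand-in for the application's chunk generator."""
    yield b"alpha,"
    yield b"beta,"
    yield b"gamma"


def join_into_ctypes_buffer(chunks, total_size):
    """Copy chunks into one preallocated C char buffer of total_size bytes."""
    buf = ctypes.create_string_buffer(total_size)  # zero-filled char array
    base = ctypes.addressof(buf)
    pos = 0
    for chunk in chunks:
        if pos + len(chunk) > total_size:
            raise ValueError("chunks exceed total_size")
        ctypes.memmove(base + pos, chunk, len(chunk))  # raw C-level copy
        pos += len(chunk)
    return buf, pos


total = 16  # len(b"alpha,") + len(b"beta,") + len(b"gamma")
buf, used = join_into_ctypes_buffer(string_generator(), total)
print(used, buf.raw[:used])  # .raw copies into a Python bytes object
```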
More information about the Python-list