Processing a large string
__peter__ at web.de
Fri Aug 12 10:39:38 CEST 2011
> Say I have a very big string with a pattern like:
> I want to split the string into separate parts on the "3" and process
> each part separately. I might run into memory limitations if I use
> "split" and get a big array(?). I wondered if there's a way I could
> read (stream?) the string from start to finish and read what's
> delimited by the "3" into a variable, process the smaller string
> variable then append/build a new string with the processed data?
> Would I loop it and read it char by char till a "3"...? Or?
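If the data is already a str in memory rather than a file, the character-by-character loop asked about above isn't needed. str.find() with a start offset jumps straight to the next delimiter. Below is a minimal sketch, assuming Python 3. iter_split is a hypothetical helper name, not a library function. It yields the same pieces as text.split(delimiter), but one at a time, so no list of all the parts is ever built.

```python
def iter_split(text, delimiter):
    """Yield the pieces of text between delimiters, like text.split(delimiter)."""
    # hypothetical helper; mirrors str.split's error for an empty separator
    if not delimiter:
        raise ValueError("empty separator")
    start = 0
    while True:
        # str.find() searches from `start`, so there is no per-character Python loop
        end = text.find(delimiter, start)
        if end == -1:
            # no delimiter left: the remainder is the final piece
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(delimiter)


data = "abc3de3f"
# process each piece (here: upper-case it) and build the new string
result = "3".join(piece.upper() for piece in iter_split(data, "3"))
print(result)  # prints ABC3DE3F
```

The "3".join(...) still builds the whole result in memory. For really large output, writing each processed piece to a file as it is produced avoids holding a second big string.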
You can read the file in chunks:
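from functools import partial

def read_chunks(instream, chunksize=None):
    # Read fixed-size pieces until read() returns "" at end of file.
    if chunksize is None:
        chunksize = 2**20
    return iter(partial(instream.read, chunksize), "")

def split_file(instream, delimiter, chunksize=None):
    # Generate the parts between delimiters, like str.split(), one at a time.
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        # The last part may continue in the next chunk; keep it for later.
        leftover = parts.pop()
        for part in parts:
            yield part
    # Like str.split(), yield "" for empty input or a trailing delimiter.
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover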
I hope I got the corner cases right.
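One way to check the corner cases is to compare split_file against str.split on small inputs, trying every chunk size so that delimiters (including multi-character ones) land on chunk boundaries. This is a sketch assuming Python 3, with io.StringIO standing in for a file. It includes read_chunks and split_file, with their yield statements and the chunksize argument passed through, so it runs on its own. The check helper is a hypothetical name for illustration.

```python
import io
from functools import partial


def read_chunks(instream, chunksize=None):
    # yield successive chunks until read() returns "" (end of a text stream)
    if chunksize is None:
        chunksize = 2**20
    return iter(partial(instream.read, chunksize), "")


def split_file(instream, delimiter, chunksize=None):
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        # the last part may continue in the next chunk, so hold it back
        leftover = parts.pop()
        for part in parts:
            yield part
    # mirror str.split: empty input or a trailing delimiter still yields ""
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover


def check(text, delimiter):
    # hypothetical helper: compare with str.split for chunk sizes 1..len(text)+1
    expected = text.split(delimiter)
    for size in range(1, len(text) + 2):
        got = list(split_file(io.StringIO(text), delimiter, size))
        assert got == expected, (text, delimiter, size, got, expected)


for text in ["", "3", "33", "abc", "abc3", "3abc", "ab3cd3ef"]:
    check(text, "3")
# a multi-character delimiter can straddle a chunk boundary
for text in ["a--b", "a----b", "--", "a-b--c-"]:
    check(text, "--")
print("all corner cases agree with str.split")
```

Trying every chunk size is slow for long inputs, but on short strings it exercises each possible split point of a delimiter across two reads.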
PS: This has come up before, but I couldn't find the relevant threads...