Processing a large string

Peter Otten __peter__ at web.de
Fri Aug 12 10:39:38 CEST 2011


goldtech wrote:

> Hi,
> 
> Say I have a very big string with a pattern like:
> 
> akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd3dkdkdkdk3asnsn.....
> 
> I want to split the string into separate parts on the "3" and process
> each part separately. I might run into memory limitations if I use
> "split" and get a big array(?)  I wondered if there's a way I could
> read (stream?) the string from start to finish and read what's
> delimited by the "3" into a variable, process the smaller string
> variable then append/build a new string with the processed data?
> 
> Would I loop it and read it char by char till a "3"...? Or?

You can read the file in chunks:

from functools import partial

def read_chunks(instream, chunksize=None):
    if chunksize is None:
        chunksize = 2**20
    return iter(partial(instream.read, chunksize), "")

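def split_file(instream, delimiter, chunksize=None):
    # Yield the same parts as instream.read().split(delimiter),
    # but without loading the whole file at once.
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        # Prepend the unfinished part from the previous chunk.
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        # The last part may continue in the next chunk; hold it back.
        leftover = parts.pop()
        for part in parts:
            yield part
    # Emit the final part; like str.split(), yield "" for an empty
    # file or for data that ends with the delimiter.
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover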
I hope I got the corner cases right.
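
One way to check the corner cases is to compare the generator against str.split() on small inputs, using chunk sizes small enough that the delimiter lands on chunk boundaries. The sketch below repeats the two functions so that it runs on its own (with chunksize passed through to read_chunks) and uses io.StringIO as a stand-in for a real file; the sample strings are arbitrary:

```python
import io
from functools import partial

def read_chunks(instream, chunksize=None):
    if chunksize is None:
        chunksize = 2**20
    return iter(partial(instream.read, chunksize), "")

def split_file(instream, delimiter, chunksize=None):
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        leftover = parts.pop()
        for part in parts:
            yield part
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover

# Empty input, leading/trailing/doubled delimiters, plain data.
samples = ["", "3", "a", "a3", "3a", "a3b", "33", "a33b3", "ab3cd3ef"]
for text in samples:
    for chunksize in (1, 2, 3, 100):
        got = list(split_file(io.StringIO(text), "3", chunksize))
        assert got == text.split("3"), (text, chunksize, got)

# A multi-character delimiter that straddles chunk boundaries.
text = "one<>two<>three<>"
for chunksize in (1, 2, 3, 5):
    got = list(split_file(io.StringIO(text), "<>", chunksize))
    assert got == text.split("<>"), (chunksize, got)
```

These checks only cover text-mode streams; a file opened in binary mode returns b"" at end of file, so the "" sentinel in read_chunks would need to change to match.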

PS: This has come up before, but I couldn't find the relevant threads...
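
If the data is already held in one string rather than a file, a generator built on str.find() can hand out the parts one at a time, so the full list from split() is never created. A minimal sketch; iter_parts and process are names made up for illustration, and process stands in for whatever per-part work is needed:

```python
def iter_parts(text, delimiter):
    """Yield the same items as text.split(delimiter), one at a time."""
    if not delimiter:
        raise ValueError("empty delimiter")
    start = 0
    while True:
        end = text.find(delimiter, start)
        if end == -1:
            # No more delimiters: the rest of the string is the last part.
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(delimiter)

def process(part):
    # Placeholder for the real per-part processing.
    return part.upper()

# join() collects the processed parts before building the result; if
# that is still too much data, write each processed part to a file.
result = "3".join(process(part) for part in iter_parts("ab3cd3ef", "3"))
print(result)  # AB3CD3EF
```

Each part is still a copy of a slice of the original string, but only one part is alive at a time while the loop runs.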



