Processing a large string
Peter Otten
__peter__ at web.de
Fri Aug 12 04:39:38 EDT 2011
goldtech wrote:
> Hi,
>
> Say I have a very big string with a pattern like:
>
> akakksssk3dhdhdhdbddb3dkdkdkddk3dmdmdmd3dkdkdkdk3asnsn.....
>
> I want to split the string into separate parts on the "3" and process
> each part separately. I might run into memory limitations if I use
> "split" and get a big array(?) I wondered if there's a way I could
> read (stream?) the string from start to finish and read what's
> delimited by the "3" into a variable, process the smaller string
> variable then append/build a new string with the processed data?
>
> Would I loop it and read it char by char till a "3"...? Or?
You can read the file in chunks:
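from functools import partial

def read_chunks(instream, chunksize=None):
    if chunksize is None:
        chunksize = 2**20
    # iter(callable, sentinel) calls instream.read(chunksize) repeatedly
    # until it returns the empty string, i.e. at end of file.
    return iter(partial(instream.read, chunksize), "")

def split_file(instream, delimiter, chunksize=None):
    leftover = ""
    chunk = None
    for chunk in read_chunks(instream, chunksize):
        # Prepend the incomplete tail of the previous chunk.
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        # The last part may continue in the next chunk, so hold it back.
        leftover = parts.pop()
        for part in parts:
            yield part
    # Emit the final part, including the empty one that str.split() gives
    # for empty input or for input ending with the delimiter.
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover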
I hope I got the corner cases right.
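
One way to check the corner cases is to compare the generator with str.split() for every chunk size. The following is only a sketch: it repeats split_file() with the chunk size passed through so that it runs on its own, uses io.StringIO as a stand-in for a real file, and the check() helper, the sample strings, and the upper() "processing" step are made up for illustration.

```python
import io
from functools import partial

def split_file(instream, delimiter, chunksize=2**20):
    # Same logic as the generator above, repeated so this sketch is self-contained.
    leftover = ""
    chunk = None
    for chunk in iter(partial(instream.read, chunksize), ""):
        chunk = leftover + chunk
        parts = chunk.split(delimiter)
        leftover = parts.pop()
        for part in parts:
            yield part
    if leftover or chunk is None or chunk.endswith(delimiter):
        yield leftover

def check(text, delimiter="3"):
    # Hypothetical helper: the streamed result must match str.split()
    # no matter where the chunk boundaries fall.
    expected = text.split(delimiter)
    for chunksize in range(1, len(text) + 2):
        got = list(split_file(io.StringIO(text), delimiter, chunksize))
        assert got == expected, (text, chunksize, got, expected)

for sample in ["", "3", "a", "a3", "3a", "33", "a3b3c", "akak3dhdh3dkdk3"]:
    check(sample)

# Applied to the original question: process each part separately and
# rebuild a string from the results (upper() stands in for the real work).
text = "akak3dhdh3dkdk"
result = "3".join(part.upper() for part in split_file(io.StringIO(text), "3", chunksize=4))
print(result)
```

If the processed data is also too big to hold in memory, writing each processed part to an output file instead of joining them keeps only one part in memory at a time.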
PS: This has come up before, but I couldn't find the relevant threads...