[Tutor] parsing a "chunked" text file

spir denis.spir at gmail.com
Tue Mar 2 14:04:41 CET 2010


On Mon, 1 Mar 2010 22:22:43 -0800
Andrew Fithian <afith13 at gmail.com> wrote:

> Hi tutor,
> 
> I have a large text file that has chunks of data like this:
> 
> headerA n1
> line 1
> line 2
> ...
> line n1
> headerB n2
> line 1
> line 2
> ...
> line n2
> 
> Where each chunk is a header and the lines that follow it (up to the next
> header). A header has the number of lines in the chunk as its second field.
> 
> I would like to turn this file into a dictionary like:
> dict = {'headerA':[line 1, line 2, ... , line n1], 'headerB':[line1, line 2,
> ... , line n2]}
> 
> Is there a way to do this with a dictionary comprehension or do I have to
> iterate over the file with a "while 1" loop?

The nice way would be to split the source into a list of chunk texts. But there seems to be no easy way to do that without traversing the whole source. If you generate the source yourself, just add a blank line between chunks (so that the separator is '\n\n'). Then a dict comprehension (or the dict() constructor) can map each chunk text to a key/value pair through any makeChunk() func.
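
Something like this minimal, untested sketch, assuming blank-line-separated chunks whose header's first field is the key (the file name "data.txt" is just an example):

    def makeChunk(chunk):
        # first line is the header; its first field becomes the key,
        # the remaining lines become the value list
        lines = chunk.splitlines()
        key = lines[0].split()[0]
        return key, lines[1:]

    text = open("data.txt").read()
    chunks = [c for c in text.split("\n\n") if c.strip()]
    result = dict(makeChunk(c) for c in chunks)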

If this is not doable, I would traverse the lines using a "while n < s" loop, where n is the current line number and s the total number of lines, using the count in each header to know how many lines belong to the chunk.
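
A rough sketch of that loop (again untested, and assuming every header line really has the line count as its second field):

    def parseChunked(path):
        result = {}
        lines = open(path).read().splitlines()
        n, s = 0, len(lines)
        while n < s:
            # header line: "<key> <count>"
            header, count = lines[n].split()
            count = int(count)
            result[header] = lines[n+1 : n+1+count]
            n += 1 + count        # jump past this chunk to the next header
        return result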

Denis
-- 
________________________________

life is strange

spir.wikidot.com


