[Tutor] parsing a "chunked" text file

Christian Witts cwitts at compuscan.co.za
Tue Mar 2 13:08:28 CET 2010


Andrew Fithian wrote:
> Hi tutor,
>
> I have a large text file that has chunks of data like this:
>
> headerA n1
> line 1
> line 2
> ...
> line n1
> headerB n2
> line 1
> line 2
> ...
> line n2
>
> Where each chunk is a header and the lines that follow it (up to the 
> next header). A header has the number of lines in the chunk as its 
> second field.
>
> I would like to turn this file into a dictionary like:
> dict = {'headerA':[line 1, line 2, ... , line n1], 'headerB':[line 1, 
> line 2, ... , line n2]}
>
> Is there a way to do this with a dictionary comprehension or do I have 
> to iterate over the file with a "while 1" loop?
>
> -Drew

Something along these lines could work for you:

dict([(z.splitlines()[0].split()[0], z.splitlines()[1:])
      for z in [x for x in open(filename).read().split('header') if x.strip()]])

{'A': ['line 1', 'line 2', '...', 'line n1'], 'B': ['line 1', 'line 2', 
'...', 'line n2']}

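Of course, that isn't very pretty, and it only works for data shaped exactly 
like your sample. Because it splits on the literal string 'header', the keys 
come out as 'A' and 'B' rather than 'headerA' and 'headerB', the line counts 
in the headers are ignored, and any data line containing the word "header" 
would be split off as a spurious new chunk.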
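Since each header already tells you how many lines follow it, a sturdier 
option is to read that count and pull exactly that many lines. Below is a 
minimal sketch, not tested against your real file, assuming every header 
has the form "<name> <count>" with no stray non-blank lines between chunks. 
The function name iter_chunks is just a hypothetical helper. It uses a plain 
for loop plus itertools.islice, so no "while 1" loop is needed.

```python
from itertools import islice

def iter_chunks(lines):
    """Yield (header_name, chunk_lines) pairs from an iterable of text lines.

    Assumption: each header line looks like "<name> <count>", where <count>
    is the number of data lines that follow it.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between chunks
        name, count = header.split()[:2]
        # islice consumes exactly `count` lines from the same iterator, so
        # the next pass of the for loop starts at the following header.
        # If the file is truncated, the last chunk is silently shorter.
        chunk = [line.rstrip('\n') for line in islice(it, int(count))]
        yield name, chunk
```

To build the dictionary, open the file with "with open(filename) as f:" and 
call dict(iter_chunks(f)), which works on older Pythons too. On Python 2.7 or 
3.x you can write it as the dict comprehension you asked about instead: 
{name: body for name, body in iter_chunks(f)}. The keys keep their full 
header names, and a data line that happens to contain "header" stays in its 
chunk.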

-- 
Kind Regards,
Christian Witts
Business Intelligence

C o m p u s c a n | Confidence in Credit

Telephone: +27 21 888 6000
National Cell Centre: 0861 51 41 31
Fax: +27 21 413 2424
E-mail: cwitts at compuscan.co.za
