[Tutor] parsing a "chunked" text file
Christian Witts
cwitts at compuscan.co.za
Tue Mar 2 13:08:28 CET 2010
Andrew Fithian wrote:
> Hi tutor,
>
> I have a large text file that has chunks of data like this:
>
> headerA n1
> line 1
> line 2
> ...
> line n1
> headerB n2
> line 1
> line 2
> ...
> line n2
>
> Where each chunk is a header and the lines that follow it (up to the
> next header). A header has the number of lines in the chunk as its
> second field.
>
> I would like to turn this file into a dictionary like:
> dict = {'headerA':[line 1, line 2, ... , line n1], 'headerB':[line 1,
> line 2, ... , line n2]}
>
> Is there a way to do this with a dictionary comprehension or do I have
> to iterate over the file with a "while 1" loop?
>
> -Drew
Something like this could work for you:
    dict([(z.splitlines()[0].split()[0], z.splitlines()[1:])
          for z in [x for x in open(filename).read().split('header')
                    if x.strip()]])

which returns:

    {'A': ['line 1', 'line 2', '...', 'line n1'],
     'B': ['line 1', 'line 2', '...', 'line n2']}
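Of course that isn't very pretty, and it only works for the specific layout
in your sample data: it splits on the literal text 'header' (so the keys
come out as 'A' and 'B' rather than 'headerA' and 'headerB'), it ignores
the line counts in the headers, and it breaks if any data line contains
the word 'header'.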
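Since each header carries its chunk's line count, a sketch that uses that count instead of splitting on the text 'header' might look like the following. It assumes every header line is "<name> <count>" with whitespace-separated fields and that exactly <count> data lines follow it; `parse_chunks` is a hypothetical helper name, not part of any library. It needs no "while 1" loop: a plain for loop over the file plus `itertools.islice` pulls each chunk's body from the same iterator.

```python
from itertools import islice
from typing import Dict, Iterable, List


def parse_chunks(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: parse '<name> <count>' headers, each followed
    by exactly <count> data lines (assumed format, per the question)."""
    it = iter(lines)
    chunks: Dict[str, List[str]] = {}
    for header in it:
        fields = header.split()
        if not fields:
            continue  # skip blank lines between chunks
        name, count = fields[0], int(fields[1])
        # islice consumes the next `count` lines from the same iterator,
        # so the for loop resumes at the following header.
        body = [line.rstrip("\n") for line in islice(it, count)]
        if len(body) != count:
            raise ValueError(
                f"chunk {name!r}: expected {count} lines, got {len(body)}")
        chunks[name] = body
    return chunks


if __name__ == "__main__":
    sample = "headerA 2\nline 1\nline 2\nheaderB 3\nline 1\nline 2\nline 3\n"
    print(parse_chunks(sample.splitlines()))
    # {'headerA': ['line 1', 'line 2'], 'headerB': ['line 1', 'line 2', 'line 3']}
```

With a real file you would pass the open file object directly, e.g. `with open(filename) as f: data = parse_chunks(f)`, so the file is read line by line and closed afterwards. Because the body length comes from the header count, data lines that happen to contain the word "header" are not mistaken for new chunks, and the keys keep their full names ('headerA', 'headerB') as in the requested dictionary.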
--
Kind Regards,
Christian Witts
Business Intelligence
Compuscan | Confidence in Credit