first, second, etc line of text file

Thu Apr 23 17:50:06 EDT 2009

Gabriel Genellina wrote:
> On Wed, 25 Jul 2007 19:14:28 -0300, James Stroud <jstroud at mbi.ucla.edu>
> wrote:
>> Daniel Nogradi wrote:
>>> A very simple question: I currently use a cumbersome-looking way of
>>> getting the first, second, etc. line of a text file:
>> to_get = [0, 3, 7, 11, 13]
>> got = dict((i,s) for (i,s) in enumerate(open(textfile)) if i in to_get)
>> print got[3]
>> This would probably be the best way for really big files and if you know
>> all of the lines you want ahead of time.
> But it still has to read the complete file (although it does not keep the
> unwanted lines).
> Combining this with Paul Rubin's suggestion of itertools.islice I think 
> we get the best solution:
> got = dict((i,s) for (i,s) in 
> enumerate(islice(open(textfile),max(to_get)+1)) if i in to_get)

or even faster:
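     from itertools import islice

     wanted = set([0, 3, 7, 11, 13])
     first = min(wanted)
     with open(textfile) as src:
         # Start enumerate at first so the keys stay the real line numbers
         # even when the smallest wanted line is not 0.
         got = dict((i, s) for (i, s) in enumerate(
                        islice(src, first, max(wanted) + 1), first)
                    if i in wanted)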
Of course that could just as efficiently create a list as a dict.
Note that using a list rather than a set for wanted takes len(wanted)
comparisons on misses, and about len(wanted)/2 on average on hits, but
most likely a single comparison for a set whether it is a hit or a miss.
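
For anyone who wants to reuse this, here is a minimal sketch of the same
idea wrapped in a function. The name pick_lines and its parameters are
made up for illustration; nothing in the thread defines them. It assumes
0-based line numbers, as in the examples above, and it silently ignores
wanted lines that lie past the end of the file.

```python
from itertools import islice


def pick_lines(path, wanted):
    """Return {line_number: line} for the 0-based line numbers in wanted.

    pick_lines is a hypothetical helper, not part of any library.
    """
    wanted = set(wanted)  # a set keeps each membership test cheap
    if not wanted:
        return {}
    first, last = min(wanted), max(wanted)
    with open(path) as src:
        # islice discards the lines before first and stops after last;
        # enumerate starts counting at first so keys are real line numbers.
        window = enumerate(islice(src, first, last + 1), first)
        return dict((i, s) for (i, s) in window if i in wanted)
```

Calling pick_lines(textfile, [3, 7, 11, 13]) should give a dict keyed by
3, 7, 11 and 13. The lines before line 3 still have to be read from disk,
but they are thrown away inside islice without ever being tested against
wanted.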

--Scott David Daniels
Scott.Daniels at Acm.Org