[Tutor] handling a textfile

Dave Angel davea at ieee.org
Wed Aug 19 14:03:56 CEST 2009


Alan Gauld wrote:
> "Olli Virta" <llvirta at gmail.com> wrote
>
>> I have a textfile (job.txt) that needs modifying. The structure of 
>> this file
>> is like this:
>>
>> AAA1...
>> BBB1...
>> CCC1...
>> AAA2...
>> BBB2...
>> CCC2...
>> etc...
>> The question is how I can turn all this into a textfile (done.txt)
>> that is supposed to look like this:
>>
>> AAA1...BBB1...CCC1...
>> AAA2...BBB2...CCC2...
>
> Lots of ways to do it. The simplest is to read the variables line by 
> line,
> so, in pseudo code:
>
> while infile not empty
>     a = f.readline()
>     b = f.readline()
>     c = f.readline()
>     outfile.write("%s,%s,%s" % (a,b,c) )
>
> If the data is manageable you could read it all into a list then use list
> slicing to achieve the same
>
> data = infile.readlines()
> for start in range(len(data))[::3]:  # get every third index
>     outfile.write("%s\t%s\t%s" % tuple(data[start :start+3]) )
>
> I suspect you can do even cleverer things with itertools using groupby
> and such, but I'm no itertools expert - it's on my list of things to 
> learn... :-)
>
> HTH,
>
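Between the ellipses and the "etc.", you've managed to confuse everyone 
about the actual format of your file.

But Alan's response is the closest so far to what I think you might have 
had in mind.  The thing he seems to be missing is the treatment of 
newlines: readline() and readlines() keep each line's trailing newline, 
so his output would still contain every original line break.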

Basically your output file is just like your input file except that some 
newlines have been removed.  So the only question is what the pattern 
of removal is.  You might have a constant number of input lines per group 
(e.g. three for your present example).  If that's the case, you want to 
strip off all newlines except those in front of a line whose (zero-based) 
index is a multiple of 3.  So loop through the data list, using rstrip() 
on all the lines except 2, 5, 8, ...   You can use the modulo operator 
(%) to decide whether an index has the right form (index % 3 == 2).
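
A minimal sketch of the fixed-group idea, assuming three lines per group 
and sample data shaped like the example in the question.  The function 
name join_fixed_groups and the group_size parameter are made up for 
illustration, and rstrip("\n") is used instead of a bare rstrip() so that 
other trailing whitespace is left alone:

```python
def join_fixed_groups(lines, group_size=3):
    """Drop the trailing newline from every line except the last of each group."""
    out = []
    for index, line in enumerate(lines):
        if index % group_size == group_size - 1:
            # Lines 2, 5, 8, ... (for group_size 3) keep their newline.
            out.append(line)
        else:
            out.append(line.rstrip("\n"))
    return out


# Hypothetical data standing in for the contents of job.txt.
fixed_sample = ["AAA1...\n", "BBB1...\n", "CCC1...\n",
                "AAA2...\n", "BBB2...\n", "CCC2...\n"]
fixed_result = "".join(join_fixed_groups(fixed_sample))
print(fixed_result, end="")
```

With real files, the list would come from infile.readlines() and the 
result could be written with outfile.writelines().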

Alternatively, you might be saying you want a newline whenever the 
prefix of the line changes.  So loop through the lines, doing the 
rstrip() only when the next line begins the same as the present one.
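
A rough sketch of one reading of "a newline whenever the prefix 
changes": a line keeps its newline only when the following line starts 
with a different prefix (or there is no following line).  The 
three-character prefix length, the function name join_by_prefix, and the 
sample records are all assumptions for illustration, since the real file 
format isn't known:

```python
def join_by_prefix(lines, prefix_len=3):
    """Join consecutive lines that share the same leading prefix_len characters."""
    out = []
    for index, line in enumerate(lines):
        has_next = index + 1 < len(lines)
        if has_next and lines[index + 1][:prefix_len] == line[:prefix_len]:
            # The same group continues on the next line: drop the newline.
            out.append(line.rstrip("\n"))
        else:
            # The prefix changes, or this is the last line: keep the newline.
            out.append(line)
    return out


# Hypothetical records whose first three characters identify the group.
prefix_sample = ["J01 alpha\n", "J01 beta\n",
                 "J02 gamma\n", "J02 delta\n", "J02 epsilon\n"]
prefix_result = "".join(join_by_prefix(prefix_sample))
print(prefix_result, end="")
```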

DaveA







