text file reformatting

Tim Chase python.list at tim.thechases.com
Sun Oct 31 15:48:09 EDT 2010


> PRJ01001 4 00100END
> PRJ01002 3 00110END
>
> I would like to pick only some columns to a new file and put them to a
> certain places (to match previous data) - definition file (def.csv)
> could be something like this:
>
> VARIABLE	FIELDSTARTS	FIELD SIZE	NEW PLACE IN NEW DATA FILE
> ProjID	;	1	;	5	;	1
> CaseID	;	6	;	3	;	10
> UselessV  ;	10	;	1	;
> Zipcode	;	12	;	5	;	15
>
> So the new datafile should look like this:
>
> PRJ01    001       00100END
> PRJ01    002       00110END


How flexible is the def.csv format?  The difficulty I see with 
your def.csv format is that it leaves undefined gaps (presumably 
to be filled in with spaces) and that it also allows a blank "new 
place in new file" value.  It would be simpler if you instead 
specified the width to which you want each field padded, omitted 
the variables you don't want in the output, and listed the 
variables in the same order you want them in the output:

  Variable; Start; Size; Width
  ProjID; 1; 5; 10
  CaseID; 6; 3; 10
  Zipcode; 12; 5; 5
  End; 17; 3; 3

Note that I lazily use the same method to copy the END from the 
source to the destination, rather than coding specially for it. 
With a definition file like that, you could do something like 
this (untested):

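   import csv

   # Read the field definitions: name, 1-based start, size, output width.
   with open('def.csv', newline='') as f:
     r = csv.reader(f, delimiter=';')
     next(r)  # discard the header row
     fields = [
       # def.csv counts columns from 1; Python slices count from 0
       (varname,
        slice(int(start) - 1, int(start) - 1 + int(size)),
        int(width))
       for varname, start, size, width
       in r
       ]

   # Copy each selected column, left-justified to its output width.
   with open('data.txt') as src, open('out.txt', 'w') as out:
     for row in src:
       for varname, slc, width in fields:
         out.write(row[slc].ljust(width))
       out.write('\n')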

Hope that's fairly easy to follow and makes sense.  Watch out for 
fence-post errors, particularly around your use of "1" as the 
initial offset, while Python uses "0" as the initial offset for 
strings.
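
As a quick check of that offset issue, here is a minimal sketch using 
the first sample row from the original post.  The helper name "field" 
is made up for illustration; it only assumes that the start columns 
in def.csv are counted from 1.

```python
# Sample row from the original post; def.csv numbers columns from 1.
row = "PRJ01001 4 00100END"

def field(row, start, size):
    """Return `size` characters beginning at 1-based column `start`."""
    begin = start - 1  # convert the 1-based column to a 0-based offset
    return row[begin:begin + size]

proj_id = field(row, 1, 5)   # "PRJ01"
case_id = field(row, 6, 3)   # "001"
zipcode = field(row, 12, 5)  # "00100"

# Without the adjustment, the slice starts one character too late.
shifted = row[1:1 + 5]       # "RJ010"
```

If the start values were instead meant as 0-based offsets, the 
subtraction would have to go.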

If you can't modify the def.csv format, then things are a bit 
more complex and I'd almost be tempted to write a script to try 
and convert your existing def.csv format into something simpler 
to process like what I describe.
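
A rough sketch of such a conversion might look like the following.  
It rests on several assumptions that are not in the original post: a 
row with a blank "new place" is dropped, fields are ordered by their 
new place, each field is padded up to the start of the next one, and 
the last field keeps its own size.  The function name "convert_def" 
is hypothetical, and no check is made for overlapping fields.

```python
import csv
import io

def convert_def(old_text):
    """Turn old-style def.csv text into (name, start, size, width) rows."""
    reader = csv.reader(io.StringIO(old_text), delimiter=';')
    next(reader)  # discard the header row
    kept = []
    for rec in reader:
        cells = [cell.strip() for cell in rec]
        if len(cells) < 4 or not cells[3]:
            continue  # assumption: blank "new place" means "omit this field"
        name, start, size, place = cells[:4]
        kept.append((int(place), name, int(start), int(size)))
    kept.sort()  # order the fields by their position in the new file
    rows = []
    for i, (place, name, start, size) in enumerate(kept):
        if i + 1 < len(kept):
            width = kept[i + 1][0] - place  # pad up to the next field
        else:
            width = size                    # last field: no padding
        rows.append((name, start, size, width))
    return rows

# The definition rows quoted above, with a simplified header line.
old_def = (
    "VARIABLE;FIELDSTARTS;FIELD SIZE;NEW PLACE\n"
    "ProjID\t;\t1\t;\t5\t;\t1\n"
    "CaseID\t;\t6\t;\t3\t;\t10\n"
    "UselessV  ;\t10\t;\t1\t;\n"
    "Zipcode\t;\t12\t;\t5\t;\t15\n"
)
new_rows = convert_def(old_def)
```

The END marker is not covered by the old definition rows, so it would 
still need a row of its own in the converted file.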

-tkc




