[Tutor] Help? Making Variables out of the header row in a CSV file

Wed Jun 21 04:12:44 CEST 2006

Ralph H. Stoos Jr. wrote:
> All,
> 
> Please forgive the long question, but duty calls me to get something
> done with Python that I don't want somebody to do with Microsloth Excel or Miserable
> Basic.

It's not such a long question.

> The part I can't seem to get going has to do with making variables out
> of the header row in the CSV file.  The way it needs to work is that the
> first row may be of differing lengths (extra fields).  I want to have
> the standalone Python script to make the variables and then allow the user
> to sort the data based on the contents of the columns (i.e. pull out the
> records that match one or more criteria).
> 
> The final product would allow me to select the input file, specify some thing
> like the value of the "PPM" or "Grain" (and combinations thereof), and
> write the output to a new CSV file.  The last bit would be some math on the contents of a couple of the fields.
> 
> Here is a snippet of the data so you can see what I am dealing with.
> 
> Machine,PurgeSuccess,PurgePrepared,PurgeStarted,PurgeCauseFaultID,PurgeModId,PurgeStopModId,ModIdJobCfg,NameJobCfg,Full_cfg,FinisherCfg,plex,PPM,PropID,AtreeNodeID,JobID,MediaID,Width,Height,Color,Weight,Caliper,Drilled,Finish,Grain,CoatingFront,CoatingBack,SW,additional_info,debuglog
> 1125785731,N,Y,Y,927,6,5,_2_3_4_5_6,_SFM20_IOT7_SFM7_BFM20_BFM2,_SFM20_IOT7_SFM7_BFM20_BFM2,DUAL_BFM,Simplex,120,44366,1228,392,527,279400,431800,white,75,104,FALSE,regular,y,none,none,RV0.6.5.27,,DebugMsgLog.2006_05_24.07_48_00
> 1125785731,Y,Y,N,1003,6,,_2_3_4_5_6,_SFM20_IOT7_SFM7_BFM20_BFM2,_SFM20_IOT7_SFM7_BFM20_BFM2,DUAL_BFM,Duplex,120,69206,75,408,29,279400,431800,white,75,104,FALSE,regular,y,none,none,RV0.6.5.27,,DebugMsgLog.2006_05_31.14_33_25A

The csv module will do what you need as far as reading and writing the 
data. You can use a DictReader to read each row of data into a 
dictionary whose keys are the row names. Then you can process the rows 
as needed and write them out using csv.DictWriter.

Here is a short example:

In [1]: import csv

This is your sample data. Normally you would read from an open file, for 
the example I read from a simple list.
In [2]: data = 
'''Machine,PurgeSuccess,PurgePrepared,PurgeStarted,PurgeCauseFaultID,PurgeModId,PurgeStopModId,ModIdJobCfg,NameJobCfg,Full_cfg,Finisher
Cfg,plex,PPM,PropID,AtreeNodeID,JobID,MediaID,Width,Height,Color,Weight,Caliper,Drilled,Finish,Grain,CoatingFront,CoatingBack,SW,additional_info,debug
log
    ...: 
1125785731,N,Y,Y,927,6,5,_2_3_4_5_6,_SFM20_IOT7_SFM7_BFM20_BFM2,_SFM20_IOT7_SFM7_BFM20_BFM2,DUAL_BFM,Simplex,120,44366,1228,392,527,279400,431
800,white,75,104,FALSE,regular,y,none,none,RV0.6.5.27,,DebugMsgLog.2006_05_24.07_48_00
    ...: 
1125785731,Y,Y,N,1003,6,,_2_3_4_5_6,_SFM20_IOT7_SFM7_BFM20_BFM2,_SFM20_IOT7_SFM7_BFM20_BFM2,DUAL_BFM,Duplex,120,69206,75,408,29,279400,431800,
white,75,104,FALSE,regular,y,none,none,RV0.6.5.27,,DebugMsgLog.2006_05_31.14_33_25A
    ...:
    ...:
    ...: '''.splitlines()

Oops, got some extra blank lines in there too:
In [3]: len(data)
Out[3]: 5

Make a DictReader from the data. It will read the column names from the 
first row.
In [4]: r=csv.DictReader(data)

This reads the rows from the DictReader into a list. Each item in the 
list is a dictionary with values from one row of input.
In [5]: rows = [ row for row in r ]

Here is one of the dicts:
In [6]: rows[0]
Out[6]:
{'AtreeNodeID': '1228',
  'Caliper': '104',
  'CoatingBack': 'none',
  'CoatingFront': 'none',
  'Color': 'white',
  'Drilled': 'FALSE',
  'Finish': 'regular',
  'FinisherCfg': 'DUAL_BFM',
  'Full_cfg': '_SFM20_IOT7_SFM7_BFM20_BFM2',
  'Grain': 'y',
  'Height': '431800',
  'JobID': '392',
  'Machine': '1125785731',
  'MediaID': '527',
  'ModIdJobCfg': '_2_3_4_5_6',
  'NameJobCfg': '_SFM20_IOT7_SFM7_BFM20_BFM2',
  'PPM': '120',
  'PropID': '44366',
  'PurgeCauseFaultID': '927',
  'PurgeModId': '6',
  'PurgePrepared': 'Y',
  'PurgeStarted': 'Y',
  'PurgeStopModId': '5',
  'PurgeSuccess': 'N',
  'SW': 'RV0.6.5.27',
  'Weight': '75',
  'Width': '279400',
  'additional_info': '',
  'debuglog': 'DebugMsgLog.2006_05_24.07_48_00',
  'plex': 'Simplex'}

Now process the list of dicts as you like, for example pulling out the 
PPM values:
In [7]: [ row['PPM'] for row in rows ]
Out[7]: ['120', '120']

To allow users to write ad-hoc queries is more difficult. If you can 
restrict them to a few kinds of query it will be easier.

I hope this gets you started,
Kent