[Tutor] A CSV field is a list of integers - how to read it as such?

Mon Mar 4 04:12:40 CET 2013

On 03/03/2013 09:24 PM, DoanVietTrungAtGmail wrote:
> Dear tutors
>
> I am checking out csv as a possible data structure for my records. In each
> record, some fields are an integer and some are a list of integers of
> variable length. I use csv.DictWriter to write data. When reading out using
> csv.DictReader, each row is read as a string, per the csv module's standard
> behaviour. To get these columns as lists of integers, I can think of only a
> multi-step process: first, remove the brackets enclosing the string;
> second, split the string into a list containing substrings; third, convert
> each substring into an integer. This process seems inelegant. Is there a
> better way to get integers and lists of integers from a csv file?
>
> Or, is a csv file simply not the best data structure given the above
> requirement?

Your terminology is very confusing.  A csv is not a data structure, it's 
a method of serializing lists of strings.  Or in this case dicts of 
strings.  If a particular dict value isn't a string, it'll get converted 
to one implicitly.  csv does not handle variable length records, so this 
is close to the best you're going to do.
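
To make the implicit conversion concrete, here is a minimal sketch of a round trip through csv. The in-memory buffer and the two-column header are assumptions made for illustration, not part of the original program; the try/except import is only there so the same lines run on both Python 2.7 and Python 3.

```python
import csv

try:
    from StringIO import StringIO  # Python 2.7
except ImportError:
    from io import StringIO        # Python 3

# Hypothetical two-column layout, trimmed down from the poster's header.
header = ['id', 'ListInRecord']

buf = StringIO()
writer = csv.DictWriter(buf, header)
writer.writeheader()
# The int and the list are both turned into strings on the way out.
writer.writerow({'id': 1, 'ListInRecord': [2, 9]})

# Read the same text back: every value is now a plain string.
buf.seek(0)
rows_back = list(csv.DictReader(buf))
first = rows_back[0]
print(repr(first['id']))            # '1'
print(repr(first['ListInRecord']))  # '[2, 9]'
```

The list comes back as the text '[2, 9]', so turning it into integers again is left to the reader of the file.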

> Apart from csv, I considered using a dict or list, or using an
> object to represent each row.

Objects don't exist in a file, so they don't persist between multiple 
runs of the program.  Likewise dict and list.  So no idea what you 
really meant.

> I am being attracted to csv because csv means
> serialisation is unnecessary, I just need to close and open the file to
> stop and continue later (it's a simulation experiment).

Closing and opening don't do anything to persist data, but we can guess 
you must have meant to imply reading and writing as well.  And you've 
nicely finessed the serialization in the write step, but as you 
discovered, you'll have to handle the deserialization to get back to 
ints and lists.

> Also, I am guessing
> but haven't checked, csv is more space efficient.

More space efficient than what?

> Each row contains a few
> integers plus a few lists containing hundreds of integers, and there will
> be up to hundreds of millions of rows.
>
> CODE: My Python 2.7 code is below. It doesn't have the third step
> (substring -> int).
>
> import csv
>
> record1 = {'id':1, 'type':1, 'level':1, 'ListInRecord':[2, 9]}
> record2 = {'id':2, 'type':1, 'level':1, 'ListInRecord':[1, 9]}
> record3 = {'id':3, 'type':2, 'level':1, 'ListInRecord':[2]}
> record9 = {'id':9, 'type':3, 'level':0, 'ListInRecord':[]}
> rows = [record1, record2, record3, record9]
> header = ['id', 'type', 'level', 'ListInRecord']
>
> with open('testCSV.csv', 'wb') as f:
>      fCSV = csv.DictWriter(f, header)
>      fCSV.writeheader()
>      fCSV.writerows(rows)
>
> with open('testCSV.csv', 'r') as f:
>      fCSV = csv.DictReader(f)
>      for row in fCSV:

I'd add the deserialization here.  For each item in row, if the
value begins and ends with [ ], make it into a list, and if it begins
with a digit or minus sign, make it into an int.  Then for the lists,
convert each element to an int.  You can use Don Jennings' suggestion
to save a lot of effort here.

This should reconstruct the original records precisely.  But it'll take
some testing to be sure.
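
The steps described above could be sketched roughly as follows. The helper names parse_field and parse_row are hypothetical, and the sketch assumes every list field looks like the output of str() on a list of ints; it has not been tested against the real data.

```python
def parse_field(text):
    """Turn one csv field back into an int or a list of ints.

    Text that matches neither pattern is returned unchanged.
    """
    text = text.strip()
    if text.startswith('[') and text.endswith(']'):
        inner = text[1:-1].strip()
        if not inner:
            # '[]' must become an empty list, not [''].
            return []
        # int() tolerates the space left after each comma.
        return [int(part) for part in inner.split(',')]
    if text[:1].isdigit() or text[:1] == '-':
        return int(text)
    return text


def parse_row(row):
    """Apply parse_field to every value of a DictReader row."""
    return dict((key, parse_field(value)) for key, value in row.items())


# A row as DictReader would hand it over: all values are strings.
row = {'id': '9', 'type': '3', 'level': '0', 'ListInRecord': '[]'}
restored = parse_row(row)
print(restored['id'], restored['ListInRecord'])
```

Inside the reading loop this would be called as parse_row(row) before using the values. The standard library's ast.literal_eval can also parse strings such as '[2, 9]' and '7' directly, which may be a shorter route than splitting by hand.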

>          print 'ID=', row['id'], 'ListInRecord=', row['ListInRecord'][1:-1].split(', ')
>          # I want this to be a list of integers, NOT list of strings
>
> OUTPUT:
>
> ID= 1 ListInRecord= ['2', '9']
> ID= 2 ListInRecord= ['1', '9']
> ID= 3 ListInRecord= ['2']
> ID= 9 ListInRecord= ['']
>

-- 
DaveA