[Tutor] A CSV field is a list of integers - how to read it as such?
Dave Angel
davea at davea.name
Mon Mar 4 04:12:40 CET 2013
On 03/03/2013 09:24 PM, DoanVietTrungAtGmail wrote:
> Dear tutors
>
> I am checking out csv as a possible data structure for my records. In each
> record, some fields are an integer and some are a list of integers of
> variable length. I use csv.DictWriter to write data. When reading out using
> csv.DictReader, each row is read as a string, per the csv module's standard
> behaviour. To get these columns as lists of integers, I can think of only a
> multi-step process: first, remove the brackets enclosing the string;
> second, split the string into a list containing substrings; third, convert
> each substring into an integer. This process seems inelegant. Is there a
> better way to get integers and lists of integers from a csv file?
>
> Or, is a csv file simply not the best data structure given the above
> requirement?
Your terminology is very confusing. A csv is not a data structure, it's
a method of serializing lists of strings. Or in this case dicts of
strings. If a particular dict value isn't a string, it'll get converted
to one implicitly. csv does not handle variable length records, so this
is close to the best you're going to do.
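To illustrate the implicit conversion, here is a minimal sketch. It assumes Python 3 and uses an in-memory io.StringIO buffer as a stand-in for testCSV.csv; the two-column layout is only for illustration.

```python
import csv
import io

# In-memory buffer standing in for testCSV.csv (assumption, Python 3 only).
buf = io.StringIO()
writer = csv.DictWriter(buf, ['id', 'ListInRecord'])
writer.writeheader()
# The int and the list are each passed through str() when written.
writer.writerow({'id': 1, 'ListInRecord': [2, 9]})

buf.seek(0)
row = next(csv.DictReader(buf))
print(repr(row['id']))            # '1'
print(repr(row['ListInRecord']))  # '[2, 9]'
```

Both values come back as plain strings, so the type information has to be restored by the reader.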
> Apart from csv, I considered using a dict or list, or using an
> object to represent each row.
Objects don't exist in a file, so they don't persist between multiple
runs of the program. Likewise dict and list. So no idea what you
really meant.
> I am being attracted to csv because csv means
> serialisation is unnecessary, I just need to close and open the file to
> stop and continue later (it's a simulation experiment).
Closing and opening don't do anything to persist data, but we can guess
you must have meant to imply reading and writing as well. And you've
nicely finessed the serialization in the write step, but as you
discovered, you'll have to handle the deserialization to get back to
ints and lists.
> Also, I am guessing
> but haven't checked, csv is more space efficient.
More space efficient than what?
> Each row contains a few
> integers plus a few lists containing hundreds of integers, and there will
> be up to hundreds of millions of rows.
>
> CODE: My Python 2.7 code is below. It doesn't have the third step
> (substring -> int).
>
> import csv
>
> record1 = {'id':1, 'type':1, 'level':1, 'ListInRecord':[2, 9]}
> record2 = {'id':2, 'type':1, 'level':1, 'ListInRecord':[1, 9]}
> record3 = {'id':3, 'type':2, 'level':1, 'ListInRecord':[2]}
> record9 = {'id':9, 'type':3, 'level':0, 'ListInRecord':[]}
> rows = [record1, record2, record3, record9]
> header = ['id', 'type', 'level', 'ListInRecord']
>
> with open('testCSV.csv', 'wb') as f:
>     fCSV = csv.DictWriter(f, header)
>     fCSV.writeheader()
>     fCSV.writerows(rows)
>
> with open('testCSV.csv', 'r') as f:
>     fCSV = csv.DictReader(f)
>     for row in fCSV:
I'd add the deserialization here. For each item in row, if the
value begins and ends with [ ] then make it into a list, and if it
begins with a digit or minus sign, make it into an int. Then for the
lists, convert each element to an int. You can use Don Jennings'
suggestion to save a lot of effort here.
This should reconstruct the original records precisely. But it'll take
some testing to be sure.
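A minimal sketch of that deserialization step follows. The helper names deserialize_value and deserialize_row are hypothetical, and ast.literal_eval is just one way to parse the bracketed string; it is not necessarily the suggestion referred to above.

```python
import ast

def deserialize_value(text):
    """Turn one csv field back into an int or a list of ints (hypothetical helper)."""
    text = text.strip()
    if text.startswith('[') and text.endswith(']'):
        # literal_eval parses '[2, 9]' or '[]' as a Python literal
        # without executing arbitrary code.
        return [int(item) for item in ast.literal_eval(text)]
    if text and (text[0].isdigit() or text[0] == '-'):
        return int(text)
    return text  # anything else is left as a string

def deserialize_row(row):
    """Apply deserialize_value to every field of a DictReader row."""
    return {key: deserialize_value(value) for key, value in row.items()}

row = {'id': '9', 'type': '3', 'level': '0', 'ListInRecord': '[]'}
print(deserialize_row(row))
```

Unlike slicing off the brackets and calling split(', '), this returns an empty list for '[]' rather than [''].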
>         # I want this to be a list of integers, NOT list of strings
>         print 'ID=', row['id'],'ListInRecord=', row['ListInRecord'][1:-1].split(', ')
>
> OUTPUT:
>
> ID= 1 ListInRecord= ['2', '9']
> ID= 2 ListInRecord= ['1', '9']
> ID= 3 ListInRecord= ['2']
> ID= 9 ListInRecord= ['']
>
--
DaveA