[Tutor] create dictionary from csv data

spir denis.spir at free.fr
Mon Feb 23 16:31:44 CET 2009


On Mon, 23 Feb 2009 14:41:10 +0100,
Norman Khine <norman at khine.net> wrote:

> Hello,
> 
> I have this csv file:
> 
> $ cat licences.csv
> "1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
> "2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA 
> (IATA)"
> "3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel
> Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n 
> Incentive Travel & Meet. Association (ITMA)"

My impression is that the csv module is of little help here. Yes, it parses the data, but you need only a subset of it, and that subset may be harder to extract afterwards. I would do the following (all untested):

-0- Read in the file as a single string.

> I would like to create a set of unique values for all the memberships. i.e.
> 
> ATOL
> IT
> ABTA
> etc..

-1- Use re.findall with a pattern like r'\((\w+)\)' to get the company codes, then build a set out of the resulting list.
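
A minimal sketch of steps 0 and 1, as untested as the rest of this reply was meant to be. The sample string stands in for the contents of licences.csv, and it assumes each record sits on one physical line (the wrapping in the mail may or may not be in the real file). The names text, codes and unique_codes are only illustrative.

```python
import re

# Stand-in for the file contents; with the real file you would read it
# as one string, e.g. text = open("licences.csv").read().
# Raw string, so that \n stays the two literal characters shown by cat.
text = r'''"1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
"2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)"
"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n Incentive Travel & Meet. Association (ITMA)"
'''

# Every run of word characters enclosed in parentheses, in order of appearance.
codes = re.findall(r'\((\w+)\)', text)

# A set drops the duplicates (ATOL and IATA appear several times).
unique_codes = set(codes)

print(sorted(unique_codes))
```

The pattern works on the whole string at once, so it does not matter here whether a record spans one line or several.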

> and also I would like to extract the No. 56542

-2- Same again, with r'No\. (\d+)' (a set is probably not necessary here).
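
For instance, something along these lines should pull out the number. The dot is escaped in this sketch so that it matches only a literal full stop, and the sample string is just a shortened copy of record 3, not the real file.

```python
import re

# Shortened copy of record 3 from the sample data (raw string keeps \n literal).
text = r'"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel Organisation Licence (ATOL)"'

# Capture the digits that follow "No. "; findall returns a list of strings.
numbers = re.findall(r'No\. (\d+)', text)

print(numbers)  # ['56542']
```

The result is a list of strings; convert with int() if you need numbers.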

> and lastly I would like to map each record to the set of unique 
> membership values, so that:
> 
> I have a dictionary like:
> 
> {0: ['1', '('ATOL', 'IT')'],
> 1: ['2','('ATOL', 'IATA')'],
> 2: ['3','('ABTA', 'ATOL', 'IATA', 'ITMA')']}

(The dict looks strange...)

-3- Now call splitlines() on the string and, for each line:
* read the ordinal number (maybe useless, actually)
* read the codes again
I don't know what your dict is useful for, as the keys are simple ordinals. It is a list in disguise, actually. Unless you want this instead:
{'1':['ATOL', 'IT'],
'2':['ATOL', 'IATA'],
'3':['ABTA', 'ATOL', 'IATA', 'ITMA']}
But here the keys are still predictable ordinals.
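
Step 3 could be sketched roughly as below, building the second form of the dict. It rests on an assumption worth checking against the real file: that every record occupies exactly one physical line and starts with its quoted ordinal. If a record is broken across several lines, those lines would have to be joined first. The names records and ordinal are only illustrative.

```python
import re

# Stand-in for the file contents, one record per line (an assumption).
text = r'''"1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
"2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)"
"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n Incentive Travel & Meet. Association (ITMA)"
'''

records = {}
for line in text.splitlines():
    # The ordinal is the quoted number at the very start of the line.
    ordinal = re.match(r'"(\d+)"', line)
    if ordinal is None:
        continue  # blank line or continuation line: skipped in this sketch
    # Same pattern as in step 1, applied to this line only.
    records[ordinal.group(1)] = re.findall(r'\((\w+)\)', line)

print(records)
```

If the ordinals really are just 1, 2, 3, ... a plain list of the code lists would carry the same information.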

denis
------
la vita e estrany

> Here is what I have so far:
> 
>  >>> import csv
>  >>> inputFile = open(str("licences.csv"),  'r')
>  >>> outputDic = {}
>  >>> keyIndex = 0
>  >>> fileReader = csv.reader(inputFile)
>  >>> for line in fileReader:
> ...     outputDic[keyIndex] = line
> ...     keyIndex+=1
> ...
>  >>> print outputDic
> {0: ['2', 'Air Travel Organisation Licence (ATOL) Appointed Agents of 
> IATA (IATA)'], 1: ['3', ' "Association of British Travel Agents (ABTA) 
> No. 56542 Air Travel'], 2: ['Organisation Licence (ATOL) Appointed 
> Agents of IATA (IATA) Incentive Travel & Meet. Association (ITMA)"']}
> 
> So basically I would like to keep only the data in the brackets, i.e. 
> (ABTA) etc..
> 
> Cheers
> 
> Norman

