[Tutor] create dictionary from csv data
spir
denis.spir at free.fr
Mon Feb 23 16:31:44 CET 2009
On Mon, 23 Feb 2009 14:41:10 +0100,
Norman Khine <norman at khine.net> wrote:
> Hello,
>
> I have this csv file:
>
> $ cat licences.csv
> "1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
> "2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA
> (IATA)"
> "3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel
> Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n
> Incentive Travel & Meet. Association (ITMA)"
I have the impression that the csv module does not help much here. Yes, it parses the data, but you only need a subset of it, and that subset may be harder to extract afterwards. I would do the following (all untested):
-0- Read in the file as a single string.
> I would like to create a set of unique values for all the memberships. i.e.
>
> ATOL
> IT
> ABTA
> etc..
-1- Use re.findall with a pattern like r'\((\w+)\)' to get the membership codes, then build a set out of the resulting list.
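A minimal sketch of steps -0- and -1-, assuming the file content looks like the quoted sample. The text is pasted in as a string here so the example is self-contained; in practice it would come from something like open("licences.csv").read(). The names text, codes and unique_codes are only illustrative.

```python
import re

# Sample data copied from the quoted licences.csv (raw string, so the
# literal "\n" sequences in the file stay as two characters).
text = r'''"1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
"2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA
(IATA)"
"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel
Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n
Incentive Travel & Meet. Association (ITMA)"
'''

# Every run of word characters enclosed in parentheses is taken as a code.
codes = re.findall(r'\((\w+)\)', text)

# A set drops the duplicates (ATOL and IATA appear several times).
unique_codes = set(codes)

print(sorted(unique_codes))  # ['ABTA', 'ATOL', 'IATA', 'IT', 'ITMA']
```

Note that \w+ only matches codes made of letters, digits and underscores; a code containing a space or a hyphen would need a wider pattern.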
> and also I would like to extract the No. 56542
-2- Same approach, with r'No\. (\d+)' (the dot is escaped so that it only matches a literal period; a set is probably not necessary here)
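A short sketch of step -2-, assuming the number always follows the literal text "No. " as in the sample. The fragment in text is copied from record 3; the variable names are illustrative.

```python
import re

# Fragment of record 3 from the quoted sample data.
text = 'Association of British Travel Agents (ABTA) No. 56542 Air Travel'

# Capture the digits that follow "No. ".
numbers = re.findall(r'No\. (\d+)', text)

print(numbers)  # ['56542']
```

findall returns strings, so int(numbers[0]) would be needed if the value is to be used as a number.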
> and lastly I would like to map each record to the set of unique
> membership values, so that:
>
> I have a dictionary like:
>
> {0: ['1', '('ATOL', 'IT')'],
> 1: ['2','('ATOL', 'IATA')'],
> 2: ['3','('ABTA', 'ATOL', 'IATA', 'ITMA')']}
(The dict looks strange...)
-3- Now split the string with splitlines() and, for each line:
* read the ordinal number (maybe useless, actually)
* read the codes again
I don't know what your dict is useful for, as the keys are simple ordinals. It is actually a list in disguise. Unless you want this instead:
{'1': ['ATOL', 'IT'],
'2': ['ATOL', 'IATA'],
'3': ['ABTA', 'ATOL', 'IATA', 'ITMA']}
But here the keys are still predictable ordinals.
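One possible sketch of step -3-, building a dict keyed by the ordinal. It assumes, as in the quoted sample, that a record can wrap over several physical lines and that each new record starts with a quoted number followed by a comma; lines that do not start that way are treated as continuations of the current record. The names record_start, code_pattern and memberships are made up for this example.

```python
import re

# Sample data copied from the quoted licences.csv.
text = r'''"1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
"2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA
(IATA)"
"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel
Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n
Incentive Travel & Meet. Association (ITMA)"
'''

record_start = re.compile(r'^"(\d+)",')   # e.g. "3", at the start of a line
code_pattern = re.compile(r'\((\w+)\)')   # e.g. (ATOL)

memberships = {}
current = None  # ordinal of the record being read
for line in text.splitlines():
    match = record_start.match(line)
    if match:
        # A new record begins on this line.
        current = match.group(1)
        memberships[current] = []
    if current is not None:
        # Collect the codes found on this line, in order of appearance.
        memberships[current].extend(code_pattern.findall(line))

print(memberships['3'])  # ['ABTA', 'ATOL', 'IATA', 'ITMA']
```

If the ordinals really are just 1, 2, 3, ... then list(memberships.values()) gives the same information as a plain list.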
denis
------
la vita e estrany
> Here is what I have so far:
>
> >>> import csv
> >>> inputFile = open(str("licences.csv"), 'r')
> >>> outputDic = {}
> >>> keyIndex = 0
> >>> fileReader = csv.reader(inputFile)
> >>> for line in fileReader:
> ...     outputDic[keyIndex] = line
> ...     keyIndex+=1
> ...
> >>> print outputDic
> {0: ['2', 'Air Travel Organisation Licence (ATOL) Appointed Agents of
> IATA (IATA)'], 1: ['3', ' "Association of British Travel Agents (ABTA)
> No. 56542 Air Travel'], 2: ['Organisation Licence (ATOL) Appointed
> Agents of IATA (IATA) Incentive Travel & Meet. Association (ITMA)"']}
>
> So basically I would like to keep only the data in the brackets, i.e.
> (ABTA) etc..
>
> Cheers
>
> Norman