[Tutor] using re to build dictionary

spir denis.spir at free.fr
Tue Feb 24 13:40:06 CET 2009


On Tue, 24 Feb 2009 12:48:51 +0100,
Norman Khine <norman at khine.net> wrote:

> Hello,
>  Following my previous post on creating a dictionary from csv, I have broken the
> problem down further and would like the list's feedback on whether it could be done better:
> 
>  >>> import re
>  >>> s = 'Association of British Travel Agents (ABTA) No. 56542\nAir Travel Organisation Licence (ATOL)\nAppointed Agents of IATA (IATA)\nIncentive Travel & Meet. Association (ITMA)'
>  >>> licences = re.split("\n+", s)
>  >>> licence_list = [re.split("\((\w+)\)", licence) for licence in licences]
>  >>> association = []
>  >>> for x in licence_list:
> ...     for y in x:
> ...         if y.isupper():
> ...            association.append(y)
> ...
>  >>> association
> ['ABTA', 'ATOL', 'IATA', 'ITMA']
> 
> 
> In my string 's', I have 'No. 56542', how would I extract the '56542' 
> and map it against the 'ABTA' so that I can have a dictionary for example:
> 
>  >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
>  >>>
> 
> 
> Here is what I have so far:
> 
>  >>> my_dictionary = {}
> 
>  >>> for x in licence_list:
> ...     for y in x:
> ...             if y.isupper():
> ...                     my_dictionary[y] = y
> ...
>  >>> my_dictionary
> {'ABTA': 'ABTA', 'IATA': 'IATA', 'ITMA': 'ITMA', 'ATOL': 'ATOL'}
> 
> This is wrong, as the values should be the number (i.e. 56542) that is
> in the licence_list.
> 
> Here is where I get stuck: in my licence_list, not all items have
> a number; all but one are empty. For my use case, I still need to create
> the dictionary so that it is in the form:
> 
>  >>> my_dictionary = {'ABTA': '56542', 'ATOL': '', 'IATA': '', 'ITMA': ''}
> 
> Any advice much appreciated.
> 
> Norman

I had a similar problem once. The neat solution was (I think; don't take this for granted, as I have no time to verify it) simply to use multiple groups with re.findall again. Build a pattern like:
	r'.+(code_pattern).+(number_pattern).+\n+'
Then the results will be a list of tuples like
[
(code1, n1),
(code2, n2),
...
]
where some numbers will be missing. From this it's straightforward to build a dict, maybe using a default value of None for missing numbers. Someone will probably confirm or refute this method.
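
To make the idea concrete, here is a minimal sketch applied to Norman's string. It is only one possible reading of the approach: it assumes the code is always the word in parentheses and that the number, when present, follows on the same line as "No. <digits>". The names pattern, pairs, licence_numbers and with_none are illustrative, and the number group is made optional so that lines without a number still match.

```python
import re

s = ('Association of British Travel Agents (ABTA) No. 56542\n'
     'Air Travel Organisation Licence (ATOL)\n'
     'Appointed Agents of IATA (IATA)\n'
     'Incentive Travel & Meet. Association (ITMA)')

# Group 1: the code in parentheses.
# Group 2: the digits after "No." on the same line; the whole
# "No. <digits>" part is optional (assumed format, see above).
pattern = re.compile(r'\((\w+)\)(?:[ \t]*No\.[ \t]*(\d+))?')

# With two groups, findall returns a list of (code, number) tuples;
# a group that did not take part in the match comes back as ''.
pairs = pattern.findall(s)

# A list of 2-tuples converts directly into a dict.
licence_numbers = dict(pairs)
print(licence_numbers)

# Variant with None instead of '' for missing numbers.
with_none = {code: (number or None) for code, number in pairs}
print(with_none)
```

With the empty-string default, this gives the dictionary form Norman asked for; the None variant is there only if a missing number should be distinguishable from an empty one.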

denis
------
la vita e estrany

