[Tutor] Dictionary on data

Peter Otten __peter__ at web.de
Fri Nov 20 08:58:39 EST 2015


jarod_v6--- via Tutor wrote:

> Dear All!
> I have these elements:
> 
> In [445]: pt = line.split("\t")[9]
> 
> In [446]: pt
> Out[446]: 'gene_id "ENSG00000223972"; gene_version "5"; transcript_id
> "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name
> "DDX11L1"; gene_source "havana"; gene_biotype
> "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-002";
> transcript_source "havana"; transcript_biotype "processed_transcript";
> exon_id "ENSE00002234944"; exon_version "1"; tag "basic";
> transcript_support_level "1";\n'
> 
> 
> and I want to create a dictionary like this
> 
> gene_id =  "ENSG00000223972"; ...
> 
> 
> I found on Stack Overflow this way to create a dictionary of dictionaries
> (http://stackoverflow.com/questions/8550912/python-dictionary-of-dictionaries)
> # This is our sample data
> data = [("Milter", "Miller", 4), ("Milter", "Miler", 4), ("Milter",
> "Malter", 2)]
> 
> # dictionary we want for the result
> dictionary = {}
> 
> # loop that makes it work
> for realName, falseName, position in data:
>     dictionary.setdefault(realName, {})[falseName] = position
> 
> I want to create a dictionary using setdefault, but I am having difficulty
> transforming pt into a list of tuples.
> 
>  data = pt.split(";")
> <ipython-input-456-300c276109c6> in <module>()
>       1 for i in data:
>       2     l = i.split()
> ----> 3     print l[0]
>       4
> 
> IndexError: list index out of range
> 
> In [457]: for i in data:
>     l = i.split()
>     print l
>    .....:
> ['gene_id', '"ENSG00000223972"']
> ['gene_version', '"5"']
> ['transcript_id', '"ENST00000456328"']
> ['transcript_version', '"2"']
> ['exon_number', '"1"']
> ['gene_name', '"DDX11L1"']
> ['gene_source', '"havana"']
> ['gene_biotype', '"transcribed_unprocessed_pseudogene"']
> ['transcript_name', '"DDX11L1-002"']
> ['transcript_source', '"havana"']
> ['transcript_biotype', '"processed_transcript"']
> ['exon_id', '"ENSE00002234944"']
> ['exon_version', '"1"']
> ['tag', '"basic"']
> ['transcript_support_level', '"1"']
> []
> 
> 
> So how can I do that in a more elegant way?
> Thanks so much!!

I don't see why you would need dict.setdefault(); you already have all the 
necessary pieces:

data = pt.split(";")
pairs = (item.split() for item in data)
mydict = {item[0]: item[1].strip('"') for item in pairs if len(item) == 2}
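As a self-contained check, here is the same approach run on a shortened sample string (the sample is an assumption, cut down from the pt value in the question). It also shows where the IndexError came from: the piece after the final ";" is just "\n", which splits into an empty list, and the len(item) == 2 test skips it.

```python
# Shortened sample in the same shape as the 'pt' string from the question.
pt = 'gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1";\n'

data = pt.split(";")

# The last piece is only "\n"; split() turns it into [], so l[0] fails there.
last_piece = data[-1].split()

# Split each piece into [key, quoted_value] and keep only complete pairs.
pairs = (item.split() for item in data)
mydict = {item[0]: item[1].strip('"') for item in pairs if len(item) == 2}

print(mydict)
```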

You can protect against whitespace in the quoted strings with 
item.split(None, 1) instead of item.split(). If ";" is allowed in the quoted 
strings you have to work a little harder.
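A rough sketch of both variants follows. The helper names parse_split and parse_regex and the sample strings are made up for illustration. The regular expression is only one possible way to "work a little harder", and it assumes that the quoted values never contain an escaped double quote.

```python
import re

def parse_split(text):
    """Split on ';', then split each piece at the first whitespace run only."""
    result = {}
    for item in text.split(";"):
        parts = item.split(None, 1)  # at most two parts: key, rest
        if len(parts) == 2:
            key, value = parts
            # The rest may keep trailing whitespace, so strip that first.
            result[key] = value.strip().strip('"')
    return result

def parse_regex(text):
    """Match key "value" pairs directly, so ';' inside the quotes is harmless."""
    # Assumption: values contain no escaped double quotes.
    return dict(re.findall(r'(\w+)\s+"([^"]*)"', text))

# Hypothetical samples: a value with a space, and a value with a ';'.
with_space = 'gene_id "ENSG00000223972"; note "two words";\n'
with_semicolon = 'note "a; b"; tag "basic";\n'

print(parse_split(with_space))
print(parse_regex(with_semicolon))
```

parse_split() still breaks on the second sample because it splits on every ";", so use something like parse_regex() if the data can contain semicolons inside the quotes.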




