[Baypiggies] Fwd: manipulating lists question

Vikram K vikthirtyfive at gmail.com
Thu Dec 5 12:06:05 CET 2013


Good catch. All the other elements remain the same except this one. When we
fuse or merge two elements of the larger list into one, element 28 of the
merged/collapsed element needs to combine whatever is present in element 28
of both lists, with ';' as the delimiter:

'1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC'
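A minimal sketch of the collapse described above, in Python to match the thread. It assumes rows are grouped by their first element (e.g. '1302'), that element 1 is joined with ',' as in the `collapsed` example quoted below, that element 28 is joined with ';', and that every other field is taken from the first row seen for that key (the thread does not say which row should win if other fields differ). `collapse_rows` and `JOIN_FIELDS` are hypothetical names. Values are joined in the order the rows appear, so element 28 would start with NM_080679.2; the thread does not specify an order.

```python
from collections import OrderedDict

# Hypothetical mapping (assumption): field index -> delimiter used when
# merging. Index 1 is the transcript ID, index 28 the annotation string.
JOIN_FIELDS = {1: ",", 28: ";"}


def collapse_rows(rows, key_index=0, join_fields=JOIN_FIELDS):
    """Merge rows that share row[key_index], keeping first-seen key order.

    Fields listed in join_fields are concatenated with their delimiter in
    the order the rows appear; every other field is taken from the first
    row seen for that key.
    """
    merged = OrderedDict()
    for row in rows:
        key = row[key_index]
        if key not in merged:
            # Copy the row so the caller's list is never modified.
            merged[key] = list(row)
        else:
            target = merged[key]
            for idx, sep in join_fields.items():
                target[idx] = target[idx] + sep + row[idx]
    return list(merged.values())
```

With the larger list from this thread bound to `comp`, calling `collapse_rows(comp)` would keep comp[6] unchanged and fuse comp[7] and comp[8] into one row.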



On Thu, Dec 5, 2013 at 5:51 AM, Martin Falatic <martin at falatic.com> wrote:

> My solution works for the first three elements as stated, but what you do
> with the rest of the elements is tricky if they differ for a given key.
>
> For 1302 all the fields in the slice [3:] match each other *except*
> element 28:
> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC'
> '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC'
>
> Does this potentially happen with other elements at times? At this point
> you're faced with either discarding data or mangling data together. The
> "collapse" just takes the last [3:] slice encountered (for that remainder
> of data). Is that acceptable?
>
>  - Marty
>
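One way to answer Marty's question about whether other elements can differ is to check every group of rows sharing a key and list the field indices whose values are not all equal. A minimal sketch, assuming all rows in a group have the same length; `differing_fields` is a hypothetical helper name. For comp[7] and comp[8] in this thread, the differences would be expected at index 1 (the transcript ID) and index 28.

```python
from collections import defaultdict


def differing_fields(rows, key_index=0):
    """Map each key to the sorted field indices whose values differ.

    Keys whose rows agree on every field (or that appear only once) are
    omitted. Assumes all rows for a key have the same length: zip()
    stops at the shortest row, so extra trailing fields are not compared.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)

    result = {}
    for key, group in groups.items():
        if len(group) < 2:
            continue
        # zip(*group) yields one tuple per column across the grouped rows.
        diffs = [i for i, column in enumerate(zip(*group))
                 if len(set(column)) > 1]
        if diffs:
            result[key] = diffs
    return result
```

Running this over the whole larger list before collapsing shows whether any key would force data to be discarded or combined.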
>
> On Thu, December 5, 2013 02:33, Vikram K wrote:
> > In the example I have given, the second and third elements of the larger
> > list (comp[7] and comp[8]) have a 1:1 mapping after the second element.
> > So I would like to keep the first element as it is and then collapse or
> > merge the second and third elements (comp[7] and comp[8]) into a single
> > element:
> >
> >
> >>>> comp[6]
> > ['6558', 'NM_001046.2', 'SLC12A2', '6037226', '2', 'chr5', '127502453',
> > '127502454', 'het-ref', 'snp', 'A', 'T', 'A', '185', '113', '184', '112',
> > 'VQHIGH', 'VQHIGH', '', '', '', '', '259974', '9', '6', '6', '15',
> > '6558:NM_001046.2:SLC12A2:CDS:MISSENSE',
> > '6558:NM_001046.2:SLC12A2:CDS:NO-CHANGE', 'PFAM:PF01490:Aa_trans', '', '',
> > '', '0.99', '2', '0.99', '0.998', '1.01', '1.000', '0.5', '0.46', '0.5',
> > '1', '18', '18', '19', 'ref-identical;onlyA', 'snp', '0.072', '-1',
> > 'SQHIGH']
> >
> >
> >>>> comp[7]
> > ['1302', 'NM_080679.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> > '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458', '458',
> > 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106', '140',
> > '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> > '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> > '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> > '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> > '0.990', '6', 'SQHIGH']
> >
> >
> >>>> comp[8]
> > ['1302', 'NM_080680.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> > '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458', '458',
> > 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106', '140',
> > '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> > '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> > '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> > '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> > '0.990', '6', 'SQHIGH']
> >
> >
> > After collapsing comp[7] and comp[8] I get:
> >
> >>>> collapsed = ['1302', 'NM_080679.2,NM_080680.2', 'COL11A2', '6525172',
> > '2', 'chr6', '33271374', '33271376', 'het-ref', 'del', 'GT', '', 'GT',
> > '542', '542', '458', '458', 'VQHIGH', 'VQHIGH', '', '', '', '', '71150',
> > '34', '106', '106', '140',
> > '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> > '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> > '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> > '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> > '0.990', '6', 'SQHIGH']
> >
> >
> > So in my larger list, after the modification, comp[6] is the first
> > element and collapsed is the second element.
> >
> >
> > On Thu, Dec 5, 2013 at 5:22 AM, Martin Falatic <martin at falatic.com> wrote:
> >
> >
> >> Ah, genetics! Intriguing...
> >>
> >>
> >> Do you need anything beyond the third element of each list? Does the
> >> third element always map 1:1 with the first, or could it vary? If it
> >> can vary, what then?
> >>
> >> To refer to the simplified example, could you have this?
> >> x = [['cat', 'NM123', 12], ['cat', 'NM234', 43], ['dog', 'NM56', 65]]
> >>
> >> If so, what is the expected output?
> >>
> >>
> >> - Marty
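Marty's variant of the toy example, where the third element differs for 'cat', has no agreed answer in the thread. One conservative option, sketched here as an assumption rather than the requested behavior, is to refuse to merge and raise an error instead of silently keeping one of the values. `merge_strict` is a hypothetical name.

```python
def merge_strict(rows):
    """Merge [key, label, quant] rows by key, joining labels with ','.

    Assumption: rows that share a key must agree on quant. If they do not,
    raise ValueError rather than silently keeping one of the values.
    """
    merged = {}
    for key, label, quant in rows:
        if key not in merged:
            merged[key] = [label, quant]
        elif merged[key][1] != quant:
            raise ValueError("conflicting values for %r: %r vs %r"
                             % (key, merged[key][1], quant))
        else:
            merged[key][0] += "," + label
    return merged
```

With the original toy list the result is {'cat': ['NM123,NM234', 12], 'dog': ['NM56', 65]}; with Marty's list (12 vs 43 for 'cat') it raises ValueError, which makes the ambiguity visible instead of hiding it.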
> >>
> >>
> >>
> >>
> >> On Thu, December 5, 2013 02:11, Vikram K wrote:
> >>
> >>> I am having some difficulty applying this to my actual problem,
> >>> although I love the dictionary method. Imagine the following three
> >>> lists are the first, second and third elements of a larger list:
> >>>
> >>>>>> comp[6]
> >>> ['6558', 'NM_001046.2', 'SLC12A2', '6037226', '2', 'chr5', '127502453',
> >>> '127502454', 'het-ref', 'snp', 'A', 'T', 'A', '185', '113', '184', '112',
> >>> 'VQHIGH', 'VQHIGH', '', '', '', '', '259974', '9', '6', '6', '15',
> >>> '6558:NM_001046.2:SLC12A2:CDS:MISSENSE',
> >>> '6558:NM_001046.2:SLC12A2:CDS:NO-CHANGE', 'PFAM:PF01490:Aa_trans', '', '',
> >>> '', '0.99', '2', '0.99', '0.998', '1.01', '1.000', '0.5', '0.46', '0.5',
> >>> '1', '18', '18', '19', 'ref-identical;onlyA', 'snp', '0.072', '-1',
> >>> 'SQHIGH']
> >>>
> >>>
> >>>
> >>>>>> comp[7]
> >>> ['1302', 'NM_080679.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> >>> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458', '458',
> >>> 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106', '140',
> >>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> >>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> >>> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> >>> '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> >>> '0.990', '6', 'SQHIGH']
> >>>
> >>>
> >>>
> >>>>>> comp[8]
> >>> ['1302', 'NM_080680.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> >>> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458', '458',
> >>> 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106', '140',
> >>> '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> >>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> >>> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> >>> '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> >>> '0.990', '6', 'SQHIGH']
> >>>
> >>>
> >>>>>>
> >>>
> >>> ------
> >>> Can we apply the dictionary method to this problem, where the key of the
> >>> dictionary is the first element of each of the three smaller lists
> >>> ('6558', '1302', '1302')? The second and third elements of the larger
> >>> list (those starting with '1302') need to be collapsed into a single
> >>> element, based on their second elements ('NM_080679.2' and
> >>> 'NM_080680.2'), similar to how we tackled the toy problem:
> >>>
> >>> x = [['cat', 'NM123', 12], ['cat', 'NM234', 12], ['dog', 'NM56', 65]]
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Dec 5, 2013 at 4:18 AM, Michiel Overtoom <motoom at xs4all.nl> wrote:
> >>>
> >>>
> >>>
> >>>>
> >>>> On Dec 5, 2013, at 10:09, Vikram K wrote:
> >>>>
> >>>>
> >>>>
> >>>>> Another option could have been to obtain a dictionary like so:
> >>>>> {'dog': ['NM56', 65], 'cat': ['NM123,NM234', 12]}
> >>>>>
> >>>>>
> >>>>
> >>>> Oh, in that case the code can become somewhat simpler:
> >>>>
> >>>>
> >>>>
> >>>> x = [['cat', 'NM123', 12], ['cat', 'NM234', 12], ['dog', 'NM56', 65]]
> >>>>
> >>>> d = {}
> >>>> for key, label, quant in x:
> >>>>     if key in d:
> >>>>         # Key seen before: append this label, comma-separated.
> >>>>         d[key][0] += "," + label
> >>>>     else:
> >>>>         # First row for this key: start a new [label, quant] entry.
> >>>>         d[key] = [label, quant]
> >>>>
> >>>> print(d)
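A variant of the loop above, sketched under the assumption that the original key order should be kept (a plain dict does not guarantee that in Python 2): collect the labels for each key in a list and join them once at the end with ','. `groups` is an illustrative name, not part of the original code.

```python
from collections import OrderedDict

x = [['cat', 'NM123', 12], ['cat', 'NM234', 12], ['dog', 'NM56', 65]]

# key -> (list of labels, quant taken from the first row with that key)
groups = OrderedDict()
for key, label, quant in x:
    if key not in groups:
        groups[key] = ([], quant)
    groups[key][0].append(label)

# Join each key's labels once, producing e.g. 'NM123,NM234'.
d = OrderedDict((key, [",".join(labels), quant])
                for key, (labels, quant) in groups.items())
print(dict(d))
```

Keeping the labels as a list until the end avoids rebuilding the string on every row and makes it easy to change the delimiter later.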
> >>>>
> >>>>
> >>>> I agree with Michael that the problem is somewhat underspecified,
> >>>> but it's a starting point.
> >>>>
> >>>> Greetings,
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> "If you don't know, the thing to do is not to get scared, but to
> >>>> learn." -
> >>>> Ayn Rand
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>> _______________________________________________
> >>> Baypiggies mailing list
> >>> Baypiggies at python.org
> >>> To change your subscription options or unsubscribe:
> >>> https://mail.python.org/mailman/listinfo/baypiggies
> >>>
> >>
> >>
> >>
> >
>
>
>

