[Tutor] R: re question on array
Peter Otten
__peter__ at web.de
Thu Oct 30 10:14:27 CET 2014
jarod_v6 at libero.it wrote:
> Dear All,
> Sorry for my bad presentation of my problem!!
> I have this type of input:
> A file with a long list of genes and the occurrence per sample:
>
> gene Samples
> FUS SampleA
> TP53 SampleA
> ATF4 SampleB
> ATF3 SampleC
> ATF4 SampleD
> FUS SampleE
> RORA SampleE
> RORA SampleC
>
> What I want to obtain is a matrix where I have the occurrence per sample.
> SampleA SampleB SampleC SampleD SampleE
> FUS 1 0 0 0 1
> TP53 1 0 0 0 0
> ATF4 0 1 1 0
> ATF3 0 0 1 0 0
> RORA 0 0 1 0
>
> In that way I can count the occurrences in a fast way!
>
> At the moment I am only able to make the lists of the row names and the sample
> names. Unfortunately I don't know how to create this matrix.
> Could you help me?
> Thanks for the patience and the help
Open the file, skip the first line and convert the remaining lines into
(gene, sample) tuples. I assume that you know enough Python to do that.
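In case that first step is not obvious, here is a minimal sketch of it. It
reads from an in-memory string so that it runs as is; the helper name
read_pairs and the file name mentioned in the comment are made up for
illustration, and I assume the two columns are separated by whitespace.

```python
import io

# Stand-in for the real file. With a file on disk you would pass
# open("genes.txt") instead (that file name is hypothetical).
SAMPLE = """\
gene Samples
FUS SampleA
TP53 SampleA
ATF4 SampleB
ATF3 SampleC
ATF4 SampleD
FUS SampleE
RORA SampleE
RORA SampleC
"""


def read_pairs(lines):
    """Return a list of (gene, sample) tuples, skipping the header line."""
    lines = iter(lines)
    next(lines, None)  # skip the "gene Samples" header
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            pairs.append((parts[0], parts[1]))
    return pairs


pairs = read_pairs(io.StringIO(SAMPLE))
print(pairs[:2])
```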
Then build a dict that maps (gene, sample) tuples to the number of occurrences:
    pivot = {
        ("FUS", "SampleA"): 1,
        ...
        ("RORA", "SampleC"): 1,
    }
Remember to handle both the case where the tuple is already in the dict and
the case where it is not. (Once you have done that successfully, have a look
at the collections.Counter class.)
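One possible way to write the counting step, as a sketch rather than the only
solution. The pairs list is typed in by hand from the sample data in the
question; the manual loop handles the two cases explicitly, and the Counter
line shows that the standard library gives the same result.

```python
from collections import Counter

# (gene, sample) tuples copied from the example input in the question.
pairs = [
    ("FUS", "SampleA"), ("TP53", "SampleA"), ("ATF4", "SampleB"),
    ("ATF3", "SampleC"), ("ATF4", "SampleD"), ("FUS", "SampleE"),
    ("RORA", "SampleE"), ("RORA", "SampleC"),
]

# Manual version: treat "already seen" and "first occurrence" separately.
pivot = {}
for key in pairs:
    if key in pivot:
        pivot[key] += 1
    else:
        pivot[key] = 1

# Counter does the same bookkeeping in one call.
counted = dict(Counter(pairs))

print(pivot == counted)
```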
Now you need the row/column labels. You can extract them from the dict with
    rows = sorted(set(row for row, column in pivot))  # use set(...) to avoid duplicates
    columns = ...  # something very similar
You can then print the table with
    print([""] + columns)
    for row in rows:
        print([row] + [pivot.get((row, column), 0) for column in columns])
Use the str.format() method on the table cells to prettify the output and
you're done.
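For the prettifying step, something along these lines should work. It is only
a sketch: the helper name format_table and the choice of one shared,
right-aligned column width are my own, the columns line is my guess at the
"something very similar" left open above, and the pivot dict is typed in from
the sample data.

```python
# Counts copied from the sample data in the question.
pivot = {
    ("FUS", "SampleA"): 1, ("TP53", "SampleA"): 1, ("ATF4", "SampleB"): 1,
    ("ATF3", "SampleC"): 1, ("ATF4", "SampleD"): 1, ("FUS", "SampleE"): 1,
    ("RORA", "SampleE"): 1, ("RORA", "SampleC"): 1,
}

rows = sorted(set(row for row, column in pivot))
columns = sorted(set(column for row, column in pivot))  # assumed analogue of rows


def format_table(pivot, rows, columns):
    """Return the pivot table as a string with right-aligned cells."""
    # One width for every cell: the longest label plus a space of padding.
    width = max(len(label) for label in rows + columns) + 1
    lines = ["".join("{:>{w}}".format(c, w=width) for c in [""] + columns)]
    for row in rows:
        cells = [row] + [pivot.get((row, column), 0) for column in columns]
        lines.append("".join("{:>{w}}".format(c, w=width) for c in cells))
    return "\n".join(lines)


print(format_table(pivot, rows, columns))
```

Missing (gene, sample) combinations show up as 0 because pivot.get() falls
back to the default instead of raising a KeyError.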