[Tutor] vcf_files and strings

Wed Oct 5 21:29:04 CEST 2011

Hi,

I'm a beginner at Python and would really like some help with extracting information from a vcf file.

The attached file contains information on mutations. This sample has just 2 rows and 10 columns (the real file has many more rows).

I want to extract the mRNA ID only if the mutation is missense. The two rows (mutations) that I have attached happen to be missense, but how do I say that I'm not interested in the mutations that are not missense (they might be, e.g., synonymous)? Also, how do I say that if a row starts with a # symbol I don't want to include it (sometimes the chr field starts with a hash)?
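The "skip anything that starts with #" part can be sketched on its own. This is only a minimal sketch: the helper name `data_lines` is made up for illustration, and it assumes the file is read line by line as plain text.

```python
def data_lines(lines):
    """Yield only the lines that hold data (hypothetical helper).

    Lines starting with '#' and blank lines are skipped; the trailing
    newline is removed from every line that is kept.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header line, or a row whose chr field starts with a hash
        if not line.strip():
            continue  # blank line
        yield line.rstrip("\n")
```

With the attached file this could be used as `with open("vcf_file.vcf") as handle:` followed by `for line in data_lines(handle): ...`, since an open text file can be iterated one line at a time.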

vcf file: 2 rows, 10 columns. 

col 0   chromosome
col 1   position
col 2   .
col 3   Reference
col 4   ALT
col 5   position
col 6   .
col 7   some statistics and the IDs
col 8   not important
col 9   not important

The important column is column 7, which holds the ID together with fields such as refseq.functionalClass=missense. When the mutation is missense, I want to extract refseq.name=NM_003137492, or preferably only the ID itself, which in this case is NM_003137492.
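One possible way to pull the ID out of column 7 is sketched below. It rests on assumptions the message does not confirm: that the columns are separated by tabs, and that column 7 is a list of key=value pairs separated by semicolons (as in the INFO column of the VCF format). The function name `missense_ids` is made up for illustration.

```python
def missense_ids(lines):
    """Return the refseq.name value of every missense row.

    Assumes tab-separated columns and a semicolon-separated list of
    key=value pairs in column 7 (counting from 0).
    """
    ids = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip rows that start with a hash
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # too short to have a column 7
        # Turn "a=1;b=2" into {"a": "1", "b": "2"}.
        info = {}
        for field in columns[7].split(";"):
            key, sep, value = field.partition("=")
            if sep:  # ignore entries that have no '=' in them
                info[key] = value
        if info.get("refseq.functionalClass") == "missense":
            name = info.get("refseq.name")
            if name is not None:
                ids.append(name)
    return ids
```

Calling `missense_ids(open("vcf_file.vcf"))` would then give a list of IDs, one per missense row. Rows with any other functional class, such as synonymous ones, are left out because the comparison with "missense" fails for them.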

Then I want to do exactly the same thing for all the other mutations, but only for the missense mutations, not the other ones. How do I accomplish that? Where do I start?

Best,
Anna

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcf_file.vcf
Type: text/directory
Size: 1368 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/tutor/attachments/20111005/095e3d49/attachment.bin>