[Baypiggies] upper case, lower case, and....?

Mark Voorhies mvoorhie at yahoo.com
Sat Apr 2 05:58:57 CEST 2011


On Friday, April 01, 2011 06:01:22 am Vikram K wrote:
> My problem is that I wish to distinguish the phosphoserine character from
> the rest of the lowercase letters in the modified protein sequence shown
> above.
> Any suggestions?
> 

Is this a display question or a data management question?  I.e., are you marking
up the protein for internal tracking or for showing different features to a user?

For data management, it would be better to track features in a separate data
member (e.g., a vector of indices of phosphorylated positions, a vector of (start,stop)
tuples to indicate features of interest, etc.).  Good formats for serializing these
annotations on disk are GFF (http://www.sanger.ac.uk/resources/software/gff/spec.html
or Lincoln Stein's recent GFFv3 revision: http://www.sequenceontology.org/gff3.shtml)
or the feature table format shared by GenBank, EMBL, and DDBJ, which can handle things like
discontinuous features or features occurring between two residues
(http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html).  If you
don't want to implement your own parsers/classes for these formats, you can use the
existing code in Biopython.
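A minimal sketch of keeping the annotations out of the sequence string, assuming 0-based residue indices and half-open (start, stop) ranges in memory. The class `AnnotatedProtein`, its methods, the identifier `prot1`, and the type names `peptide` and `phospho_site` are all hypothetical placeholders; a real GFF3 file should use Sequence Ontology terms in the type column. GFF3 itself uses 1-based, inclusive coordinates, so the serializer converts on the way out.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedProtein:
    """Hypothetical container: plain sequence plus separately stored annotations.

    The sequence stays plain one-letter uppercase, so no letter (or letter
    case) has to carry modification state.
    """

    seqid: str  # hypothetical identifier, written to GFF3 column 1
    sequence: str
    # 0-based indices of phosphorylated residues
    phospho_sites: set = field(default_factory=set)
    # 0-based, half-open (start, stop, feature_type) ranges
    features: list = field(default_factory=list)

    def _gff_row(self, feature_type: str, start: int, end: int) -> str:
        # GFF3 columns: seqid, source, type, start, end, score, strand,
        # phase, attributes; "." marks an undefined field.
        return "\t".join(
            [self.seqid, ".", feature_type, str(start), str(end), ".", ".", ".", "."]
        )

    def to_gff3(self) -> str:
        """Serialize the annotations as GFF3 (1-based, inclusive coordinates)."""
        rows = ["##gff-version 3"]
        for start, stop, feature_type in self.features:
            # half-open [start, stop) -> 1-based inclusive [start + 1, stop]
            rows.append(self._gff_row(feature_type, start + 1, stop))
        for i in sorted(self.phospho_sites):
            # a single residue at 0-based index i -> 1-based [i + 1, i + 1]
            # "phospho_site" is a placeholder type name, not an SO term
            rows.append(self._gff_row("phospho_site", i + 1, i + 1))
        return "\n".join(rows)


protein = AnnotatedProtein(
    "prot1",
    "PEGKWLGRTARGSYGYIKTTAVEIDYDSLKRKKNSLNAVPPRLVEDDQDVYDDVAEQ",
)
protein.phospho_sites.add(27)                 # the S in ...DYDSLKR...
protein.features.append((18, 31, "peptide"))  # TTAVEIDYDSLKR
print(protein.to_gff3())
```

Because the phosphosite is just an integer in a set, other modifications can go into their own members (or their own feature types) without competing for letter case in the sequence.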

For display (if you are limited to fixed-width text), it is useful to alternate sequence lines
with annotation lines, e.g.:

Phosphorylation                              P
Peptide                             *************
Sequence          PEGKWLGRTARGSYGYIKTTAVEIDYDSLKRKKNSLNAVPPRLVEDDQDVYDDVAEQ
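The annotation lines can be generated from separately stored annotations. A minimal sketch, assuming 0-based residue indices and half-open (start, stop) peptide ranges; `render_tracks` and its parameters are hypothetical names. Each track is a list of characters indexed by residue position, so column k of every track lines up with residue k of the sequence.

```python
def render_tracks(sequence, phospho_sites, peptides, label_width=18):
    """Hypothetical helper: return annotation lines followed by the sequence line.

    phospho_sites: 0-based residue indices.
    peptides: 0-based, half-open (start, stop) ranges.
    label_width must be longer than the longest track label.
    """
    phospho_track = [" "] * len(sequence)
    for i in phospho_sites:
        phospho_track[i] = "P"
    peptide_track = [" "] * len(sequence)
    for start, stop in peptides:
        for i in range(start, stop):
            peptide_track[i] = "*"
    # pad every label to the same width so the tracks stay aligned
    return [
        "Phosphorylation".ljust(label_width) + "".join(phospho_track).rstrip(),
        "Peptide".ljust(label_width) + "".join(peptide_track).rstrip(),
        "Sequence".ljust(label_width) + sequence,
    ]


seq = "PEGKWLGRTARGSYGYIKTTAVEIDYDSLKRKKNSLNAVPPRLVEDDQDVYDDVAEQ"
for line in render_tracks(seq, phospho_sites={27}, peptides=[(18, 31)]):
    print(line)
```

Adding another annotation type means adding another track; the sequence string itself never changes.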

BLAST, HMMER, and CLUSTALW output are all good examples of this strategy.

HTH,

Mark

