Projects per year
Proteomics informed by transcriptomics (PIT), in which proteomic MS/MS spectra are searched against open reading frames derived from de novo assembled transcripts, can reveal previously unknown translated genomic elements (TGEs). However, determining which TGEs are truly novel, which are variants of known proteins, and which are simply artefacts of poor sequence assembly, is challenging. We have designed and implemented an automated solution that classifies putative TGEs by comparing to reference proteome sequences. This allows large-scale identification of sequence polymorphisms, splice isoforms and novel TGEs supported by presence or absence of variant-specific peptide evidence. Unlike previously reported methods, ours does not require a catalogue of known variants, making it more applicable to non-model organisms. The method was validated on human PIT data, then applied to Mus musculus, Pteropus alecto and Aedes aegypti. Novel discoveries included 60 human protein isoforms, 32,392¬¬¬¬¬¬¬¬¬¬ polymorphisms in P. alecto, and TGEs with non-methionine start sites including tyrosine.