This application note describes the use of a Bayesian probability-based de novo sequencing algorithm applied automatically to an entire LC-MS/MS dataset. The resulting sequences are compared with the results of a databank search of the same dataset.
Large numbers of proteins from sequenced organisms can be identified rapidly and automatically from single samples by LC-MS/MS analysis of their tryptic digests followed by databank searching. The enhanced sensitivity, resolution and mass measurement accuracy of data produced by the Waters Micromass Q-Tof Mass Spectrometer considerably aid the process of inferring de novo amino acid sequence.
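The benefit of mass accuracy for de novo work can be made concrete with residue masses: lysine (128.09496 Da) and glutamine (128.05858 Da) differ by only about 0.036 Da. The minimal Python sketch below is a naive mass-gap lookup, not the MassSeq algorithm; it lists the residues whose monoisotopic mass fits the gap between adjacent fragment ions within a given tolerance. The function names are hypothetical. Leucine and isoleucine are isobaric and cannot be separated by mass alone, so they share one entry.

```python
# Monoisotopic amino acid residue masses (Da). Leucine and isoleucine are
# isobaric, so they share a single "L/I" entry.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_residues(gap_da: float, tol_da: float) -> list[str]:
    """Return residues whose mass lies within tol_da of a fragment-ion mass gap."""
    return [aa for aa, m in RESIDUE_MASSES.items() if abs(m - gap_da) <= tol_da]


def read_ladder(ion_masses: list[float], tol_da: float) -> list[list[str]]:
    """Return residue candidates for each gap in a sorted fragment-ion ladder."""
    ions = sorted(ion_masses)
    return [candidate_residues(b - a, tol_da) for a, b in zip(ions, ions[1:])]


if __name__ == "__main__":
    gap = 128.0950  # an observed mass gap between two adjacent ladder ions
    # A 0.1 Da window cannot separate lysine from glutamine...
    print(candidate_residues(gap, 0.1))
    # ...whereas a 0.01 Da window, reachable with ppm-level accuracy, can.
    print(candidate_residues(gap, 0.01))
```

With a 0.1 Da window both residues fit the gap; narrowing the window to 0.01 Da, which corresponds to about 10 ppm at m/z 1000, leaves only lysine.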
The Maximum Entropy-based MassSeq de novo sequencing algorithm (ProteinLynx Global SERVER v2.0) performs the following after the raw spectra have been processed to remove isotope and charge-state multiplicity.
All data were acquired using a Waters Capillary HPLC (CapLC) System and a Q-Tof Ultima API hybrid quadrupole orthogonal acceleration time-of-flight mass spectrometer (www.waters.com) fitted with a NanoLockSpray source. Data Directed Analysis (DDA) was carried out on 100 fmol of a Yeast Enolase tryptic digest. Ions were selected for MS/MS analysis based on their intensity and charge state, and collision energies were chosen automatically based on the m/z and charge state of each selected precursor ion. All data were acquired with an internal reference ion to provide mass measurement accuracies within 10 ppm in both the MS and MS/MS modes of acquisition.
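The precursor selection and collision-energy logic described above can be sketched as follows. This is an illustration under stated assumptions, not the instrument's acquisition software: the intensity threshold, the maximum number of precursors per survey scan, the exclusion of singly charged ions, and the linear collision-energy ramp coefficients in `CE_RAMP` are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical collision-energy ramp per precursor charge state:
# (slope per m/z unit, intercept). Charge states absent here (e.g. 1+)
# are not selected. Real values are instrument- and method-specific.
CE_RAMP = {2: (0.04, 2.0), 3: (0.035, 0.0)}


@dataclass
class Precursor:
    mz: float
    charge: int
    intensity: float


def select_for_msms(survey: list[Precursor], max_precursors: int = 3,
                    min_intensity: float = 1000.0) -> list[Precursor]:
    """Choose the most intense precursors whose charge state has a CE ramp."""
    eligible = [p for p in survey
                if p.charge in CE_RAMP and p.intensity >= min_intensity]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return eligible[:max_precursors]


def collision_energy(p: Precursor) -> float:
    """Collision energy as a linear function of m/z with charge-dependent coefficients."""
    slope, intercept = CE_RAMP[p.charge]
    return slope * p.mz + intercept


survey = [
    Precursor(392.22, 2, 5.0e4),
    Precursor(445.12, 1, 9.0e4),   # singly charged: skipped
    Precursor(580.31, 2, 3.0e4),
    Precursor(620.00, 3, 500.0),   # below the intensity threshold: skipped
]
for p in select_for_msms(survey):
    print(f"m/z {p.mz} ({p.charge}+): collision energy {collision_energy(p):.1f}")
```

Sorting by intensity before truncating mirrors the intensity-driven selection described in the text; tying the collision energy to both m/z and charge reflects the fact that larger, more highly charged ions fragment differently.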
ProteinLynx Global SERVER 2.0 was used to define a 'Workflow' template comprising the post-acquisition processing routine required to reduce the raw continuum MS/MS data set to a databank-searchable form, the databank to be searched, and the associated query parameters (Figure 1).
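To illustrate what such a template bundles together, the sketch below models it as a plain Python data structure. The class and field names, the processing-step labels and the query-parameter values are hypothetical and do not reflect ProteinLynx Global SERVER's internal format; only the databank name is taken from this note.

```python
from dataclasses import dataclass, field


@dataclass
class QueryParameters:
    # All values below are hypothetical placeholders, not settings from this note.
    enzyme: str = "trypsin"
    missed_cleavages: int = 1
    precursor_tol_ppm: float = 10.0
    fragment_tol_ppm: float = 20.0


@dataclass
class WorkflowTemplate:
    processing: list[str]        # ordered post-acquisition processing steps
    databank: str                # databank to be searched
    query: QueryParameters = field(default_factory=QueryParameters)
    de_novo: bool = False        # also de novo sequence every spectrum


STEPS = ["filter spectra", "combine per precursor", "charge transposition",
         "lockmass correction", "export XML"]

# A standard search template and a second template that adds de novo sequencing.
search = WorkflowTemplate(processing=list(STEPS), databank="SwissProt v.39")
de_novo = WorkflowTemplate(processing=list(STEPS), databank="SwissProt v.39",
                           de_novo=True)
```

Keeping processing, databank and query parameters in one object is what lets the same template be reused unchanged for every acquisition; each template gets its own copy of the step list so that editing one does not silently alter the other.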
A filter was applied to the dataset to discard spectra containing insufficient information to represent a peptide. The remaining MS/MS spectra associated with each precursor ion were combined, transposed to a single charge state and reduced to a list of accurately mass-measured peaks using the MaxEnt 3 algorithm. Precursor and product ions were automatically lockmass corrected against Glu-fibrinopeptide B and erythromycin, respectively. All data were converted to XML format and searched against the SwissProt v.39 databank (86,865 entries, FASTA format). An additional 'Workflow' was created in which all spectra were de novo sequenced, and the results were compared with those of the standard databank search. The 'Workflow' was automatically initiated upon completion of the LC-MS/MS acquisition, and the results were displayed in an integrated Java interface (ProteinLynx Browser).
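Three of the processing steps above reduce to simple arithmetic once the hard part is done, and are sketched here under stated assumptions. The spectrum-filter thresholds are hypothetical. The charge transposition assumes the charge state is already known; MaxEnt 3 itself infers charge and isotope structure, which this sketch does not attempt. The lockmass correction shown is a single-point multiplicative scaling, which may differ from the correction ProteinLynx actually applies. The reference m/z values are passed in by the caller rather than hard-coded.

```python
PROTON = 1.007276  # proton mass, Da


def passes_filter(peaks: list[tuple[float, float]], min_peaks: int = 10,
                  min_total_intensity: float = 500.0) -> bool:
    """Hypothetical quality filter: keep spectra with enough peaks and signal.

    peaks is a list of (m/z, intensity) pairs.
    """
    return (len(peaks) >= min_peaks
            and sum(intensity for _, intensity in peaks) >= min_total_intensity)


def to_singly_charged(mz: float, charge: int) -> float:
    """Transpose an ion observed at m/z with a known charge to its [M+H]+ m/z."""
    return mz * charge - (charge - 1) * PROTON


def lockmass_correct(mzs: list[float], measured_ref: float,
                     theoretical_ref: float) -> list[float]:
    """Scale every m/z by the ratio of theoretical to measured reference-ion m/z."""
    factor = theoretical_ref / measured_ref
    return [mz * factor for mz in mzs]


# The doubly charged ion at m/z 392.22 corresponds to [M+H]+ of about 783.43.
print(round(to_singly_charged(392.22, 2), 2))
```

A multiplicative factor assumes the calibration error scales with m/z; a correction measured on one reference ion is then applied to every peak in the same scan.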
From 100 fmol of a Yeast Enolase tryptic digest, 59.6% sequence coverage was obtained from 25 unique peptides identified by searching against SwissProt v.39. The RMS error of all precursor ions used to identify Yeast Enolase was 8.01 ppm. Table 1 shows a selection of peptides from the databank search (column 4) and from the subsequent independent de novo sequencing of the same peptides (column 5).
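Sequence coverage and RMS precursor error are straightforward to compute from a list of matched peptides and their measured and theoretical m/z values. The minimal sketch below uses hypothetical helper names, and the example inputs are toy values, not data from this experiment.

```python
import math


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def rms_ppm(pairs: list[tuple[float, float]]) -> float:
    """Root-mean-square ppm error over (measured, theoretical) m/z pairs."""
    errors = [ppm_error(m, t) for m, t in pairs]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of protein residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy example: overlapping peptides count each residue only once.
print(sequence_coverage("MKTAYIAKQR", ["TAY", "AKQR"]))
print(rms_ppm([(1000.01, 1000.0), (999.99, 1000.0)]))
```

Squaring before averaging keeps positive and negative errors from cancelling, which is why RMS rather than the mean error is the usual summary of mass accuracy.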
Figure 2 shows two peptides de novo sequenced from the Yeast Enolase digest, both observed as doubly charged ions, at m/z 392.22 and 580.31.
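As a consistency check on these assignments, the theoretical m/z of the reported sequences (HLADLSK and LGSEVYHNLK) at charge 2+ can be computed from monoisotopic residue masses. This minimal sketch assumes unmodified peptides; `peptide_mz` is a hypothetical helper name.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide termini
PROTON = 1.007276


def peptide_mz(sequence: str, charge: int) -> float:
    """Theoretical m/z of an unmodified peptide carrying `charge` protons."""
    neutral = sum(RESIDUES[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


for seq in ("HLADLSK", "LGSEVYHNLK"):
    print(seq, round(peptide_mz(seq, 2), 4))
```

Both values round to the observed m/z of 392.22 and 580.31.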
Upon de novo sequencing, the sequences obtained for the ions at m/z 392.22 and 580.31 were HLADLSK and LGSEVYHNLK, respectively, consistent with the databank search results (Table 1). The four selected peptides at m/z 771.369, 523.763, 394.718 and 538.296 represent non-specific tryptic cleavages that were not identified by the standard databank search.
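Whether a matched peptide arises from specific tryptic cleavage can be checked against the protein sequence: trypsin cleaves C-terminal to lysine or arginine, except before proline. The sketch below classifies a peptide as fully tryptic, semi-tryptic or non-specific from its flanking residues. The function names are hypothetical, the protein string in the example is a toy sequence rather than Yeast Enolase, and only the first occurrence of the peptide is considered.

```python
def is_tryptic_site(protein: str, i: int) -> bool:
    """True if a cut between protein[i-1] and protein[i] follows trypsin's rule.

    Cleavage after K or R, but not before P; protein termini always count as valid.
    """
    if i == 0 or i == len(protein):
        return True
    return protein[i - 1] in "KR" and protein[i] != "P"


def cleavage_specificity(protein: str, peptide: str) -> str:
    """Classify a peptide by how many of its two termini are tryptic sites."""
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_ok = is_tryptic_site(protein, start)
    c_ok = is_tryptic_site(protein, end)
    if n_ok and c_ok:
        return "fully tryptic"
    if n_ok or c_ok:
        return "semi-tryptic"
    return "non-specific"


toy_protein = "AGKHLADLSKWGR"  # hypothetical sequence for illustration only
for pep in ("HLADLSK", "LADLSK", "ADLS"):
    print(pep, cleavage_specificity(toy_protein, pep))
```

A search restricted to fully tryptic peptides would not consider the semi-tryptic or non-specific candidates at all, whereas de novo sequencing does not depend on the enzyme rule.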
720000724, August 2003