MCATSCAN User Manual

MCATSCAN is software developed in the Emili Laboratory at the University of Toronto. It automates de novo (i.e., database-independent) sequencing of peptides using the MCAT (Mass Coded Abundance Tagging) approach. MCAT relies on differential guanidylation of C-terminal lysine residues, which are typically found on trypsin-generated peptides. This lets the b- and y-ion fragments be distinguished in corresponding pairs of peptide MS/MS spectra, so that the peptide sequence can be determined.

MCATSCAN compares two input MS/MS files: one derived from an unmodified peptide and the other from the MCAT-modified form of the same peptide. It identifies common peaks, which represent potential b-ions. It also identifies peaks offset by the mass of the MCAT reagent (42 amu, or 21 or 14 units for doubly or triply charged fragment ions), which represent potential y-ions. These flagged ion series are then used to predict the most likely amino acid sequence of the peptide by matching peak-to-peak mass differences to amino acid masses. Because the b- and y-ion series carry redundant information, each can be used to validate and refine a partial or complete predicted sequence.

The current version of MCATSCAN (ver 1.0) is intended as a proof of principle. Work on improved, more user-friendly versions is ongoing. A high-throughput version of the program, MCATSCAN Express, is linked to the Emili lab TWINPEAKS file management program. It can automatically search large LC-MS datasets for mass spectra of peptides that are differentially labeled with MCAT. MCATSCAN Express can process thousands of MS/MS spectra from a single LC-MS run and is available for download from the Emili lab website (www.utoronto.ca/emililab/links/software).

User requirements
The program is written in Perl. As packaged, it runs on Unix- or Linux-based platforms and on Macintosh computers running Mac OS X.
On Windows platforms, a standard Perl distribution (Perl5.6.1.629-MSWin32x86multi-thread) can be downloaded from www.cpan.org. It must be installed before MCATSCAN is launched from the command line. The current version of MCATSCAN Express also requires EXTRACTMS to be installed first. EXTRACTMS is a program written by Jimmy Eng (Institute for Systems Biology, Seattle, WA; www.systemsbiology.org).

Input and output
The program was initially developed for tandem mass spectral files in the .dta format. The data files used in beta testing were obtained on a ThermoFinnigan LCQ DECA ion trap mass spectrometer running XCALIBUR. MS/MS data files obtained with the MCAT methodology on other instruments may also be used. Text (.txt) files are also accepted as input, so files can be edited to remove selected peaks (for instance, precursor ion peaks).

In the .dta format, the top line gives the precursor ion mass and the precursor ion charge, separated by a space. Each subsequent line describes one peak in the spectrum as the recorded m/z, a space, and the intensity value. An example file begins as follows (header line, then the first five peaks):
1838.33 2

The output is a text file (MCATreport.txt) summarizing the top-scoring peptide sequence candidates. An example is shown here:
______________________________________
HIGH-STRINGENCY PEAKS:
(297.5, 411.0, 524.3, 698.4, 753.2, 840.4, 994.3, 1012.5, 1211.6, 1374.5)
Best Match: I|L-I|L-QT-S-NG
CONSENSUS PEAKS Y:
(297.5, 411.0, 524.3, 627.2, 652.1, 753.2, 840.4, 937.2, 956.2, 1012.5, 1211.6, 1374.5)
Best Match: I|L-I|L-Q-T-S-N-G-A-A-Y-F-AG-K
(280.2, 297.5, 366.0, 375.1, 385.3, 386.2, 411.0, 430.2, 438.1, 449.7, 479.2, 497.9, 506.5, 524.2, 539.3, 556.3, 578.3, 600.2, 600.3, 616.2, 627.2, 634.3, 652.1, 692.0, 698.4, 708.4, 735.3, 753.2, 769.7, 784.1, 804.3, 822.4, 828.6, 840.3, 858.8, 859.2, 876.5, 882.7, 890.5, 891.2, 906.5, 919.5, 937.2, 956.2, 980.3, 994.3, 1012.5, 1045.9, 1053.0, 1069.4, 1076.3, 1085.2, 1101.3, 1122.3, 1127.5, 1144.4, 1166.8, 1211.4, 1240.1, 1254.3, 1271.8, 1272.2, 1339.4, 1365.7, 1365.8, 1430.8)
-----------------------------------------------------------------------

Operation
To run the program, place the following three files in a single directory:
1. An unmodified peptide MS/MS spectrum file (unmod.txt) in .dta or .txt format. In manual operation mode (input spectra not linked to the TWINPEAKS program), rename the .dta file to unmod.txt. No other changes are necessary.

To run MCATSCAN, type:
perl mcatscan.pl
in a DOS command window (Windows) or a shell (Unix/Linux). On some computers, the '.pl' extension is associated with Perl, and the program can be run by clicking its icon. The output file, MCATreport.txt, is written to the same directory.

Variables (parameters)
The current release of the program (ver 1.0) has no command-line options. All parameter changes must therefore be made by editing the Perl code directly: make the changes and save the file before running it.

For each pair of modified and unmodified spectra, MCATSCAN does the following:
1. It finds candidate b- and y-ions by searching both spectra for matched (unshifted) peaks and for peaks offset by 42, 21 or 14 mass units. Matches are scored using a variable delta-mass tolerance window, $daughter_tol, which is set by the user (default = 2 m/z). Matched peaks are then averaged between the two spectra.
2. From these candidate b- and y-ion peaks, it predicts b- and y-ion series based on:
where n is the number of ion peaks, Ob and Oy are the "observed" b- and y-ion peaks respectively, Pb and Py are the b- and y-ion peaks predicted from the reciprocal "observed" series (Pb from Oy, Py from Ob), and M is the mass of the parent ion (adjusted for charge).
3. It compares the sets of "observed" and "predicted" b- and y-ions to generate lists of peaks that most likely represent genuine b- and y-ion series. The categories are as follows:
A) High-stringency peaks: These are sets of peaks in which every observed b-ion has a corresponding y-ion of the expected m/z, and every observed y-ion has its predicted b-ion observed. Some mass spectrometry runs give peptide spectra with fewer b- than y-ions, and in that case this category will be depleted. When present, this category generally has a very high rate of true-positive ions (>75%) and a low rate of false-positive peaks.
B) Low-stringency peaks: This list contains all the observed b- and y-ions. It normally contains all the true peaks present in the original data, but it usually also contains too many false positives to calculate the peptide sequence.
C) Consensus peaks: This set of peaks is constructed by weeding out false positives from category A and replacing them with potential positives from category B.
D) Potential y-ions: These are the observed y-ions.
E) Potential b-ions: These are the observed b-ions.
4. It attempts to determine the peptide sequence from these sets of peaks. Candidate sequences are inferred by matching the mass differences (delta masses) between peaks to the masses of single amino acids or pairs of amino acids. The five longest paths through these matches are determined and written to the output. The program only attempts this calculation if the number of iterations required is less than 100. Users with significant computing power can bypass this limit by adjusting the code.
For a given spectrum, the number of iterations can also be reduced by choosing tighter tolerances for amino acid matching.
5. It generates a report summarizing these results.
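Step 1 of the procedure above can be illustrated with a short Python sketch. It is not MCATSCAN's Perl code. Several choices are assumptions made only for illustration:
- the function names (`parse_dta`, `find_candidates`);
- the nominal 42.0 shift;
- treating the 42, 21 and 14 unit offsets as 42/z for charges z = 1, 2 and 3;
- reporting y-ion candidates at their unmodified m/z.

`DAUGHTER_TOL` only mirrors the documented default of `$daughter_tol`.

```python
# Hypothetical sketch of MCATSCAN step 1 (not the program's Perl source).
MCAT_SHIFT = 42.0   # nominal MCAT (guanidylation) mass shift, in amu
DAUGHTER_TOL = 2.0  # mirrors the manual's $daughter_tol default (2 m/z)


def parse_dta(text):
    """Parse .dta-style text into (precursor_mass, charge, [(mz, intensity), ...]).

    Assumes a header line "mass charge" followed by one "m/z intensity"
    pair per line, as described in the manual.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    precursor, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(intensity)) for mz, intensity in rows[1:]]
    return precursor, charge, peaks


def find_candidates(unmod_mz, mod_mz, tol=DAUGHTER_TOL, charges=(1, 2, 3)):
    """Classify unmodified peaks as candidate b- or y-ions.

    b: peaks found at the same m/z in both spectra, averaged between them.
    y: peaks that reappear in the MCAT spectrum shifted up by 42/z
       (42, 21 or 14 units for z = 1, 2, 3); reported at the unmodified m/z.
    """
    b_ions, y_ions = [], []
    for mz in unmod_mz:
        # Unshifted match: keep the closest MCAT peak and average the pair.
        same = [m for m in mod_mz if abs(m - mz) <= tol]
        if same:
            nearest = min(same, key=lambda m: abs(m - mz))
            b_ions.append(round((mz + nearest) / 2, 2))
        # Shifted match at any of the charge-dependent offsets.
        if any(abs(m - (mz + MCAT_SHIFT / z)) <= tol
               for z in charges for m in mod_mz):
            y_ions.append(mz)
    return sorted(b_ions), sorted(y_ions)
```

Example: an unmodified peak at 300.0 and an MCAT peak at 342.0 give `find_candidates([300.0], [342.0]) == ([], [300.0])`. The peak is flagged as a y-ion candidate because 342.0 − 300.0 matches the singly charged 42 unit offset.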
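The equation for step 2 is missing from this copy of the manual. The sketch below therefore substitutes the textbook complementarity relation for singly charged fragments, b_i + y_(n-i) = [M+H]+ + 1.00728 (one proton mass). It may differ from MCATSCAN's exact formula. The sketch then groups candidates into the step 3 categories A, B, D and E. Category C (consensus peaks) is left out because the manual does not say exactly how it is built.

The following choices are assumptions:
- the function names;
- taking the precursor value as the singly protonated mass [M+H]+ (the usual .dta convention);
- reusing a 2 m/z tolerance when looking for complements.

```python
# Hypothetical sketch of steps 2-3; the complement formula is an assumption.
PROTON = 1.00728  # proton mass, Da


def complement(mz, mh):
    """Predicted partner of a singly charged b- or y-ion.

    Uses b_i + y_(n-i) = [M+H]+ + PROTON, so one formula serves both
    directions (Pb from Oy and Py from Ob).
    """
    return mh + PROTON - mz


def _observed(target, peaks, tol):
    """True if any peak lies within tol of target."""
    return any(abs(p - target) <= tol for p in peaks)


def categorize(obs_b, obs_y, mh, tol=2.0):
    """Group candidate ions into the stringency classes of step 3."""
    # Category A: ions whose predicted partner is actually observed.
    high = [b for b in obs_b if _observed(complement(b, mh), obs_y, tol)]
    high += [y for y in obs_y if _observed(complement(y, mh), obs_b, tol)]
    return {
        "high_stringency": sorted(high),                    # category A
        "low_stringency": sorted(set(obs_b) | set(obs_y)),  # category B
        "potential_y": sorted(obs_y),                       # category D
        "potential_b": sorted(obs_b),                       # category E
    }
```

In this scheme, a candidate whose predicted partner is absent still lands in the low-stringency list. This is consistent with the manual's note that category B collects most false positives.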
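Step 4 can be pictured as a path search over a graph. Each peak is a node. An edge joins two peaks when their mass difference matches one residue or an unordered pair of residues. The Python sketch below enumerates paths that start at peaks with no incoming edge, then keeps the five longest.

The monoisotopic residue masses are standard values. The following are assumptions, not MCATSCAN's documented behavior:
- the 0.5 Da default tolerance;
- the `max_paths` cut-off, which stands in for the 100-iteration limit;
- the alphabetical `X|Y` labelling of ambiguous steps;
- all function names.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Allowed steps: single residues plus unordered residue pairs (e.g. "AG").
STEPS = dict(RESIDUES)
for first, second in combinations_with_replacement(sorted(RESIDUES), 2):
    STEPS[first + second] = RESIDUES[first] + RESIDUES[second]


def step_label(delta, tol):
    """Return all steps within tol of delta joined by '|', or None."""
    hits = sorted(name for name, mass in STEPS.items() if abs(mass - delta) <= tol)
    return "|".join(hits) if hits else None


def longest_paths(peaks, tol=0.5, top=5, max_paths=100):
    """Return the `top` longest peak-to-peak paths as '-'-joined sequences.

    Returns None if more than max_paths complete paths are found, loosely
    mirroring the manual's limit on the number of iterations.
    """
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    has_incoming = set()
    for i, low in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            label = step_label(peaks[j] - low, tol)
            if label:
                edges[i].append((j, label))
                has_incoming.add(j)

    # Depth-first enumeration from source peaks; a path ends at a peak
    # with no outgoing edge.
    paths = []
    stack = [(i, []) for i in range(len(peaks)) if i not in has_incoming and edges[i]]
    while stack:
        node, labels = stack.pop()
        if not edges[node]:
            paths.append(labels)
            if len(paths) > max_paths:
                return None
            continue
        for nxt, label in edges[node]:
            stack.append((nxt, labels + [label]))

    paths.sort(key=len, reverse=True)
    return ["-".join(p) for p in paths[:top]]
```

For example, `longest_paths([100.0, 157.02, 228.06], tol=0.02)` returns `["G-A", "AG|Q"]`. The two-step path G then A outranks the single 128.06 step, which matches both Q and the pair AG. Lowering `tol` removes edges and therefore paths, which is the effect the manual describes for tighter amino acid tolerances.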
Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, 160 College St., rm 940, Toronto, Ontario, M5S 3E1
All contents copyright © 2003, The Emili Lab, University of Toronto. All rights reserved.
Website designed and created by Jeff Dixon