LC/MS peptide alignment and identification approach based on replicate spectral data
Liquid Chromatography/Mass Spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. An LC-MS spectrum shows the intensity of each peptide feature at a specific mass-to-charge ratio (m/z) and retention time. This technology has been used to compare complex biological samples across multiple LC-MS experiments. One challenge in such comparisons is matching corresponding peptide features from different LC-MS experiments. Alignment, which corrects for experimental variation in the chromatography, is therefore an important step when comparing LC-MS experiments.
A corresponding feature pair consists of two features generated by the same peptide in replicate runs. Corresponding feature identification has two key steps: alignment and identification. Alignment produces candidate pairs that mix corresponding and non-corresponding features, and the identification step then selects the corresponding pairs from these candidates.
Before the alignment and identification steps, LC peaks must be detected accurately. Instead of checking MS templates at the base position, the author checks the consistency of isotope patterns, on the premise that a peptide produces consistent isotope patterns across the scans within its elution period. Accurate elution peak detection yields the candidate elution profiles for the peptides. The author verifies the interval detection method on SILAC data and compares several quantification methods built on the detected intervals. The resulting H/L ratio estimates perform much better than those from MaxQuant.
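The isotope-pattern consistency check can be illustrated with a minimal sketch. It is not the dissertation's implementation: the use of cosine similarity between consecutive scans, the function names, the threshold of 0.95, and the minimum run length of three scans are all assumptions chosen for illustration. Each scan is represented as a list of intensities at the expected isotope positions of one candidate peptide.

```python
import math

def cosine(u, v):
    """Cosine similarity between two isotope intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty scan is never consistent with anything
    return dot / (norm_u * norm_v)

def consistent_intervals(scans, threshold=0.95, min_scans=3):
    """Return inclusive (start, end) scan-index pairs over which
    consecutive scans keep a similar isotope pattern.

    threshold and min_scans are hypothetical tuning values.
    """
    intervals = []
    start = 0
    for i in range(1, len(scans) + 1):
        # A run ends at the last scan or where similarity drops.
        run_ends = i == len(scans) or cosine(scans[i - 1], scans[i]) < threshold
        if run_ends:
            if i - start >= min_scans:
                intervals.append((start, i - 1))
            start = i
    return intervals

# Toy data: two noise scans, five scans of one peptide, two noise scans.
scans = [
    [5, 0, 1], [0, 4, 0],
    [10, 6, 2], [50, 30, 10], [100, 61, 19], [40, 24, 8], [9, 6, 2],
    [0, 0, 3], [2, 0, 0],
]
print(consistent_intervals(scans))
```

In this toy input the pattern stays stable while the absolute intensity rises and falls, so scans 2 to 6 form the single candidate elution interval. A real detector would also have to handle overlapping peptides and noise within the elution period.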
Common alignment methods use warping functions to correct elution time shifts between two LC-MS datasets in order to identify corresponding features (LC peaks registered by the same peptide). Although a warping function can correct the mean elution time shift, it cannot by itself resolve the ambiguity in alignment completely, because elution time shifts are random. The author therefore explored the R-statistic, the correlation between two elution profiles, as a measure of the similarity in LC peak shape between corresponding feature pairs.
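As a rough illustration of a correlation between two elution profiles, the sketch below resamples each profile onto the same number of evenly spaced points over its own elution interval and computes the Pearson correlation. The resampling step, the choice of Pearson correlation, and all function names are assumptions; the dissertation's exact definition of the R-statistic may differ.

```python
import math

def resample(times, intensities, n_points=50):
    """Linearly interpolate a profile onto n_points evenly spaced times.

    Assumes times are strictly increasing with at least two entries.
    """
    t0, t1 = times[0], times[-1]
    out = []
    j = 0
    for k in range(n_points):
        t = t0 + (t1 - t0) * k / (n_points - 1)
        # Advance to the sampling segment that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(intensities[j] + frac * (intensities[j + 1] - intensities[j]))
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no shape information
    return sxy / math.sqrt(sxx * syy)

def peak_shape_correlation(times_a, int_a, times_b, int_b, n_points=50):
    """Correlation of two elution profiles after resampling each onto
    its own elution interval, so a pure time shift does not matter."""
    a = resample(times_a, int_a, n_points)
    b = resample(times_b, int_b, n_points)
    return pearson(a, b)

# Toy profiles: the same Gaussian shape, shifted by 2 time units and
# scaled by 3, plus a linear ramp as a differently shaped peak.
t_a = [10 + 0.5 * i for i in range(21)]
t_b = [12 + 0.5 * i for i in range(21)]
peak_a = [math.exp(-((t - 15) ** 2) / 2) for t in t_a]
peak_b = [3 * math.exp(-((t - 17) ** 2) / 2) for t in t_b]
ramp = [i / 20 for i in range(21)]

print(peak_shape_correlation(t_a, peak_a, t_b, peak_b))
print(peak_shape_correlation(t_a, peak_a, t_a, ramp))
```

Because each profile is resampled relative to its own start and end, the score is insensitive to a constant time shift and to intensity scaling. This is why a shape-based measure can complement a warping function rather than replace it.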
For Super-SILAC labeled data, drawing on MS/MS identifications and considering that LC peak shape is an important factor for alignment, the author proposed a Statistical Corresponding Feature Identification Algorithm (SCFIA) based on both time shifts and the similarity of LC peak shapes between corresponding features. The author tested SCFIA on publicly available datasets and compared its performance with that of warping-function-based methods. Both the accuracy and the number of detected corresponding features improved significantly.
For 18O labeled data, as noted above, the warping functions commonly used to correct elution time shifts cannot resolve the ambiguity completely, because elution time shifts are unpredictable. The author therefore also takes peak shape, labeling efficiency, peptide isotope pattern, and predicted peptide elution time into consideration.
The author compared this algorithm, which relies on elution time shift together with these additional parameters, with other software. The results show a substantial improvement.