MCP
A more recent version of this article appeared on February 1, 2002.
All versions of this article: M100004-MCP200v1; 1/2/139 (most recent)

Submitted on August 7, 2001
Revised on November 13, 2001
Accepted on December 12, 2001

Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences

Aaron J. Mackey, Timothy A.J. Haystead, and William R. Pearson

Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908

Corresponding Author: wrp{at}virginia.edu

We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search, but use alignment probability, rather than similarity score, as the criterion for alignment optimality. Statistical estimates are calculated using an empirical correction to a theoretical probability. These calculated estimates were accurate within a factor of 10 for FASTS and 1000 for FASTF on our test dataset. FASTS requires only 15–20 total residues in three or four peptides to robustly identify homologues sharing 50% or greater protein sequence identity. FASTF requires about 25% more sequence data than FASTS for equivalent sensitivity, but additional sequence data is usually available from mixed-Edman experiments. Thus, both algorithms can identify homologues that diverged 100 to 500 million years ago, allowing proteomic identification from organisms whose genomes have not been sequenced.
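The two search modes described above can be illustrated with a deliberately simplified sketch. This is not the authors' FASTS or FASTF code: it assumes identity scoring (one point per identical residue), ungapped peptide matches, equal-length peptides in the mixed-Edman case, and raw scores rather than the alignment probabilities the real programs optimize. The function names `fasts_toy`, `best_ordered_placement` and `fastf_toy` are hypothetical. The first part tries every ordering of the peptides (the "all possible arrangements" idea); the second greedily pulls one peptide at a time out of a mixture of Edman cycles.

```python
from collections import Counter
from functools import lru_cache
from itertools import permutations


def best_ordered_placement(order, target):
    """Best total identity score for placing the peptides in `order`
    left to right, non-overlapping and ungapped, in `target`.
    Returns (score, ((peptide, position), ...)); score is -1 if they do not fit."""

    @lru_cache(maxsize=None)
    def go(i, cursor):
        if i == len(order):
            return 0, ()
        pep = order[i]
        best = (-1, ())
        for pos in range(cursor, len(target) - len(pep) + 1):
            hit = sum(a == b for a, b in zip(pep, target[pos:]))
            rest_score, rest = go(i + 1, pos + len(pep))
            if rest_score < 0:  # remaining peptides cannot be placed
                continue
            if hit + rest_score > best[0]:
                best = (hit + rest_score, ((pep, pos),) + rest)
        return best

    return go(0, 0)


def fasts_toy(peptides, target):
    """FASTS-like idea (hypothetical toy): peptide order is unknown,
    so evaluate every permutation and keep the highest-scoring layout."""
    best = (-1, ())
    for order in permutations(peptides):
        score, layout = best_ordered_placement(order, target)
        if score > best[0]:
            best = (score, layout)
    return best


def fastf_toy(cycles, target, n_peptides):
    """FASTF-like idea (hypothetical toy): cycles[k] holds the residues
    released at Edman cycle k from all peptides at once. Greedily pick the
    target offset explaining the most cycles, consume those residues from
    the mixture, and repeat once per peptide."""
    remaining = [Counter(c) for c in cycles]
    picks = []
    for _ in range(n_peptides):
        best_score, best_off = -1, None
        for off in range(len(target) - len(remaining) + 1):
            score = sum(1 for k, pool in enumerate(remaining)
                        if pool[target[off + k]] > 0)
            if score > best_score:
                best_score, best_off = score, off
        if best_off is None:  # target shorter than the cycle count
            break
        # Remove one copy of each explained residue from its cycle.
        for k, pool in enumerate(remaining):
            res = target[best_off + k]
            if pool[res] > 0:
                pool[res] -= 1
        picks.append((best_off, best_score))
    return picks
```

For example, `fasts_toy(("MKV", "GAT"), "MKVLLGAT")` places both peptides exactly, and `fastf_toy(["MG", "KA", "VT"], "MKVLLGAT", 2)` recovers the same two peptides from their interleaved Edman cycles. The exhaustive ordering step costs n! layouts for n peptides, which stays small (24 orders) for the three or four peptides the abstract mentions; the greedy mixture step avoids enumerating every assignment of residues to peptides, at the price of possibly missing the globally best deconvolution.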


Copyright © 2001 by the American Society for Biochemistry and Molecular Biology.