2-D THRESHOLDING OF THE CONNECTIVITY MAP FOLLOWING THE MULTIPLE SEQUENCE ALIGNMENTS OF DIVERSE DATASETS

Tunca Doğan, Bilge Karaçal

References

  1. [1] S.B. Needleman, & C.D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology 48(3), 1970, 443–53.
  2. [2] T.F. Smith, & M.S. Waterman, Identification of Common Molecular Subsequences, Journal of Molecular Biology 147, 1981, 195–197.
  3. [3] M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson, & D.G. Higgins, ClustalW and ClustalX version 2, Bioinformatics 23(21), 2007, 2947–2948.
  4. [4] R.C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 2004, 1792–97.
  5. [5] C. Notredame, D.G. Higgins, & J. Heringa, T-Coffee: A novel method for fast and accurate multiple sequence alignment, Journal of Molecular Biology 302(1), 2000, 205–217.
  6. [6] K. Katoh, & H. Toh, Recent developments in the MAFFT multiple sequence alignment program, Briefings in Bioinformatics 9(4), 2008, 286–298.
  7. [7] E.O. Brigham, The Fast Fourier Transform (New York: Prentice-Hall, 2002).
  8. [8] S.F. Altschul, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research 25, 1997, 3389–3402.
  9. [9] T.A. Lasko, J.G. Bhagwat, K.H. Zou, & L. Ohno-Machado, The use of receiver operating characteristic curves in biomedical informatics, Journal of Biomedical Informatics 38(5), 2005, 404–415.
  10. [10] The Gene Ontology Consortium, Gene ontology: tool for the unification of biology, Nature Genetics 25(1), 2000, 25–29.
  11. [11] S.D. Brown, et al., A gold standard set of mechanistically diverse enzyme superfamilies, Genome Biology 7, 2006, R8.
  12. [12] M. Kimura, The neutral theory of molecular evolution (Cambridge, UK: Cambridge University Press, 1983).
  13. [13] W. Bryc, The normal distribution: characterizations with applications (Heidelberg, Germany: Springer-Verlag, 1995).
  14. [14] A. Paccanaro, et al., Spectral clustering of protein sequences, Nucleic Acids Research 34, 2006, 1571–1580.
  15. [15] T. Nepusz, et al., SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale, BMC Bioinformatics 11, 2010, 120.
  16. [16] T. Wittkop, et al., Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing, BMC Bioinformatics 8, 2007, 396.
  17. [17] L. Apeltsin, et al., Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution, Bioinformatics 27, 2011, 326–33.
  18. [18] J.C. Venter, et al., The sequence of the human genome, Science 291, 2001, 1304–1351.
  19. [19] The UniProt Consortium, Reorganizing the protein space at the Universal Protein Resource (UniProt), Nucleic Acids Research 40, 2012, 71–75.
  20. [20] A.J. Enright, et al., An efficient algorithm for large-scale detection of protein families, Nucleic Acids Research 30, 2002, 1575–1584.
  21. [21] T. Wittkop, et al., Partitioning biological data with transitivity clustering, Nature Methods 7, 2010, 419–20.
  22. [22] V. Miele, et al., High-quality sequence clustering guided by network topology and multiple alignment likelihood, Bioinformatics 28, 2012, 1078–85.
  23. [23] R. Diestel, Graph Theory (Heidelberg, Germany: Springer-Verlag, 2010).
