Comparison of sequences, protein 3D structures and genomes










László KAJÁN1, Kristian VLAHOVICEK1,2, Oliviero CARUGO1,3, Vilmos ÁGOSTON4, Zoltán HEGEDÜS4 and Sándor PONGOR1

1Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy

2Molecular Biology Department, Biology Division, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia

3Department of General Chemistry, Pavia University, viale Taramelli 12, 27100 Pavia, Italy

4Bioinformatics Group, Biological Research Center, Hungarian Academy of Sciences, Temesvári krt. 62, 6726 Szeged, Hungary


Abstract. The analysis of similarity is a fundamental task in comparing sequences, three-dimensional structures, genomes and molecular networks. This chapter reviews the common principles underlying these diverse applications.


Introduction



The basic concepts of similarity analysis, as presented in the first part of this review, provide a common framework for the classification of a newly identified protein sequence or protein 3D structure. Classifying an object means placing it into one of the already existing categories or marking it as “unknown”, i.e. as a potential initiator of a new category. This process usually consists of the following steps.

I. Recognition of similarity. This is a qualitative decision that is often based on some approximate quantitative measure. In sequence analysis, if the raw alignment score is above a threshold, the similarity is considered significant and retained for further analysis. In the case of protein 3-D structures the preliminary evaluation is often based on visual inspection.

II. Next, the basis of similarity, i.e. a common substructure, is identified. This is done by matching the equivalent entities and relationships; sequence alignments and structural alignments are the best examples. Determining the match by computer involves maximizing a similarity measure (or minimizing a distance measure), and the final value of the respective measure is used as a numeric measure of similarity.

III. Evaluation of similarity. First a decision has to be made whether or not the similarity is biologically important, and the protein is either assigned to a known similarity group or considered the initiator of a new group. This decision is usually based on one or more similarity scores as well as on the alignment, but human judgment is hard to replace at this stage.

IV. Representation of similarity in databases. Once the similarity is established, it has to be added to the annotation of the protein in the sequence and/or 3-D databases. Protein superfamilies, structural domains, orthologous groups etc. are determined by similarity analysis, and there is a large number of secondary databases dedicated to the curation of the underlying similarity groups. Apart from narrative descriptions, there are two general avenues to describe similarity groups. Cladograms are classifications that can be established using proximity measures and represent the internal structure of the similarity group. Common patterns, on the other hand, are usually derived from alignments and represent common substructures present in the members of the similarity group.
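The score-and-threshold logic of the recognition and evaluation steps above can be illustrated with a minimal Python sketch. It is not the method of any particular search program: the scoring values (match, mismatch, gap), the threshold and the names `local_alignment_score` and `classify` are assumptions chosen for illustration, and a Smith-Waterman local alignment score with a linear gap penalty stands in for "the raw alignment score".

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not a real
    substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def classify(query, groups, threshold):
    """Assign the query to the best-scoring known group, or to "unknown".

    `groups` maps a (hypothetical) group name to a list of member sequences.
    A group is accepted only if its best score reaches `threshold`.
    """
    best_group, best_score = "unknown", 0
    for name, members in groups.items():
        score = max(local_alignment_score(query, m) for m in members)
        if score >= threshold and score > best_score:
            best_group, best_score = name, score
    return best_group, best_score


if __name__ == "__main__":
    groups = {"g1": ["ACGTACGT"], "g2": ["TTTTGGGG"]}
    print(classify("ACGTACGA", groups, threshold=10))
    print(classify("CCCCCCCC", groups, threshold=10))
```

A query that scores below the threshold against every group is returned as "unknown", which corresponds to the potential initiator of a new category. In practice the threshold would be derived from score statistics rather than fixed by hand, and the final decision would still be reviewed by a human.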

The above steps are not always obvious to the user. For example, sequence similarity search programs present the results corresponding to step II, while some of the 3-D similarity search servers provide only a qualitative suggestion corresponding to step I. What is apparent, however, is that all methods include a preliminary, approximate estimation of similarity, followed by a filtering step and finally an alignment step.
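The three-stage pattern just described (approximate estimate, filter, alignment) can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure of any specific server: the count of shared k-letter words serves as the approximate estimate, `min_shared` is an arbitrary cutoff, and a plain Needleman-Wunsch global alignment with unit scores is used for the final step. The function names `kmers`, `global_align` and `search` are hypothetical.

```python
def kmers(seq, k=3):
    """Set of all overlapping k-letter words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def search(query, database, k=3, min_shared=2):
    """Prefilter by shared k-mers, then align only the surviving candidates."""
    q = kmers(query, k)
    # Stage 1: cheap, approximate estimate of similarity.
    estimates = {name: len(q & kmers(seq, k)) for name, seq in database.items()}
    # Stage 2: filtering (min_shared is an illustrative cutoff).
    candidates = [name for name, shared in estimates.items() if shared >= min_shared]
    # Stage 3: the expensive alignment step, run on candidates only.
    hits = [(name, *global_align(query, database[name])) for name in candidates]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    database = {"near": "ACGTTCGT", "far": "GGGGGGGG"}
    for name, score, row_a, row_b in search("ACGTACGT", database):
        print(name, score, row_a, row_b, sep="\n")
```

The point of the design is cost: the k-mer comparison is cheap relative to the quadratic-time alignment, so the alignment runs only on entries that pass the filter.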

This section provides a brief overview of how similarity scoring is used in the comparison of sequences, protein 3-D structures and entire genomes. In these fields, similarity measures are used for database searching, for classification and for phylogenetic analysis. A comprehensive overview of these broad fields would be far beyond the scope of this chapter. Instead, we will attempt to highlight, using the terminology introduced in the previous sections, the common themes underlying these three diverse areas.

