Multiple Protein Sequence Alignment

 

Multiple sequence alignment is a valuable way to establish and compare the relationships among several protein sequences. One important application is phylogenetics, where the degree of sequence similarity between related proteins can be used to infer the relationships between species.
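As a rough illustration of how an alignment feeds into such comparisons, the Python sketch below computes the percent identity between two sequences that have already been aligned. The function name percent_identity, the use of '-' as the gap character, and the choice to ignore any column containing a gap are assumptions made for this example. Other conventions for the denominator exist, and this is not the calculation performed by any of the programs discussed here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free columns in which both aligned sequences agree.

    Assumption: both strings come from the same alignment, so they have
    equal length and use `gap` to mark inserted positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns where neither sequence has a gap
    identical = 0  # compared columns with the same residue
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned fragments, for illustration only.
print(percent_identity("ACDE-F", "ACDQGF"))  # 4 of the 5 gap-free columns match
```

Higher values suggest more closely related sequences, although a real phylogenetic analysis would rely on more than a single identity figure.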

 

For information on ClustalW version 1.8, refer to:

ClustalW 1.8 Online Help File at Baylor College of Medicine

 

Instructions:

 

1. Obtain the sequences to be tested. The format in which they are pasted into the box is important: every sequence must be pasted into the box in the same format.

 

2. Click on the “Most ReadSeq formats accepted” hyperlink to check the acceptable input formats. For many applications of this type, you will find that the FASTA format is the easiest to copy and paste.

 

3. We suggest using the FASTA format. See the Protein Sequence Retrieval directions for instructions on how to obtain sequence information in FASTA format.

 

4. Under “choose alignment method”, we suggest using “PIMA 1.4” (however, each of the methods has its own advantages and is worth experimenting with).

 

5. Paste sequences into the box in the following manner. In the first line, type in a greater-than character (‘>’) followed by a comment or title for the sequence. Press enter to start a new line. This second line should contain the sequence.

 

6. Copy (Ctrl + C) your sequence in an acceptable format.

 

7. Paste (Ctrl + V) that sequence into the line following the greater-than character (‘>’). You may also type in the sequence by hand.

 

8. Repeat steps 5-7 until the desired number of sequences has been entered into the box. Each sequence must be entered by this method.

 

9. Click on the “Submit” button to start the program.

 

Note: For each new sequence entry, you must type a greater-than character (‘>’) to inform the program that the entry is a different sequence from the previous one. The input procedure described in these steps is for the FASTA format.

 

10. The results will be returned with the aligned sequences listed one above the other. The first sequence listed is the program's attempt to find the conserved sequence. If an amino acid is conserved at a given position in every sequence, it will be listed. However, the program also pays attention to amino acid classes, which group amino acids by side chain similarity (see the Amino Acid Classes description below).
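The input layout described in steps 5 through 8 can also be prepared ahead of time and pasted into the box in one go. The Python sketch below is a minimal example of that idea: the helper name to_fasta is hypothetical, and the titles and sequences are made-up placeholders, not real proteins. It writes each entry as a ‘>’ title line followed by the sequence on the next line.

```python
def to_fasta(records):
    """Format (title, sequence) pairs as FASTA text.

    Each entry becomes a '>' title line followed by one sequence line,
    matching the layout described in the steps above.
    """
    lines = []
    for title, sequence in records:
        title = title.strip()
        # Remove spaces and line breaks picked up while copying.
        cleaned = "".join(sequence.split()).upper()
        if not title or not cleaned:
            raise ValueError("each entry needs a title and a sequence")
        lines.append(">" + title)
        lines.append(cleaned)
    return "".join(line + "\n" for line in lines)


# Placeholder entries, for illustration only.
records = [
    ("example_1", "MKTAYIAK"),
    ("example_2", "mkt ayl ak"),
]
print(to_fasta(records), end="")
```

Building the whole block this way keeps every sequence in the same format, which is what step 1 asks for.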

 

 

Amino Acid Classes                     Match score
 
                                                                  -2
                _______________ X __________________               0
               /          /           \             \
            _ f _        /       ______r _______     \             1
          /  /    \     /       /   /     \     \     \
         /  c      \   e       /   m       p     \   _ j __        2
        /  /  \     \ / \     /   / \     / \     \ /   \  \
       /  a    b     d   \   /   l   k   o   n     i     h  \      3
      /  / \  / \   /|\   \ /   / \ / \ / \  /\   / \   / \  \
     C   I V  L M  F W Y   H   N   D   E  Q  K R  S T   A G   P    5
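One way to read the diagram is as a lookup: for any set of amino acids, find the most specific class that contains all of them and take the match score printed on that class's row. The Python sketch below encodes one reading of the diagram, in which d, H, and the class i each hang from two parents. The class memberships are transcribed from the diagram by hand and the function name covering_class is hypothetical, so treat this as an interpretation and not as the program's own code. The -2 row is left out because the diagram does not attach it to any class.

```python
# Class symbol -> (member amino acids, match score), as read from the diagram.
CLASSES = {
    "a": ("IV", 3), "b": ("LM", 3), "d": ("FWY", 3),
    "l": ("ND", 3), "k": ("DE", 3), "o": ("EQ", 3),
    "n": ("KR", 3), "i": ("ST", 3), "h": ("AG", 3),
    "c": ("IVLM", 2), "e": ("FWYH", 2), "m": ("NDE", 2),
    "p": ("EQKR", 2), "j": ("STAGP", 2),
    "f": ("CIVLMFWY", 1), "r": ("HNDEQKRST", 1),
}
AMINO_ACIDS = "CIVLMFWYHNDEQKRSTAGP"


def covering_class(residues):
    """Return (symbol, score) for the most specific class holding all residues."""
    wanted = set(residues.upper())
    if not wanted or not wanted <= set(AMINO_ACIDS):
        raise ValueError("expected one or more standard amino acid letters")
    if len(wanted) == 1:
        # The same amino acid everywhere: bottom row of the diagram.
        return (next(iter(wanted)), 5)
    matches = [
        (score, symbol)
        for symbol, (members, score) in CLASSES.items()
        if wanted <= set(members)
    ]
    if not matches:
        # Only the root class X covers everything.
        return ("X", 0)
    score, symbol = max(matches)
    return (symbol, score)


print(covering_class("DE"))  # D and E share class k
print(covering_class("NE"))  # N and E first meet in class m
print(covering_class("CP"))  # C and P meet only at the root X
```

Under this reading, an alignment column containing D and E would be reported as k with a score of 3, while a column containing C and P would fall back to X with a score of 0.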

 

 

If you need more assistance, please refer to the Help section of the program.

 

 

When you are ready, click here to start.

Baylor College of Medicine (BCM) Search Launcher: Multiple Alignments.

 

 
