
structure.sequence_alignment

Synopsis

Multiple sequence alignment (MSA) of structural data.

structure.sequence_alignment(pipes=None, models=None, molecules=None, msa_algorithm='Central Star', pairwise_algorithm=None, matrix=None, gap_open_penalty=10.0, gap_extend_penalty=1.0, end_gap_open_penalty=0.0, end_gap_extend_penalty=0.0)

Keyword arguments

pipes: The data pipes to use in the sequence alignment.

models: The list of models for each data pipe to use in the sequence alignment. The number of elements must match the number of pipes. If no models are given, then all will be used.

molecules: The list of molecules for each data pipe to use in the sequence alignment. This allows differently named molecules in the same or different data pipes to be aligned. The number of elements must match the number of pipes. If no molecules are given, then all will be used.

msa_algorithm: The multiple sequence alignment (MSA) algorithm used to align the primary sequences of all structures of interest.

pairwise_algorithm: The pairwise alignment algorithm used to align each pair of sequences.

matrix: The substitution matrix to use in the pairwise sequence alignment algorithm.

gap_open_penalty: The penalty for introducing gaps, as a positive number.

gap_extend_penalty: The penalty for extending a gap, as a positive number.

end_gap_open_penalty: The optional penalty for opening a gap at the end of a sequence.

end_gap_extend_penalty: The optional penalty for extending a gap at the end of a sequence.
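
As an illustration of how the four penalty arguments might combine, the following minimal Python sketch scores an existing pairwise alignment. It assumes the EMBOSS convention that a gap of length L costs the opening penalty plus (L - 1) times the extension penalty, and that a leading or trailing gap uses the end-gap penalties instead. The function name score_pairwise and the sub callback are hypothetical and not part of the relax API.

```python
from typing import Callable


def score_pairwise(a: str, b: str, sub: Callable[[str, str], float],
                   gap_open: float = 10.0, gap_extend: float = 1.0,
                   end_gap_open: float = 0.0, end_gap_extend: float = 0.0) -> float:
    """Score two equal-length gapped sequences, where '-' marks a gap (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Substitution scores for every column where both rows hold a residue.
    score = float(sum(sub(x, y) for x, y in zip(a, b) if x != '-' and y != '-'))
    # Subtract a penalty for each maximal run of gaps in either row.
    for row in (a, b):
        i = 0
        while i < len(row):
            if row[i] != '-':
                i += 1
                continue
            j = i
            while j < len(row) and row[j] == '-':
                j += 1
            length = j - i
            if i == 0 or j == len(row):
                # Leading or trailing gap: assumed to use the end-gap penalties.
                score -= end_gap_open + (length - 1) * end_gap_extend
            else:
                # Internal gap: open once, then extend for each further position.
                score -= gap_open + (length - 1) * gap_extend
            i = j
    return score


def simple(x: str, y: str) -> int:
    """Toy substitution score: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


print(score_pairwise("ACGT", "A--T", simple))  # 10 - (10 + 1) = -1.0
print(score_pairwise("ACGT", "--GT", simple))  # free leading gap: 10.0
```

With the default end penalties of 0.0, overhanging ends cost nothing, which is why the second alignment keeps its full substitution score.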

Description

To find the atoms in common between different molecules, an MSA of the primary sequences of the molecules is required. The resulting alignment is then used by any other user function that operates on multiple molecules. The following MSA algorithms can be selected:

'Central Star': A heuristic, progressive alignment method that uses pairwise alignments to construct an MSA. It consists of four major steps: pairwise alignment of all sequence pairs, selection of the central sequence, iterative alignment of the remaining sequences to the gapped central sequence, and insertion of gaps into the previous alignments during this iterative alignment.
'residue number': Simply aligns the molecules by residue number.
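
The 'residue number' algorithm can be pictured with the short sketch below, which treats each structure as a hypothetical {residue number: residue name} dictionary and places residues with equal numbers in the same alignment column. This is an illustrative assumption about the behaviour, not relax's implementation, and align_by_residue_number is a hypothetical name.

```python
from typing import Dict, List, Tuple


def align_by_residue_number(
    structures: List[Dict[int, str]],
) -> Tuple[List[int], List[List[str]]]:
    """Align structures given as {residue number: residue name} mappings."""
    # The alignment columns are the union of all residue numbers, in order.
    numbers = sorted(set().union(*structures))
    # Each row holds the residue name at each number, or '-' where it is absent.
    rows = [[s.get(n, '-') for n in numbers] for s in structures]
    return numbers, rows


a = {1: 'MET', 2: 'GLY', 3: 'SER'}
b = {2: 'GLY', 3: 'SER', 4: 'LYS'}
numbers, rows = align_by_residue_number([a, b])
# Residues present in every structure are the gap-free columns.
common = [n for n, col in zip(numbers, zip(*rows)) if '-' not in col]
print(common)  # [2, 3]
```

No scoring is involved: two structures are only matched where they share a residue number, so this choice suits structures that already use a common numbering scheme.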

For the MSA algorithms which require pairwise alignments, the following subalgorithms can be used:

'NW70': The Needleman-Wunsch alignment algorithm. This has been modified to use the logic of the EMBOSS software for handling gap opening and extension penalties, as well as end penalties.
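
The four Central Star steps described above can be sketched in Python as follows. To keep the example self-contained, the pairwise stage is a plain Needleman-Wunsch alignment with a match/mismatch score and a linear gap penalty, not the EMBOSS-style NW70 with a substitution matrix and separate opening, extension and end penalties. The central sequence is assumed to be the one with the highest summed pairwise score, and all function names are hypothetical.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty (simplified stand-in for NW70)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ra.append(a[i - 1]); rb.append('-'); i -= 1
        else:
            ra.append('-'); rb.append(b[j - 1]); j -= 1
    return S[n][m], ''.join(reversed(ra)), ''.join(reversed(rb))


def merge_into_msa(msa: List[str], c_al: str, s_al: str) -> List[str]:
    """Add s_al to msa, whose row 0 is the gapped centre ("once a gap, always a gap")."""
    centre = msa[0]
    rows: List[List[str]] = [[] for _ in msa]
    new: List[str] = []
    i = j = 0
    while i < len(centre) or j < len(c_al):
        ci = centre[i] if i < len(centre) else None
        cj = c_al[j] if j < len(c_al) else None
        if ci is not None and cj is not None and (ci == '-') == (cj == '-'):
            # Same centre residue (or a gap in both): copy the MSA column.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new.append(s_al[j]); i += 1; j += 1
        elif ci == '-':
            # Gap only in the MSA centre: the new sequence gets a gap here.
            for r, row in zip(rows, msa):
                r.append(row[i])
            new.append('-'); i += 1
        else:
            # Gap only in the pairwise centre: insert a gap column into the MSA.
            for r in rows:
                r.append('-')
            new.append(s_al[j]); j += 1
    return [''.join(r) for r in rows] + [''.join(new)]


def center_star(seqs: List[str]) -> Tuple[int, List[str]]:
    """Return the index of the centre sequence and the MSA rows in input order."""
    k = len(seqs)
    scores = [[0] * k for _ in range(k)]
    # Step 1: pairwise alignment between all sequence pairs.
    for x in range(k):
        for y in range(x + 1, k):
            scores[x][y] = scores[y][x] = needleman_wunsch(seqs[x], seqs[y])[0]
    # Step 2: the centre maximises the sum of its pairwise scores.
    c = max(range(k), key=lambda x: sum(scores[x]))
    # Steps 3 and 4: align each sequence to the centre and merge, adding gaps.
    msa, order = [seqs[c]], [c]
    for x in range(k):
        if x != c:
            _, c_al, s_al = needleman_wunsch(seqs[c], seqs[x])
            msa = merge_into_msa(msa, c_al, s_al)
            order.append(x)
    rows = [''] * k
    for idx, row in zip(order, msa):
        rows[idx] = row
    return c, rows
```

For example, center_star(["AC", "AC", "AGC"]) selects the first sequence as the centre and returns the rows "A-C", "A-C" and "AGC": the gap that the pairwise alignment opens in the centre is propagated to every earlier row of the MSA.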

For the MSAs or pairwise alignments which require a substitution matrix, one of the following can be used:

'BLOSUM62': The BLOcks SUbstitution Matrix for proteins with a cluster percentage ≥ 62%.
'PAM250': The point accepted mutation matrix for proteins at an evolutionary distance of n = 250.
'NUC 4.4': The nucleotide 4.4 matrix for DNA/RNA.
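
A substitution matrix is essentially a lookup table of pair scores. The sketch below builds only the A/C/G/T block of NUC 4.4 (+5 for identical bases, -4 for mismatches) and sums the scores over an ungapped alignment. The names NUC44_CORE and ungapped_score are hypothetical, and the ambiguity codes of the full matrix are deliberately left out.

```python
from typing import Dict, Tuple

BASES = "ACGT"
# Core A/C/G/T block of NUC 4.4: +5 for identical bases, -4 otherwise.
# The ambiguity codes (N, R, Y, ...) of the full matrix are omitted here.
NUC44_CORE: Dict[Tuple[str, str], int] = {
    (x, y): (5 if x == y else -4) for x in BASES for y in BASES
}


def ungapped_score(a: str, b: str,
                   matrix: Dict[Tuple[str, str], int] = NUC44_CORE) -> int:
    """Sum substitution scores over an ungapped alignment of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


print(ungapped_score("ACGT", "ACGA"))  # 5 + 5 + 5 - 4 = 11
```

BLOSUM62 and PAM250 are used the same way, only with a 20 x 20 (plus special symbols) table of amino acid scores instead.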

Support for multiple structures is provided by the data pipes, model numbers and molecule names arguments. Each data pipe, model and molecule combination will be treated as a separate structure. As only atomic coordinates with the same residue name, residue number and atom name will be assembled, structures with slightly different atomic structures can be compared. If the list of models is not supplied, then all models of all data pipes will be used. If the optional molecules list is supplied, each molecule in the list will be treated as a separate structure and compared with the others.
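
One way to read the rules above is sketched below: every (pipe, model, molecule) combination becomes one structure, models default to all models in the pipe, and, when no molecules list is given, a model is taken as a single structure containing all of its molecules. The latter is an interpretation of the text, and both the enumerate_structures function and the pipe_models dictionary standing in for the data pipe contents are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Structure = Tuple[str, int, Optional[str]]


def enumerate_structures(
    pipes: Sequence[str],
    models: Optional[Sequence[Sequence[int]]],
    molecules: Optional[Sequence[Sequence[str]]],
    pipe_models: Dict[str, List[int]],
) -> List[Structure]:
    """List the (pipe, model, molecule) combinations treated as separate structures.

    A molecule of None stands for "all molecules of that model together".
    """
    # The per-pipe lists must have one element per data pipe.
    for name, arg in (("models", models), ("molecules", molecules)):
        if arg is not None and len(arg) != len(pipes):
            raise ValueError(f"{name} must have one element per data pipe")
    structures: List[Structure] = []
    for i, pipe in enumerate(pipes):
        # Fall back to every model in the pipe when no models list is given.
        mods = models[i] if models is not None else pipe_models[pipe]
        # Without a molecules list, the whole model is one structure.
        mols = molecules[i] if molecules is not None else [None]
        structures.extend(product([pipe], mods, mols))
    return structures


pipe_models = {'A': [1, 2], 'B': [1]}
print(enumerate_structures(['A', 'B'], None, None, pipe_models))
# [('A', 1, None), ('A', 2, None), ('B', 1, None)]
```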

Prompt examples

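To align the primary sequences of the structures in the 'A' and 'B' data pipes using the Central Star MSA algorithm, Needleman-Wunsch pairwise alignments and the BLOSUM62 substitution matrix, type:

relax> structure.sequence_alignment(pipes=['A', 'B'], msa_algorithm='Central Star', pairwise_algorithm='NW70', matrix='BLOSUM62')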

The relax user manual (PDF), created 2024-06-08.