lib.sequence_alignment.needleman

Align two sequences with the Needleman-Wunsch algorithm, using the EMBOSS logic for extensions.

The implementation follows the Wikipedia article on the Needleman-Wunsch algorithm, modified to match EMBOSS so that gap opening and extension penalties, as well as end penalties, are supported.
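
To illustrate the difference between the gap opening and gap extension penalties, here is a minimal sketch of an affine gap cost. It assumes the usual EMBOSS convention that the first position of a gap pays the opening penalty and each further position pays the extension penalty. The function name gap_cost and the example values 10.0 and 0.5 are hypothetical and are not taken from this module.

```python
def gap_cost(length, gap_open_penalty=10.0, gap_extend_penalty=0.5):
    """Return the positive cost of one contiguous gap of the given length.

    Hypothetical helper: assumes cost = open + (length - 1) * extend.
    """
    if length <= 0:
        # No gap, so nothing is subtracted from the alignment score.
        return 0.0
    # The first gapped position pays the opening penalty,
    # every additional position pays the extension penalty.
    return gap_open_penalty + (length - 1) * gap_extend_penalty


# One long gap is cheaper than several short ones of the same total length.
single_gap = gap_cost(3)
three_gaps = 3 * gap_cost(1)
```

Under this assumption, a single gap of three positions costs less than three separate gaps of one position each, which is why affine penalties favour fewer, longer gaps.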

Parameters:

sequence1 (str) - The first sequence.
sequence2 (str) - The second sequence.
sub_matrix (numpy rank-2 int array) - The substitution matrix to use to determine the penalties.
sub_seq (str) - The one letter code sequence corresponding to the substitution matrix indices.
gap_open_penalty (float) - The penalty for introducing gaps, as a positive number.
gap_extend_penalty (float) - The penalty for extending a gap, as a positive number.
end_gap_open_penalty (float) - The optional penalty for opening a gap at the end of a sequence.
end_gap_extend_penalty (float) - The optional penalty for extending a gap at the end of a sequence.

Returns: float, str, str, numpy rank-2 int array

The alignment score, two alignment strings and the gap matrix.
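
The relationship between the sub_matrix and sub_seq parameters can be sketched as follows: the position of a letter in sub_seq gives its row or column index in sub_matrix. The four-letter alphabet, the toy scores and the helper name substitution_score are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical alphabet: the letter order defines the matrix indices.
sub_seq = "ACGT"

# Toy substitution scores, not a matrix shipped with any library.
sub_matrix = np.array([
    [ 5, -4, -4, -4],
    [-4,  5, -4, -4],
    [-4, -4,  5, -4],
    [-4, -4, -4,  5],
], dtype=np.int16)


def substitution_score(a, b, sub_matrix, sub_seq):
    """Look up the score for aligning letter a against letter b."""
    i = sub_seq.index(a)  # row index of the first letter
    j = sub_seq.index(b)  # column index of the second letter
    return int(sub_matrix[i, j])


match_score = substitution_score("A", "A", sub_matrix, sub_seq)
mismatch_score = substitution_score("A", "T", sub_matrix, sub_seq)
```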

Construct the Needleman-Wunsch matrix for the two given sequences using the EMBOSS logic.

The algorithm has been modified to match that of EMBOSS to allow for gap opening and extension penalties, as well as end penalties.

Parameters:

sequence1 (str) - The first sequence.
sequence2 (str) - The second sequence.
sub_matrix (numpy rank-2 int16 array) - The substitution matrix to use to determine the penalties.
sub_seq (str) - The one letter code sequence corresponding to the substitution matrix indices.
gap_open_penalty (float) - The penalty for introducing gaps, as a positive number.
gap_extend_penalty (float) - The penalty for extending a gap, as a positive number.
end_gap_open_penalty (float) - The optional penalty for opening a gap at the end of a sequence.
end_gap_extend_penalty (float) - The optional penalty for extending a gap at the end of a sequence.
epsilon (float) - A value close to zero to determine if two numbers are the same, within this precision.

Returns: numpy rank-2 float32 array, numpy rank-2 int16 array

The Needleman-Wunsch matrix and traceback matrix.
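
As a rough illustration of how a score matrix and a traceback matrix fit together, the sketch below fills both for two short sequences and then walks the traceback to recover the alignment strings. It is deliberately simplified: it uses a single linear gap penalty and fixed match and mismatch scores rather than the EMBOSS gap opening, extension and end penalties or a substitution matrix. The function names, the traceback codes and the default values are all hypothetical.

```python
import numpy as np

# Hypothetical traceback codes: where the best score of a cell came from.
DIAG, TOP, LEFT = 0, 1, 2


def nw_matrix_sketch(sequence1, sequence2, match=1.0, mismatch=-1.0,
                     gap_penalty=1.0, epsilon=1e-7):
    """Fill a score matrix and a traceback matrix (linear gap penalty only)."""
    M, N = len(sequence1), len(sequence2)
    matrix = np.zeros((M + 1, N + 1), dtype=np.float32)
    traceback = np.zeros((M + 1, N + 1), dtype=np.int16)

    # First column and row: only gaps are possible.
    for i in range(1, M + 1):
        matrix[i, 0] = -gap_penalty * i
        traceback[i, 0] = TOP
    for j in range(1, N + 1):
        matrix[0, j] = -gap_penalty * j
        traceback[0, j] = LEFT

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            score = match if sequence1[i - 1] == sequence2[j - 1] else mismatch
            diag = matrix[i - 1, j - 1] + score
            top = matrix[i - 1, j] - gap_penalty
            left = matrix[i, j - 1] - gap_penalty
            best = max(diag, top, left)
            matrix[i, j] = best
            # epsilon decides whether two floating point scores count as equal.
            if abs(diag - best) < epsilon:
                traceback[i, j] = DIAG
            elif abs(top - best) < epsilon:
                traceback[i, j] = TOP
            else:
                traceback[i, j] = LEFT
    return matrix, traceback


def traceback_alignment(sequence1, sequence2, traceback):
    """Walk the traceback matrix from the bottom right corner to the origin."""
    i, j = len(sequence1), len(sequence2)
    align1, align2 = [], []
    while i > 0 or j > 0:
        step = traceback[i, j]
        if step == DIAG:
            i -= 1
            j -= 1
            align1.append(sequence1[i])
            align2.append(sequence2[j])
        elif step == TOP:
            i -= 1
            align1.append(sequence1[i])
            align2.append("-")  # gap in the second sequence
        else:
            j -= 1
            align1.append("-")  # gap in the first sequence
            align2.append(sequence2[j])
    return "".join(reversed(align1)), "".join(reversed(align2))


matrix, traceback = nw_matrix_sketch("GATTACA", "GATACA")
align1, align2 = traceback_alignment("GATTACA", "GATACA", traceback)
```

The final alignment score is the bottom right element of the score matrix. With an affine scheme such as the one described above, the recurrence additionally needs to know whether a gap is being opened or extended, which this sketch does not track.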

Module needleman_wunsch