Author: bugman
Date: Mon Jan 26 20:25:19 2015
New Revision: 27320

URL: http://svn.gna.org/viewcvs/relax?rev=27320&view=rev
Log:
Merged revisions 27246,27248-27291,27293-27319 via svnmerge from
svn+ssh://bugman@xxxxxxxxxxx/svn/relax/trunk

........
r27246 | bugman | 2015-01-20 15:44:29 +0100 (Tue, 20 Jan 2015) | 3 lines

Fix for the Relax_disp.test_bug_23186_cluster_error_calc_dw system test on 32-bit and Python <= 2.5 systems.
........
r27248 | bugman | 2015-01-21 09:54:28 +0100 (Wed, 21 Jan 2015) | 6 lines

Better error handling in the structure.align user function.

If no common atoms can be found between the structures, a RelaxError is now raised for better user feedback.
........
r27249 | bugman | 2015-01-21 10:07:23 +0100 (Wed, 21 Jan 2015) | 6 lines

Created an empty lib.sequence_alignment relax library package.

This may be used in the future for implementing more advanced structural alignments (the current method is simply to skip missing atoms; sequence numbering changes are not handled).
........
r27250 | bugman | 2015-01-21 11:23:41 +0100 (Wed, 21 Jan 2015) | 3 lines

Added the sequence_alignment package to the lib package __all__ list.
........
r27251 | bugman | 2015-01-21 11:25:26 +0100 (Wed, 21 Jan 2015) | 3 lines

Added the unit testing infrastructure for the new lib.sequence_alignment package.
........
r27252 | bugman | 2015-01-21 11:37:37 +0100 (Wed, 21 Jan 2015) | 6 lines

Implementation of the Needleman-Wunsch sequence alignment algorithm.

This is located in the lib.sequence_alignment.needleman_wunsch module. It is implemented as described in the Wikipedia article https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm.
........
r27253 | bugman | 2015-01-21 11:39:24 +0100 (Wed, 21 Jan 2015) | 8 lines

Created a unit test for checking the Needleman-Wunsch sequence alignment algorithm.

This uses the DNA data from the example in the Wikipedia article at https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm.

The test shows that the implementation of the lib.sequence_alignment.needleman_wunsch.needleman_wunsch_align() function is correct.
........
r27254 | bugman | 2015-01-21 12:15:53 +0100 (Wed, 21 Jan 2015) | 6 lines

Created the lib.sequence_alignment.substitution_matrices module.

This is for storing substitution matrices for use in sequence alignment. The module currently only includes the BLOSSUM62 matrix.
........
r27255 | bugman | 2015-01-21 12:21:53 +0100 (Wed, 21 Jan 2015) | 3 lines

Corrected the spelling of the BLOSUM62 matrix in lib.sequence_alignment.substitution_matrices.
........
r27256 | bugman | 2015-01-21 14:03:24 +0100 (Wed, 21 Jan 2015) | 3 lines

Fix for the lib.sequence_alignment.substitution_matrices.BLOSUM62_SEQ string.
........
r27257 | bugman | 2015-01-21 15:36:43 +0100 (Wed, 21 Jan 2015) | 7 lines

Modification of the Needleman-Wunsch sequence alignment algorithm implementation.

This is in the lib.sequence_alignment.needleman_wunsch functions. Scoring matrices are now supported, as well as a user-supplied non-integer gap penalty. A bug in the algorithm for walking through the traceback matrix, which occurred under certain conditions, has also been fixed.
........
r27258 | bugman | 2015-01-21 15:40:56 +0100 (Wed, 21 Jan 2015) | 9 lines

Created the lib.sequence_alignment.align_protein module for the sequence alignment of proteins.

This general module currently implements the align_pairwise() function for the pairwise alignment of protein sequences. It provides the infrastructure for specifying gap starting and extension penalties, choosing the alignment algorithm (currently only the Needleman-Wunsch sequence alignment algorithm as 'NW70'), and choosing the substitution matrix (currently only BLOSUM62). The function provides lots of printouts for user feedback.
........
r27259 | bugman | 2015-01-21 15:52:03 +0100 (Wed, 21 Jan 2015) | 6 lines

Created a unit test for lib.sequence_alignment.align_protein.align_pairwise().

This is to test the pairwise alignment of two protein sequences using the Needleman-Wunsch sequence alignment algorithm, the BLOSUM62 substitution matrix, and a gap penalty of 10.0.
........
r27260 | bugman | 2015-01-21 15:58:43 +0100 (Wed, 21 Jan 2015) | 5 lines

Added more printouts to the Test_align_protein.test_align_pairwise unit test.

This is the test of the module _lib._sequence_alignment.test_align_protein.
........
r27261 | bugman | 2015-01-21 15:59:28 +0100 (Wed, 21 Jan 2015) | 3 lines

Fix for the Needleman-Wunsch sequence alignment algorithm when the substitution matrix is absent.
........
r27262 | bugman | 2015-01-21 16:01:27 +0100 (Wed, 21 Jan 2015) | 5 lines

The lib.sequence_alignment.align_protein.align_pairwise() function now returns data.

This includes both alignment strings as well as the gap matrix.
........
r27263 | bugman | 2015-01-22 15:45:25 +0100 (Thu, 22 Jan 2015) | 3 lines

Annotated the BLOSUM62 substitution matrix with the amino acid codes for easy reading.
........
r27264 | bugman | 2015-01-22 15:53:43 +0100 (Thu, 22 Jan 2015) | 5 lines

Updated the gap penalties in the Test_align_protein.test_align_pairwise unit test.

This is from the unit test module _lib._sequence_alignment.test_align_protein.
........
r27265 | bugman | 2015-01-22 15:57:15 +0100 (Thu, 22 Jan 2015) | 8 lines

Modified the Needleman-Wunsch sequence alignment algorithm.

The previous attempt was buggy. The algorithm has been modified to match the logic of the GPL-licensed EMBOSS software (http://emboss.sourceforge.net/) to allow for gap opening and extension penalties, as well as end penalties. No code was copied; only the algorithm for creating the scoring and penalty matrices, as well as the traceback matrix, was followed.
........
r27266 | bugman | 2015-01-22 16:01:39 +0100 (Thu, 22 Jan 2015) | 3 lines

Added a DNA similarity matrix to lib.sequence_alignment.substitution_matrices.
........
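As an illustration of the Needleman-Wunsch algorithm described in r27252 and tested against the Wikipedia DNA example in r27253, here is a minimal Python sketch using the article's simple scoring (match +1, mismatch -1, linear gap -1). The function name needleman_wunsch() is hypothetical; this is not the relax lib.sequence_alignment.needleman_wunsch.needleman_wunsch_align() code, which additionally supports substitution matrices and EMBOSS-style penalties.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Hypothetical sketch: global alignment with a linear gap penalty.

    Returns (score, aligned1, aligned2), with '-' marking gaps.
    """
    n, m = len(seq1), len(seq2)

    # score[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # residue pair
                              score[i - 1][j] + gap,    # gap in seq2
                              score[i][j - 1] + gap)    # gap in seq1

    # Traceback from the bottom-right cell.  Looping until BOTH indices
    # reach zero keeps leading gaps, so the start is never truncated.
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                aligned1.append(seq1[i - 1])
                aligned2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append('-')
            i -= 1
        else:
            aligned1.append('-')
            aligned2.append(seq2[j - 1])
            j -= 1

    return score[n][m], ''.join(reversed(aligned1)), ''.join(reversed(aligned2))
```

For the Wikipedia sequences GCATGCU and GATTACA this sketch gives the optimal score of 0. Ties during traceback are broken in favour of the diagonal, so the exact gapped strings may differ from those shown in the article while having the same score.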
r27267 | bugman | 2015-01-22 16:09:34 +0100 (Thu, 22 Jan 2015) | 6 lines

Added sanity checks to the Needleman-Wunsch sequence alignment algorithm.

The residues of both sequences are now checked in needleman_wunsch_align() to make sure that they are present in the substitution matrix.
........
r27268 | bugman | 2015-01-22 16:34:06 +0100 (Thu, 22 Jan 2015) | 5 lines

Added the NUC 4.4 nucleotide substitution matrix from ftp://ftp.ncbi.nih.gov/blast/matrices/.

Uracil was added to the table as a copy of T.
........
r27269 | bugman | 2015-01-22 16:35:49 +0100 (Thu, 22 Jan 2015) | 5 lines

Added the header from ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62.

This is to document the BLOSUM62 substitution matrix.
........
r27270 | bugman | 2015-01-22 16:39:34 +0100 (Thu, 22 Jan 2015) | 6 lines

Added the PAM 250 amino acid substitution matrix.

This was taken from ftp://ftp.ncbi.nih.gov/blast/matrices/PAM250 and added to lib.sequence_alignment.substitution_matrices.PAM250.
........
r27271 | bugman | 2015-01-22 16:55:21 +0100 (Thu, 22 Jan 2015) | 6 lines

Modified the Test_needleman_wunsch.test_needleman_wunsch_align_DNA unit test to pass.

This is from the unit test module _lib._sequence_alignment.test_needleman_wunsch. The DNA sequences were simplified so that the behaviour can be better predicted.
........
r27272 | bugman | 2015-01-22 16:56:35 +0100 (Thu, 22 Jan 2015) | 6 lines

Created the Test_needleman_wunsch.test_needleman_wunsch_align_NUC_4_4 unit test.

This is in the unit test module _lib._sequence_alignment.test_needleman_wunsch. This tests the Needleman-Wunsch sequence alignment for two DNA sequences using the NUC 4.4 matrix.
........
r27273 | bugman | 2015-01-22 17:00:03 +0100 (Thu, 22 Jan 2015) | 7 lines

Created a unit test for demonstrating a failure in the Needleman-Wunsch sequence alignment algorithm.

The test is Test_needleman_wunsch.test_needleman_wunsch_align_NUC_4_4b from the _lib._sequence_alignment.test_needleman_wunsch module. The problem is that the start of the alignment is truncated if any gaps are present.
........
r27274 | bugman | 2015-01-22 17:07:58 +0100 (Thu, 22 Jan 2015) | 5 lines

Fix for the Needleman-Wunsch sequence alignment algorithm.

The starts of the sequences are no longer truncated when leading gaps are encountered.
........
r27275 | bugman | 2015-01-22 17:20:07 +0100 (Thu, 22 Jan 2015) | 5 lines

The needleman_wunsch_align() function now accepts the end gap penalty arguments.

These are passed on to the needleman_wunsch_matrix() function.
........
r27276 | bugman | 2015-01-22 17:21:34 +0100 (Thu, 22 Jan 2015) | 3 lines

Added the end gap penalty arguments to lib.sequence_alignment.align_protein.align_pairwise().
........
r27277 | bugman | 2015-01-22 17:30:20 +0100 (Thu, 22 Jan 2015) | 9 lines

Created the Structure.test_align_CaM_BLOSUM62 system test.

This will be used for expanding the functionality of the structure.align user function to perform true sequence alignment via the new lib.sequence_alignment package. The test aligns 3 calmodulin (CaM) structures from different organisms; hence the sequence numbering is different and the current structure.align user function design fails. The structure.align user function has been expanded in the test to include a number of arguments for advanced sequence alignment.
........
r27278 | bugman | 2015-01-22 18:01:40 +0100 (Thu, 22 Jan 2015) | 5 lines

Added support for the PAM250 substitution matrix to the protein pairwise sequence alignment function.

This is the function lib.sequence_alignment.align_protein.align_pairwise().
........
r27279 | bugman | 2015-01-22 19:54:17 +0100 (Thu, 22 Jan 2015) | 6 lines

Bug fix for the Needleman-Wunsch sequence alignment algorithm.

Part of the scoring system was functioning incorrectly when the gap penalty scores were non-integer, as some scores were being stored in an integer array. Now the array is a float array.
........
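The gap opening and extension penalties (r27265), the residue sanity checks (r27267) and the float-array fix (r27279) can be illustrated with the standard affine-gap (Gotoh) recurrences. This is a swapped-in textbook formulation, not a reproduction of the EMBOSS or relax code: affine_score() and the toy 5/-4 matrix are hypothetical, only the score is computed (no traceback), and a gap of length k is assumed to cost gap_open_penalty + (k - 1) * gap_extend_penalty.

```python
import math

# Hypothetical toy DNA matrix: +5 for identical bases, -4 otherwise.
BASES = "ACGT"
TOY_DNA = {a: {b: (5 if a == b else -4) for b in BASES} for a in BASES}


def affine_score(seq1, seq2, matrix, gap_open_penalty, gap_extend_penalty):
    """Hypothetical sketch: optimal global score with affine gap penalties.

    Assumed convention: a gap of length k costs
    gap_open_penalty + (k - 1) * gap_extend_penalty.
    """
    # Sanity check: every residue must be present in the substitution matrix.
    for seq in (seq1, seq2):
        for res in seq:
            if res not in matrix:
                raise ValueError("Residue %r is not in the substitution matrix." % res)

    n, m = len(seq1), len(seq2)
    neg = -math.inf
    # Python floats keep fractional penalties such as 0.5 exactly, whereas
    # storing these scores in an integer array would drop the fraction.
    M = [[neg] * (m + 1) for _ in range(n + 1)]  # ends in a residue pair
    X = [[neg] * (m + 1) for _ in range(n + 1)]  # ends with seq1 residue vs gap
    Y = [[neg] * (m + 1) for _ in range(n + 1)]  # ends with gap vs seq2 residue
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open_penalty + (i - 1) * gap_extend_penalty)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open_penalty + (j - 1) * gap_extend_penalty)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = matrix[seq1[i - 1]][seq2[j - 1]]
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs the open penalty; continuing it costs the extension.
            X[i][j] = max(M[i - 1][j] - gap_open_penalty,
                          X[i - 1][j] - gap_extend_penalty,
                          Y[i - 1][j] - gap_open_penalty)
            Y[i][j] = max(M[i][j - 1] - gap_open_penalty,
                          Y[i][j - 1] - gap_extend_penalty,
                          X[i][j - 1] - gap_open_penalty)

    return max(M[n][m], X[n][m], Y[n][m])
```

For example, aligning AAAT against AT with penalties of 10.0 and 0.5 gives -0.5: two matches (+10) minus a single gap of length two (-10.5), which beats two separate gaps (-20).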
r27280 | bugman | 2015-01-22 19:55:49 +0100 (Thu, 22 Jan 2015) | 7 lines

Created the Test_align_protein.test_align_pairwise_PAM250 unit test.

This is in the unit test module _lib._sequence_alignment.test_align_protein. It checks the protein alignment function lib.sequence_alignment.align_protein.align_pairwise() together with the PAM250 substitution matrix.
........
r27281 | bugman | 2015-01-23 09:38:45 +0100 (Fri, 23 Jan 2015) | 3 lines

Small docstring expansion for lib.sequence_alignment.align_protein.align_pairwise().
........
r27282 | bugman | 2015-01-23 09:40:55 +0100 (Fri, 23 Jan 2015) | 8 lines

Added the sequence alignment arguments to the structure.align user function front end.

This includes the 'matrix', 'gap_open_penalty', 'gap_extend_penalty', 'end_gap_open_penalty', and 'end_gap_extend_penalty' arguments. The 'algorithm' argument has not been added to save room, as there is only one choice of 'NW70'. A paragraph has been added to the user function description to explain the sequence alignment part of the user function.
........
r27283 | bugman | 2015-01-23 09:42:22 +0100 (Fri, 23 Jan 2015) | 6 lines

Added the sequence alignment arguments to the back end of the structure.align user function.

This keeps the code in trunk functional until sequence alignment prior to superimposition has been implemented.
........
r27284 | bugman | 2015-01-23 09:46:40 +0100 (Fri, 23 Jan 2015) | 6 lines

Removed the 'algorithm' argument from the Structure.test_align_CaM_BLOSUM62 system test script.

This is for the structure.align user function. The argument has not been implemented, to save room in the GUI and as 'NW70' is currently the only choice.
........
r27285 | bugman | 2015-01-23 10:05:12 +0100 (Fri, 23 Jan 2015) | 5 lines

The sequence alignment arguments are now passed all the way to the internal structural object backend.

These are the arguments of the structure.align user function.
........
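The five sequence alignment arguments added to the structure.align front end in r27282 ('matrix', 'gap_open_penalty', 'gap_extend_penalty', 'end_gap_open_penalty' and 'end_gap_extend_penalty') can be understood by scoring a finished alignment. The score_alignment() function below is hypothetical and not part of relax; it only borrows those argument names, assumes a gap run of length k costs open + (k - 1) * extend, and treats any gap run touching either terminus as an end gap.

```python
# Hypothetical toy DNA matrix: +5 for identical bases, -4 otherwise.
BASES = "ACGT"
TOY_DNA = {a: {b: (5 if a == b else -4) for b in BASES} for a in BASES}


def score_alignment(aligned1, aligned2, matrix,
                    gap_open_penalty, gap_extend_penalty,
                    end_gap_open_penalty=0.0, end_gap_extend_penalty=0.0):
    """Hypothetical sketch: score a finished pairwise alignment ('-' = gap).

    Gap runs touching either end of the alignment use the end-gap penalties;
    all other runs use the internal penalties (assumed convention).
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("The aligned strings must have the same length.")
    n = len(aligned1)
    total = 0.0

    # Substitution scores for every residue-residue column.
    for a, b in zip(aligned1, aligned2):
        if a == '-' and b == '-':
            raise ValueError("A column cannot be a gap in both sequences.")
        if a != '-' and b != '-':
            total += matrix[a][b]

    # Gap penalties, one maximal run of '-' characters at a time.
    for row in (aligned1, aligned2):
        i = 0
        while i < n:
            if row[i] != '-':
                i += 1
                continue
            start = i
            while i < n and row[i] == '-':
                i += 1
            length = i - start
            if start == 0 or i == n:
                total -= end_gap_open_penalty + (length - 1) * end_gap_extend_penalty
            else:
                total -= gap_open_penalty + (length - 1) * gap_extend_penalty

    return total
```

With the end-gap penalties left at 0.0, the overhanging terminus in the alignment of AAAT with --AT is free and the alignment scores 10.0, whereas the internal gap in A--T costs the full 10.5 and the score drops to -0.5.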
r27286 | bugman | 2015-01-23 10:45:59 +0100 (Fri, 23 Jan 2015) | 3 lines

Copyright notice updates to 2015.
........
r27287 | bugman | 2015-01-23 11:02:05 +0100 (Fri, 23 Jan 2015) | 7 lines

Created the lib.sequence.aa_codes_three_to_one() function.

The lib.sequence module now contains the AA_CODES dictionary, which is a translation table from the 3 letter amino acid codes to the one letter codes. The new aa_codes_three_to_one() function performs the conversion.
........
r27288 | bugman | 2015-01-23 11:03:35 +0100 (Fri, 23 Jan 2015) | 5 lines

Implemented the internal structural object MolContainer.loop_residues() method.

This generator method is used to quickly loop over all residues of the molecule.
........
r27289 | bugman | 2015-01-23 11:06:53 +0100 (Fri, 23 Jan 2015) | 7 lines

Implemented the internal structural object one_letter_codes() method.

This will create a string of one letter residue codes for the given molecule. Only proteins are currently supported. This method uses the new lib.sequence.aa_codes_three_to_one() relax library function.
........
r27290 | bugman | 2015-01-23 11:09:41 +0100 (Fri, 23 Jan 2015) | 7 lines

Sequence alignment is now performed in lib.structure.internal.coordinates.assemble_coord_array().

This is a pairwise alignment to the first molecule of the list. The alignments are not yet used for anything. The assemble_coord_array() function is used by the structure.align user function, as well as a few other structure user functions.
........
r27291 | bugman | 2015-01-23 15:38:21 +0100 (Fri, 23 Jan 2015) | 7 lines

Fix for the lib.sequence.aa_codes_three_to_one() function.

Non-standard residues are now converted to the '*' code. A value of 'X' would prevent any alignment of a stretch of X residues, as the X to X scores in both the BLOSUM62 and PAM250 substitution matrices are set to -1.
........
r27293 | bugman | 2015-01-23 17:49:29 +0100 (Fri, 23 Jan 2015) | 6 lines

Modified the gap penalty arguments for the structure.align user function.

These must now always be supplied, as None is not handled by the backend lib.sequence_alignment.needleman_wunsch module. The previous defaults of None are now set to 0.0.
........
r27294 | bugman | 2015-01-26 10:47:26 +0100 (Mon, 26 Jan 2015) | 7 lines

Updated the artificial diffusion tensor test suite data.

This is the data in test_suite/shared_data/diffusion_tensor. The residues in the PDB files are now proper amino acids, so the HETATM records are now ATOM records, and the CONECT records have been eliminated.
........
r27295 | bugman | 2015-01-26 10:50:14 +0100 (Mon, 26 Jan 2015) | 6 lines

Another update for the artificial diffusion tensor test suite data.

The number of increments on the sphere has been increased from 5 to 6, to make the vector distribution truly uniform. All PDB files and relaxation data have been updated.
........
r27296 | bugman | 2015-01-26 11:06:30 +0100 (Mon, 26 Jan 2015) | 7 lines

Bug fix for the printouts from the relax_data.read user function.

This problem was introduced in the last relax release (at r26588). The problem is that the spin ID in the loaded relaxation data printout is the same for all data, being the spin ID of the first spin. This has no effect on how relax runs; it is only incorrect feedback.
........
r27297 | bugman | 2015-01-26 11:26:15 +0100 (Mon, 26 Jan 2015) | 7 lines

Changed the synthetic PDB for the artificial diffusion tensor test suite data.

The nitrogen and proton positions are now shifted 10 Angstrom along the distribution vectors. This is to avoid having all nitrogens positioned at the origin, which causes the internal structural object algorithm for determining which atoms are connected to fail.
........
r27298 | bugman | 2015-01-26 11:29:38 +0100 (Mon, 26 Jan 2015) | 7 lines

Reintroduced the CONECT PDB records into the artificial diffusion tensor test suite data.

The uniform vector distributions have overlapping vectors. This causes the internal structural object atom connection determining algorithm to fail, as it is distance-based rather than using the PDB amino acid definitions for now.
........
r27299 | bugman | 2015-01-26 11:45:40 +0100 (Mon, 26 Jan 2015) | 7 lines

Bug fix for the structure.read_pdb user function parsing of CONECT records.

CONECT records pointing to ATOM records were not being read by the user function. As ATOM records should not require CONECT records by their definition, this is only a minor problem affecting synthetic edge cases.
........
r27300 | bugman | 2015-01-26 14:25:42 +0100 (Mon, 26 Jan 2015) | 6 lines

Updates for the Structure.test_create_diff_tensor_pdb_sphere system test.

The test now uses the sphere synthetic relaxation data rather than the ellipsoid data, and the PDB checking has been updated for the new data.
........
r27301 | bugman | 2015-01-26 14:33:32 +0100 (Mon, 26 Jan 2015) | 6 lines

Updates for the Structure.test_create_diff_tensor_pdb_prolate system test.

The test now uses the spheroid synthetic relaxation data rather than the ellipsoid data, and the PDB checking has been updated for the new data.
........
r27302 | bugman | 2015-01-26 14:42:15 +0100 (Mon, 26 Jan 2015) | 7 lines

Updates for the Structure.test_create_diff_tensor_pdb_oblate system test.

The test now uses the spheroid synthetic relaxation data rather than the ellipsoid data, and the PDB checking has been updated for the new data. The oblate tensor is now forced in the system test script.
........
r27303 | bugman | 2015-01-26 14:48:45 +0100 (Mon, 26 Jan 2015) | 5 lines

Updates for the Structure.test_create_diff_tensor_pdb_ellipsoid system test.

The PDB checking has been updated for the new data.
........
r27304 | bugman | 2015-01-26 14:58:55 +0100 (Mon, 26 Jan 2015) | 6 lines

Updated the Structure.test_delete_atom system test for the changed PDB structures.

The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the atomic positions are different.
........
r27305 | bugman | 2015-01-26 15:03:00 +0100 (Mon, 26 Jan 2015) | 6 lines

Updated the Structure.test_align system test for the changed PDB structures.

The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the atomic positions are different.
........
r27306 | bugman | 2015-01-26 15:04:12 +0100 (Mon, 26 Jan 2015) | 6 lines

Updated the Structure.test_align_molecules system test for the changed PDB structures.

The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the atomic positions are different.
........
r27307 | bugman | 2015-01-26 15:19:18 +0100 (Mon, 26 Jan 2015) | 5 lines

Python 3 fix for the lib.sequence module.

The string.upper() function no longer exists.
........
r27308 | bugman | 2015-01-26 15:20:09 +0100 (Mon, 26 Jan 2015) | 5 lines

Python 3 fix for the lib.sequence_alignment.align_protein module.

The string.upper() function no longer exists.
........
r27309 | bugman | 2015-01-26 15:22:12 +0100 (Mon, 26 Jan 2015) | 5 lines

Modified the generate_data.py diffusion tensor to relaxation data creation script.

The NH vectors are no longer truncated to match the PDB.
........
r27310 | bugman | 2015-01-26 15:22:43 +0100 (Mon, 26 Jan 2015) | 5 lines

Python 3 fix for the generate_data.py diffusion tensor to relaxation data creation script.

The string.upper() function no longer exists.
........
r27311 | bugman | 2015-01-26 16:10:14 +0100 (Mon, 26 Jan 2015) | 6 lines
[... 311 lines stripped ...]