lib.structure.pdb

###############################################################################
#                                                                             #
# Copyright (C) 2013,2015 Edward d'Auvergne                                   #
#                                                                             #
# This file is part of the program relax (http://www.nmr-relax.com).          #
#                                                                             #
# This program is free software: you can redistribute it and/or modify        #
# it under the terms of the GNU General Public License as published by        #
# the Free Software Foundation, either version 3 of the License, or           #
# (at your option) any later version.                                         #
#                                                                             #
# This program is distributed in the hope that it will be useful,             #
# but WITHOUT ANY WARRANTY; without even the implied warranty of              #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the                #
# GNU General Public License for more details.                                #
#                                                                             #
# You should have received a copy of the GNU General Public License           #
# along with this program. If not, see <http://www.gnu.org/licenses/>.        #
#                                                                             #
###############################################################################

# Module docstring.
"""Module for parsing PDB records.

This module currently uses the PDB format version 3.30 from July, 2011 U{http://www.wwpdb.org/documentation/file-format/format33/v3.3.html}.
"""

# relax module imports.
from lib.errors import RelaxImplementError


def atom(record):
    """Parse the ATOM record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect9.html#ATOM}.

    ATOM
    ====

    Overview
    --------

    The ATOM records present the atomic coordinates for standard amino acids and nucleotides. They also present the occupancy and temperature factor for each atom. Non-polymer chemical coordinates use the HETATM record type. The element symbol is always present on each ATOM record; charge is optional.

    Changes in ATOM/HETATM records result from the standardization of atom and residue nomenclature. This nomenclature is described in the Chemical Component Dictionary (U{ftp://ftp.wwpdb.org/pub/pdb/data/monomers}).


    Record Format
    -------------

    The format is::
        __________________________________________________________________________________________
        |         |              |              |                                                |
        | Columns | Data type    | Field        | Definition                                     |
        |_________|______________|______________|________________________________________________|
        |         |              |              |                                                |
        |  1 -  6 | Record name  | "ATOM"       |                                                |
        |  7 - 11 | Integer      | serial       | Atom serial number.                            |
        | 13 - 16 | Atom         | name         | Atom name.                                     |
        | 17      | Character    | altLoc       | Alternate location indicator.                  |
        | 18 - 20 | Residue name | resName      | Residue name.                                  |
        | 22      | Character    | chainID      | Chain identifier.                              |
        | 23 - 26 | Integer      | resSeq       | Residue sequence number.                       |
        | 27      | AChar        | iCode        | Code for insertion of residues.                |
        | 31 - 38 | Real(8.3)    | x            | Orthogonal coordinates for X in Angstroms.     |
        | 39 - 46 | Real(8.3)    | y            | Orthogonal coordinates for Y in Angstroms.     |
        | 47 - 54 | Real(8.3)    | z            | Orthogonal coordinates for Z in Angstroms.     |
        | 55 - 60 | Real(6.2)    | occupancy    | Occupancy.                                     |
        | 61 - 66 | Real(6.2)    | tempFactor   | Temperature factor.                            |
        | 77 - 78 | LString(2)   | element      | Element symbol, right-justified.               |
        | 79 - 80 | LString(2)   | charge       | Charge on the atom.                            |
        |_________|______________|______________|________________________________________________|


    Details
    -------

    ATOM records for proteins are listed from amino to carboxyl terminus.

    Nucleic acid residues are listed from the 5' to the 3' terminus.

    One-letter atom names such as C start at column 14, while two-letter atom names such as FE start at column 13.

    Atom nomenclature begins with atom type.

    No ordering is specified for polysaccharides.

    A non-blank alphanumerical character is used for the chain identifier.

    The list of ATOM records in a chain is terminated by a TER record.

    If more than one model is present in the entry, each model is delimited by MODEL and ENDMDL records.

    AltLoc is the placeholder that indicates an alternate conformation. The alternate conformation can cover the entire polymer chain, several residues, or part of a residue (several atoms within one residue). If an atom is provided in more than one position, then a non-blank alternate location indicator must be used for each of the atomic positions. Within a residue, all atoms that are associated with each other in a given conformation are assigned the same alternate position indicator. There are two ways of representing alternate conformations: either at the atom level or at the residue level (see examples).

    For atoms that are in alternate sites indicated by the alternate site indicator, sorting of atoms in the ATOM/HETATM list uses the following general rules:

        - In the simple case that involves a few atoms or a few residues with alternate sites, the coordinates occur one after the other in the entry.
        - In the case of large heterogen groups that are disordered, the atoms for each conformer are listed together.

    Alphabet letters are commonly used for the insertion code. The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue.

    If the depositor provides the data, then the isotropic B value is given for the temperature factor.

    If there are neither isotropic B values from the depositor, nor anisotropic temperature factors in ANISOU, then the default value of 0.0 is used for the temperature factor.

    Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases, these are blank.

    For refinements with the program REFMAC prior to 5.5.0042 which use TLS refinement, the values of B may include only the TLS contribution to the isotropic temperature factor rather than the full isotropic value.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    The ATOM/HETATM records are checked for PDB file format, sequence information, and packing.


    Relationships to Other Record Types
    -----------------------------------

    The ATOM records are compared to the corresponding sequence database. Sequence discrepancies appear in the SEQADV record. Missing atoms are annotated in the remarks. HETATM records are formatted in the same way as ATOM records. The sequence implied by ATOM records must be identical to that given in SEQRES, with the exception that residues that have no coordinates, e.g., due to disorder, must appear in SEQRES.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        ATOM     32  N  AARG A  -3      11.281  86.699  94.383  0.50 35.88           N
        ATOM     33  N  BARG A  -3      11.296  86.721  94.521  0.50 35.60           N
        ATOM     34  CA AARG A  -3      12.353  85.696  94.456  0.50 36.67           C
        ATOM     35  CA BARG A  -3      12.333  85.862  95.041  0.50 36.42           C
        ATOM     36  C  AARG A  -3      13.559  86.257  95.222  0.50 37.37           C
        ATOM     37  C  BARG A  -3      12.759  86.530  96.365  0.50 36.39           C
        ATOM     38  O  AARG A  -3      13.753  87.471  95.270  0.50 37.74           O
        ATOM     39  O  BARG A  -3      12.924  87.757  96.420  0.50 37.26           O
        ATOM     40  CB AARG A  -3      12.774  85.306  93.039  0.50 37.25           C
        ATOM     41  CB BARG A  -3      13.428  85.746  93.980  0.50 36.60           C
        ATOM     42  CG AARG A  -3      11.754  84.432  92.321  0.50 38.44           C
        ATOM     43  CG BARG A  -3      12.866  85.172  92.651  0.50 37.31           C
        ATOM     44  CD AARG A  -3      11.698  84.678  90.815  0.50 38.51           C
        ATOM     45  CD BARG A  -3      13.374  85.886  91.406  0.50 37.66           C
        ATOM     46  NE AARG A  -3      12.984  84.447  90.163  0.50 39.94           N
        ATOM     47  NE BARG A  -3      12.644  85.487  90.195  0.50 38.24           N
        ATOM     48  CZ AARG A  -3      13.202  84.534  88.850  0.50 40.03           C
        ATOM     49  CZ BARG A  -3      13.114  85.582  88.947  0.50 39.55           C
        ATOM     50  NH1AARG A  -3      12.218  84.840  88.007  0.50 40.76           N
        ATOM     51  NH1BARG A  -3      14.338  86.056  88.706  0.50 40.23           N
        ATOM     52  NH2AARG A  -3      14.421  84.308  88.373  0.50 40.45           N

    Example 2::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        ATOM     32  N  AARG A  -3      11.281  86.699  94.383  0.50 35.88           N
        ATOM     33  CA AARG A  -3      12.353  85.696  94.456  0.50 36.67           C
        ATOM     34  C  AARG A  -3      13.559  86.257  95.222  0.50 37.37           C
        ATOM     35  O  AARG A  -3      13.753  87.471  95.270  0.50 37.74           O
        ATOM     36  CB AARG A  -3      12.774  85.306  93.039  0.50 37.25           C
        ATOM     37  CG AARG A  -3      11.754  84.432  92.321  0.50 38.44           C
        ATOM     38  CD AARG A  -3      11.698  84.678  90.815  0.50 38.51           C
        ATOM     39  NE AARG A  -3      12.984  84.447  90.163  0.50 39.94           N
        ATOM     40  CZ AARG A  -3      13.202  84.534  88.850  0.50 40.03           C
        ATOM     41  NH1AARG A  -3      12.218  84.840  88.007  0.50 40.76           N
        ATOM     42  NH2AARG A  -3      14.421  84.308  88.373  0.50 40.45           N
        ATOM     43  N  BARG A  -3      11.296  86.721  94.521  0.50 35.60           N
        ATOM     44  CA BARG A  -3      12.333  85.862  95.041  0.50 36.42           C
        ATOM     45  C  BARG A  -3      12.759  86.530  96.365  0.50 36.39           C
        ATOM     46  O  BARG A  -3      12.924  87.757  96.420  0.50 37.26           O
        ATOM     47  CB BARG A  -3      13.428  85.746  93.980  0.50 36.60           C
        ATOM     48  CG BARG A  -3      12.866  85.172  92.651  0.50 37.31           C
        ATOM     49  CD BARG A  -3      13.374  85.886  91.406  0.50 37.66           C
        ATOM     50  NE BARG A  -3      12.644  85.487  90.195  0.50 38.24           N
        ATOM     51  CZ BARG A  -3      13.114  85.582  88.947  0.50 39.55           C
        ATOM     52  NH1BARG A  -3      14.338  86.056  88.706  0.50 40.23           N


    @param record:  The PDB ATOM record.
    @type record:   str
    @return:        The record name, atom serial number, atom name, alternate location indicator, residue name, chain identifier, sequence number, insertion code, orthogonal coordinates for X in Angstroms, orthogonal coordinates for Y in Angstroms, orthogonal coordinates for Z in Angstroms, occupancy, temperature factor, element symbol, charge on the atom.  Blank fields are returned as None and the charge is left as a string.
    @rtype:         tuple of str, int, str, str, str, str, int, str, float, float, float, float, float, str, str
    """

    # Initialise.
    fields = []

    # Split up the record.
    fields.append(record[0:6])
    fields.append(record[6:11])
    fields.append(record[12:16])
    fields.append(record[16])
    fields.append(record[17:20])
    fields.append(record[21])
    fields.append(record[22:26])
    fields.append(record[26])
    fields.append(record[30:38])
    fields.append(record[38:46])
    fields.append(record[46:54])
    fields.append(record[54:60])
    fields.append(record[60:66])
    fields.append(record[76:78])
    fields.append(record[78:80])

    # Loop over the fields.
    for i in range(len(fields)):
        # Strip all whitespace.
        fields[i] = fields[i].strip()

        # Replace nothingness with None.
        if fields[i] == '':
            fields[i] = None

    # Convert strings to numbers.
    if fields[1]:
        fields[1] = int(fields[1])
    if fields[6]:
        fields[6] = int(fields[6])
    if fields[8]:
        fields[8] = float(fields[8])
    if fields[9]:
        fields[9] = float(fields[9])
    if fields[10]:
        fields[10] = float(fields[10])
    if fields[11]:
        fields[11] = float(fields[11])
    if fields[12]:
        fields[12] = float(fields[12])

    # Return the data.
    return tuple(fields)
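# The fixed-column slicing used by atom() can be tried in isolation with the sketch below.  It is
# a minimal illustration rather than part of relax:  the names ATOM_COLUMNS and split_atom_record
# are hypothetical, and padding the record to 80 columns is an assumption made so that records
# with trailing blanks removed still slice cleanly.

```python
# Hypothetical stand-alone sketch, not part of the relax code base.

# 0-based (start, end) slices matching the 1-based column ranges of the ATOM table.
ATOM_COLUMNS = [
    (0, 6),      # Record name.
    (6, 11),     # serial
    (12, 16),    # name
    (16, 17),    # altLoc
    (17, 20),    # resName
    (21, 22),    # chainID
    (22, 26),    # resSeq
    (26, 27),    # iCode
    (30, 38),    # x
    (38, 46),    # y
    (46, 54),    # z
    (54, 60),    # occupancy
    (60, 66),    # tempFactor
    (76, 78),    # element
    (78, 80),    # charge
]


def split_atom_record(record):
    """Slice an ATOM or HETATM record into stripped strings, using None for blank fields."""

    # Assumption:  pad to 80 columns so that short records do not raise an IndexError.
    record = record.rstrip('\r\n').ljust(80)

    # Slice each field, strip the whitespace, and replace empty strings with None.
    return [record[start:end].strip() or None for start, end in ATOM_COLUMNS]


# The first record of Example 1 from the atom() docstring, assembled column by column.
example = (
    "ATOM  " + "   32" + " " + " N  " + "A" + "ARG" + " " + "A" + "  -3" + " " + "   "
    + "  11.281" + "  86.699" + "  94.383" + "  0.50" + " 35.88" + " " * 10 + " N"
)
fields = split_atom_record(example)
```

# The numeric conversion step of atom() is deliberately left out so that the slicing itself is
# easy to inspect.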


def conect(record):
    """Parse the CONECT record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect10.html#CONECT}.

    CONECT
    ======

    Overview
    --------

    The CONECT records specify connectivity between atoms for which coordinates are supplied. The connectivity is described using the atom serial number as shown in the entry. CONECT records are mandatory for HET groups (excluding water) and for other bonds not specified in the standard residue connectivity table. These records are generated automatically.

    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "CONECT"     |                                                    |
        |  7 - 11 | Integer      | serial       | Atom serial number                                 |
        | 12 - 16 | Integer      | serial       | Serial number of bonded atom                       |
        | 17 - 21 | Integer      | serial       | Serial number of bonded atom                       |
        | 22 - 26 | Integer      | serial       | Serial number of bonded atom                       |
        | 27 - 31 | Integer      | serial       | Serial number of bonded atom                       |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    CONECT records are present for:

        - Intra-residue connectivity within non-standard (HET) residues (excluding water).
        - Inter-residue connectivity of HET groups to standard groups (including water) or to other HET groups.
        - Disulfide bridges specified in the SSBOND records have corresponding records.

    No differentiation is made between atoms with delocalized charges (excess negative or positive charge).

    Atoms specified in the CONECT records have the same numbers as given in the coordinate section.

    All atoms connected to the atom with serial number in columns 7 - 11 are listed in the remaining fields of the record.

    If more than four fields are required for non-hydrogen and non-salt bridges, a second CONECT record with the same atom serial number in columns 7 - 11 will be used.

    These CONECT records occur in increasing order of the atom serial numbers they carry in columns 7 - 11. The target-atom serial numbers carried on these records also occur in increasing order.

    The connectivity list given here is redundant in that each bond indicated is given twice, once with each of the two atoms involved specified in columns 7 - 11.

    For hydrogen bonds, when the hydrogen atom is present in the coordinates, a CONECT record between the hydrogen atom and its acceptor atom is generated.

    For NMR entries, CONECT records for one model are generated describing heterogen connectivity and others for LINK records assuming that all models are homogeneous models.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    Connectivity is checked for unusual bond lengths.


    Relationships to Other Record Types
    -----------------------------------

    CONECT records must be present in an entry that contains either non-standard groups or disulfide bonds.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        CONECT 1179  746 1184 1195 1203
        CONECT 1179 1211 1222
        CONECT 1021  544 1017 1020 1022


    Known Problems
    --------------

    CONECT records involving atoms for which the coordinates are not present in the entry (e.g., symmetry-generated) are not given.

    CONECT records involving atoms for which the coordinates are missing due to disorder are also not provided.


    @param record:  The PDB CONECT record.
    @type record:   str
    @return:        The record name, atom serial number, serial number of the bonded atom 1, serial number of the bonded atom 2, serial number of the bonded atom 3, serial number of the bonded atom 4.  Blank fields are returned as None.
    @rtype:         tuple of str, int, int, int, int, int
    """

    # Initialise.
    fields = []

    # Split up the record.
    fields.append(record[0:6])
    fields.append(record[6:11])
    fields.append(record[11:16])
    fields.append(record[16:21])
    fields.append(record[21:26])
    fields.append(record[26:31])

    # Loop over the fields.
    for i in range(len(fields)):
        # Strip all whitespace.
        fields[i] = fields[i].strip()

        # Replace nothingness with None.
        if fields[i] == '':
            fields[i] = None

    # Convert strings to numbers.
    if fields[1]:
        fields[1] = int(fields[1])
    if fields[2]:
        fields[2] = int(fields[2])
    if fields[3]:
        fields[3] = int(fields[3])
    if fields[4]:
        fields[4] = int(fields[4])
    if fields[5]:
        fields[5] = int(fields[5])

    # Return the data.
    return tuple(fields)
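# The CONECT documentation notes that an atom with more than four bonded partners is spread over
# several records sharing the same serial number in columns 7 - 11.  The sketch below shows one
# possible way to merge such records into a single table.  It is illustrative only:  the function
# names conect_serials and build_bond_table are hypothetical and are not part of relax.

```python
# Hypothetical stand-alone sketch, not part of the relax code base.

def conect_serials(record):
    """Return the integer serial numbers found in columns 7 - 31 of a CONECT record."""

    # Pad so that short records still slice cleanly (an assumption of this sketch).
    record = record.rstrip('\r\n').ljust(31)

    # Five 5-column fields:  the atom itself followed by up to four bonded atoms.
    serials = [record[start:start+5].strip() for start in (6, 11, 16, 21, 26)]

    # Skip the blank fields.
    return [int(serial) for serial in serials if serial]


def build_bond_table(records):
    """Merge CONECT records into a dictionary of atom serial -> set of bonded serials."""

    bonds = {}
    for record in records:
        serials = conect_serials(record)
        if not serials:
            continue

        # Continuation records simply add to the set of the same atom.
        bonds.setdefault(serials[0], set()).update(serials[1:])

    return bonds


# The three records of Example 1 from the conect() docstring.
records = [
    "CONECT 1179  746 1184 1195 1203",
    "CONECT 1179 1211 1222",
    "CONECT 1021  544 1017 1020 1022",
]
bonds = build_bond_table(records)
```

# Using sets means that the redundancy described above (each bond listed twice) does not produce
# duplicate entries for any one atom.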


def formul(record):
    """Parse the FORMUL record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect4.html#FORMUL}.

    FORMUL
    ======

    Overview
    --------

    The FORMUL record presents the chemical formula and charge of a non-standard group.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "FORMUL"     |                                                    |
        |  9 - 10 | Integer      | compNum      | Component number.                                  |
        | 13 - 15 | LString(3)   | hetID        | Het identifier.                                    |
        | 17 - 18 | Integer      | continuation | Continuation number.                               |
        | 19      | Character    | asterisk     | "*" for water.                                     |
        | 20 - 70 | String       | text         | Chemical formula.                                  |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    The elements of the chemical formula are given in Hill order. The order of elements depends on whether carbon is present or not. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetical order of their symbol. This is the 'Hill' system used by Chemical Abstracts.

    The number of each atom type present immediately follows its chemical symbol without an intervening blank space. There will be no number indicated if there is only one atom for a particular atom type.

    Each set of SEQRES records and each HET group is assigned a component number in an entry. These numbers are assigned serially, beginning with 1 for the first set of SEQRES records. In addition:

        - If a HET group is presented on a SEQRES record its FORMUL is assigned the component number of the chain in which it appears.
        - If the HET group occurs more than once and is not presented on SEQRES records, the component number of its first occurrence is used.

    All occurrences of the HET group within a chain are grouped together with a multiplier. The remaining occurrences are also grouped with a multiplier. The sum of the multipliers equals the number of times that the HET group appears in the entry.

    A continuation field is provided in the event that more space is needed for the formula. Columns 17 - 18 are used in order to maintain continuity with the existing format.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    For each het group that appears in the entry, the corresponding HET, HETNAM, FORMUL, HETATM, and CONECT records must appear. The FORMUL record is generated automatically by PDB processing programs using the het group template file and information from HETATM records. UNL, UNK and UNX will not be listed in FORMUL even though these het groups are present in the coordinate section.


    Relationships to Other Record Types
    -----------------------------------

    For each het group that appears in the entry, the corresponding HET, HETNAM, FORMUL, HETATM, and CONECT records must appear.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        FORMUL   3   MG    2(MG 2+)
        FORMUL   5  SO4    6(O4 S 2-)
        FORMUL  13  HOH   *360(H2 O)

        FORMUL   3  NAP    2(C21 H28 N7 O17 P3)
        FORMUL   4  FOL    2(C19 H19 N7 O6)
        FORMUL   5  1PE    C10 H22 O6

        FORMUL   2  NX5    C14 H10 O2 CL2 S


    @param record:              The PDB FORMUL record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('formul')
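# formul() is not implemented in this module.  Purely as an illustration of how the column table
# in its docstring could be turned into code, the sketch below slices a FORMUL record in the same
# style as atom() and conect().  The function name parse_formul_sketch, the returned tuple layout
# and the padding to 70 columns are all assumptions of this sketch, not relax behaviour.

```python
# Hypothetical stand-alone sketch, not part of the relax code base.

def parse_formul_sketch(record):
    """Slice a FORMUL record according to the column table of the PDB v3.3 format."""

    # Pad so that the single-column asterisk field can always be indexed.
    record = record.rstrip('\r\n').ljust(70)

    # Split up the record (0-based slices of the 1-based columns).
    name = record[0:6].strip()
    comp_num = record[8:10].strip()
    het_id = record[12:15].strip()
    continuation = record[16:18].strip()
    is_water = record[18] == '*'
    text = record[19:70].strip()

    # Convert the numbers and replace blank fields with None.
    return (
        name or None,
        int(comp_num) if comp_num else None,
        het_id or None,
        int(continuation) if continuation else None,
        is_water,
        text or None,
    )


# Two records laid out according to the column table.
magnesium = parse_formul_sketch("FORMUL   3   MG    2(MG 2+)")
water = parse_formul_sketch("FORMUL  13  HOH   *360(H2 O)")
```

# The asterisk is returned as a boolean here because the format defines it only as a marker for
# water; a string field would work equally well.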


def helix(record):
    """Parse the HELIX record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect5.html#HELIX}.

    HELIX
    =====

    Overview
    --------

    HELIX records are used to identify the position of helices in the molecule. Helices are named, numbered, and classified by type. The residues where the helix begins and ends are noted, as well as the total length.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "HELIX "     |                                                    |
        |  8 - 10 | Integer      | serNum       | Serial number of the helix. This starts at 1 and   |
        |         |              |              | increases incrementally.                           |
        | 12 - 14 | LString(3)   | helixID      | Helix identifier. In addition to a serial number,  |
        |         |              |              | each helix is given an alphanumeric character      |
        |         |              |              | helix identifier.                                  |
        | 16 - 18 | Residue name | initResName  | Name of the initial residue.                       |
        | 20      | Character    | initChainID  | Chain identifier for the chain containing this     |
        |         |              |              | helix.                                             |
        | 22 - 25 | Integer      | initSeqNum   | Sequence number of the initial residue.            |
        | 26      | AChar        | initICode    | Insertion code of the initial residue.             |
        | 28 - 30 | Residue name | endResName   | Name of the terminal residue of the helix.         |
        | 32      | Character    | endChainID   | Chain identifier for the chain containing this     |
        |         |              |              | helix.                                             |
        | 34 - 37 | Integer      | endSeqNum    | Sequence number of the terminal residue.           |
        | 38      | AChar        | endICode     | Insertion code of the terminal residue.            |
        | 39 - 40 | Integer      | helixClass   | Helix class (see below).                           |
        | 41 - 70 | String       | comment      | Comment about this helix.                          |
        | 72 - 76 | Integer      | length       | Length of this helix.                              |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    Additional HELIX records with different serial numbers and identifiers occur if more than one helix is present.

    The initial residue of the helix is the N-terminal residue.

    Helices are classified as follows::

        _____________________________________________________
        |                               |   CLASS NUMBER    |
        | TYPE OF HELIX                 | (COLUMNS 39 - 40) |
        |_______________________________|___________________|
        |                               |                   |
        | Right-handed alpha (default)  |         1         |
        | Right-handed omega            |         2         |
        | Right-handed pi               |         3         |
        | Right-handed gamma            |         4         |
        | Right-handed 3 - 10           |         5         |
        | Left-handed alpha             |         6         |
        | Left-handed omega             |         7         |
        | Left-handed gamma             |         8         |
        | 2 - 7 ribbon/helix            |         9         |
        | Polyproline                   |        10         |
        |_______________________________|___________________|


    Relationships to Other Record Types
    -----------------------------------

    There may be related information in the REMARKs.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        HELIX    1  HA GLY A   86  GLY A   94  1                                   9
        HELIX    2  HB GLY B   86  GLY B   94  1                                   9

        HELIX   21  21 PRO J  385  LEU J  388  5                                   4
        HELIX   22  22 PHE J  397  PHE J  402  5                                   6


    @param record:  The PDB HELIX record.
    @type record:   str
    @return:        The record name, helix serial number, helix identifier, name of the initial residue, chain identifier, sequence number of the initial residue, insertion code of the initial residue, name of the terminal residue, chain identifier, sequence number of the terminal residue, insertion code of the terminal residue, helix class, comment, helix length.
    @rtype:         tuple of str, int, str, str, str, int, str, str, str, int, str, int, str, int
    """

    # Initialise.
    fields = []

    # Split up the record.
    fields.append(record[0:6])
    fields.append(record[7:10])
    fields.append(record[11:14])
    fields.append(record[15:18])
    fields.append(record[19])
    fields.append(record[21:25])
    fields.append(record[25])
    fields.append(record[27:30])
    fields.append(record[31])
    fields.append(record[33:37])
    fields.append(record[37])
    fields.append(record[38:40])
    fields.append(record[40:70])
    fields.append(record[71:76])

    # Loop over the fields.
    for i in range(len(fields)):
        # Strip all whitespace.
        fields[i] = fields[i].strip()

        # Replace nothingness with None.
        if fields[i] == '':
            fields[i] = None

    # Convert strings to numbers.
    if fields[1]:
        fields[1] = int(fields[1])
    if fields[5]:
        fields[5] = int(fields[5])
    if fields[9]:
        fields[9] = int(fields[9])
    if fields[11]:
        fields[11] = int(fields[11])
    if fields[13]:
        fields[13] = int(fields[13])

    # Return the data.
    return tuple(fields)
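# helix() returns the helix class as a bare integer.  The classification table in its docstring
# can be expressed as a lookup, as sketched below.  The names HELIX_CLASSES and helix_class_name
# are hypothetical, and treating a blank class as 1 is an assumption based only on the table
# marking right-handed alpha as the default.

```python
# Hypothetical stand-alone sketch, not part of the relax code base.

# The class numbers of columns 39 - 40 and their helix types, copied from the table.
HELIX_CLASSES = {
    1: 'Right-handed alpha',
    2: 'Right-handed omega',
    3: 'Right-handed pi',
    4: 'Right-handed gamma',
    5: 'Right-handed 3 - 10',
    6: 'Left-handed alpha',
    7: 'Left-handed omega',
    8: 'Left-handed gamma',
    9: '2 - 7 ribbon/helix',
    10: 'Polyproline',
}


def helix_class_name(helix_class):
    """Translate a helix class number into its helix type, raising KeyError if unknown."""

    # Assumption:  a blank class field (None) means the default right-handed alpha helix.
    if helix_class is None:
        helix_class = 1

    return HELIX_CLASSES[helix_class]


# The class of the last two helices in Example 1 of the helix() docstring.
name = helix_class_name(5)
```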


def het(record):
    """Parse the HET record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect4.html#HET}.

    HET
    ===

    Overview
    --------

    HET records are used to describe non-standard residues, such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied. Groups are considered HET if they are not part of a biological polymer described in SEQRES and are considered to be a molecule bound to the polymer, or if they are a chemical species that constitutes part of a biological polymer and is not one of the following:

        - standard amino acids, or
        - standard nucleic acids (C, G, A, U, I, DC, DG, DA, DU, DT and DI), or
        - unknown amino acid (UNK) or nucleic acid (N) where UNK and N are used to indicate the unknown residue name.

    HET records also describe chemical components for which the chemical identity is unknown, in which case the group is assigned the hetID UNL (Unknown Ligand).

    The heterogen section of a PDB formatted file contains the complete description of non-standard residues in the entry.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "HET   "     |                                                    |
        |  8 - 10 | LString(3)   | hetID        | Het identifier, right-justified.                   |
        | 13      | Character    | ChainID      | Chain identifier.                                  |
        | 14 - 17 | Integer      | seqNum       | Sequence number.                                   |
        | 18      | AChar        | iCode        | Insertion code.                                    |
        | 21 - 25 | Integer      | numHetAtoms  | Number of HETATM records for the group present in  |
        |         |              |              | the entry.                                         |
        | 31 - 70 | String       | text         | Text describing Het group.                         |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    Each HET group is assigned a hetID of not more than three (3) alphanumeric characters. The sequence number, chain identifier, insertion code, and number of coordinate records are given for each occurrence of the HET group in the entry. The chemical name of the HET group is given in the HETNAM record and synonyms for the chemical name are given in the HETSYN records, see U{ftp://ftp.wwpdb.org/pub/pdb/data/monomers}.

    There is a separate HET record for each occurrence of the HET group in an entry.

    A particular HET group is represented in the PDB archive with a unique hetID.

    PDB entries do not have HET records for water molecules, deuterated water, or methanol (when used as solvent).

    Unknown atoms or ions will be represented as UNX with the chemical formula X1. Unknown ligands are UNL; unknown amino acids are UNK.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    For each het group that appears in the entry, the wwPDB checks that the corresponding HET, HETNAM, HETSYN, FORMUL, HETATM, and CONECT records appear, if applicable. The HET record is generated automatically using the Chemical Component Dictionary and information from the HETATM records.

    Each unique hetID represents a unique molecule.


    Relationships to Other Record Types
    -----------------------------------

    For each het group that appears in the entry, there must be corresponding HET, HETNAM, HETSYN, FORMUL, HETATM, and CONECT records. LINK records may also be created.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        HET    TRS  B 975       8

        HET    UDP  A1457      25
        HET    B3P  A1458      19

        HET    NAG  Y   3      15
        HET    FUC  Y   4      10
        HET    NON  Y   5      12
        HET    UNK  A 161       1


    @param record:              The PDB HET record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('het')
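# het() is not implemented in this module.  As an illustration only, the sketch below applies the
# same slice, strip and convert pattern used by the implemented parsers to the HET column table.
# The function name parse_het_sketch, the tuple layout and the padding to 70 columns are
# assumptions of this sketch rather than relax behaviour.

```python
# Hypothetical stand-alone sketch, not part of the relax code base.

def parse_het_sketch(record):
    """Slice a HET record according to the column table of the PDB v3.3 format."""

    # Pad so that the single-column fields can always be indexed.
    record = record.rstrip('\r\n').ljust(70)

    # Split up the record (0-based slices of the 1-based columns).
    raw = [
        record[0:6],      # Record name.
        record[7:10],     # hetID
        record[12],       # ChainID
        record[13:17],    # seqNum
        record[17],       # iCode
        record[20:25],    # numHetAtoms
        record[30:70],    # text
    ]

    # Strip all whitespace and replace nothingness with None.
    fields = [field.strip() or None for field in raw]

    # Convert the sequence number and the HETATM count to integers.
    for index in (3, 5):
        if fields[index] is not None:
            fields[index] = int(fields[index])

    return tuple(fields)


# A record laid out according to the column table.
udp = parse_het_sketch("HET    UDP  A1457      25")
```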


693 -def hetatm(record):

694 """Parse the HETATM record. 695 696 The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect9.html#HETATM}. 697 698 HETATM 699 ====== 700 701 Overview 702 -------- 703 704 Non-polymer or other "non-standard" chemical coordinates, such as water molecules or atoms presented in HET groups use the HETATM record type. They also present the occupancy and temperature factor for each atom. The ATOM records present the atomic coordinates for standard residues. The element symbol is always present on each HETATM record; charge is optional. 705 706 Changes in ATOM/HETATM records will require standardization in atom and residue nomenclature. This nomenclature is described in the Chemical Component Dictionary, U{ftp://ftp.wwpdb.org/pub/pdb/data/monomers}. 707 708 709 Record Format 710 ------------- 711 712 The format is:: 713 ______________________________________________________________________________________________ 714 | | | | | 715 | Columns | Data type | Field | Definition | 716 |_________|______________|______________|____________________________________________________| 717 | | | | | 718 | 1 - 6 | Record name | "HETATM" | | 719 | 7 - 11 | Integer | serial | Atom serial number. | 720 | 13 - 16 | Atom | name | Atom name. | 721 | 17 | Character | altLoc | Alternate location indicator. | 722 | 18 - 20 | Residue name | resName | Residue name. | 723 | 22 | Character | chainID | Chain identifier. | 724 | 23 - 26 | Integer | resSeq | Residue sequence number. | 725 | 27 | AChar | iCode | Code for insertion of residues. | 726 | 31 - 38 | Real(8.3) | x | Orthogonal coordinates for X. | 727 | 39 - 46 | Real(8.3) | y | Orthogonal coordinates for Y. | 728 | 47 - 54 | Real(8.3) | z | Orthogonal coordinates for Z. | 729 | 55 - 60 | Real(6.2) | occupancy | Occupancy. | 730 | 61 - 66 | Real(6.2) | tempFactor | Temperature factor. | 731 | 77 - 78 | LString(2) | element | Element symbol; right-justified. 
| 732 | 79 - 80 | LString(2) | charge | Charge on the atom. | 733 |_________|______________|______________|____________________________________________________| 734 735 736 Details 737 ------- 738 739 The x, y, z coordinates are in Angstrom units. 740 741 No ordering is specified for polysaccharides. 742 743 See the HET section of this document regarding naming of heterogens. See the Chemical Component Dictionary for residue names, formulas, and topology of the HET groups that have appeared so far in the PDB (see U{ftp://ftp.wwpdb.org/pub/pdb/data/monomers}). 744 745 If the depositor provides the data, then the isotropic B value is given for the temperature factor. 746 747 If there are neither isotropic B values provided by the depositor, nor anisotropic temperature factors in ANISOU, then the default value of 0.0 is used for the temperature factor. 748 749 Insertion codes and element naming are fully described in the ATOM section of this document. 750 751 752 Verification/Validation/Value Authority Control 753 ----------------------------------------------- 754 755 Processing programs check ATOM/HETATM records for PDB file format, sequence information, and packing. 756 757 758 Relationships to Other Record Types 759 ----------------------------------- 760 761 HETATM records must have corresponding HET, HETNAM, FORMUL and CONECT records, except for waters. 


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        HETATM 8237 MG    MG A1001      13.872  -2.555 -29.045  1.00 27.36          MG

        HETATM 3835 FE   HEM A   1      17.140   3.115  15.066  1.00 14.14          FE
        HETATM 8238  S   SO4 A2001      10.885 -15.746 -14.404  1.00 47.84           S
        HETATM 8239  O1  SO4 A2001      11.191 -14.833 -15.531  1.00 50.12           O
        HETATM 8240  O2  SO4 A2001       9.576 -16.338 -14.706  1.00 48.55           O
        HETATM 8241  O3  SO4 A2001      11.995 -16.703 -14.431  1.00 49.88           O
        HETATM 8242  O4  SO4 A2001      10.932 -15.073 -13.100  1.00 49.91           O


    @param record:  The PDB HETATM record.
    @type record:   str
    @return:        The record name, atom serial number, atom name, alternate location indicator, residue name, chain identifier, sequence number, insertion code, orthogonal coordinates for X in Angstroms, orthogonal coordinates for Y in Angstroms, orthogonal coordinates for Z in Angstroms, occupancy, temperature factor, element symbol, charge on the atom.  Empty fields are returned as None and the charge is left as an unconverted string.
    @rtype:         tuple of str, int, str, str, str, str, int, str, float, float, float, float, float, str, str
    """

    # Initialise.
    fields = []

    # Split up the record.
    fields.append(record[0:6])
    fields.append(record[6:11])
    fields.append(record[12:16])
    fields.append(record[16])
    fields.append(record[17:20])
    fields.append(record[21])
    fields.append(record[22:26])
    fields.append(record[26])
    fields.append(record[30:38])
    fields.append(record[38:46])
    fields.append(record[46:54])
    fields.append(record[54:60])
    fields.append(record[60:66])
    fields.append(record[76:78])
    fields.append(record[78:80])

    # Loop over the fields.
    for i in range(len(fields)):
        # Strip all whitespace.
        fields[i] = fields[i].strip()

        # Replace nothingness with None.
        if fields[i] == '':
            fields[i] = None

    # Convert strings to numbers.
    if fields[1]:
        fields[1] = int(fields[1])
    if fields[6]:
        fields[6] = int(fields[6])
    if fields[8]:
        fields[8] = float(fields[8])
    if fields[9]:
        fields[9] = float(fields[9])
    if fields[10]:
        fields[10] = float(fields[10])
    if fields[11]:
        fields[11] = float(fields[11])
    if fields[12]:
        fields[12] = float(fields[12])

    # Return the data.
    return tuple(fields)
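The column numbers in the HETATM table are 1-based and inclusive, whereas the Python slices in the function body are 0-based and end-exclusive. The following is a minimal sketch of that mapping. The helper `column_slice` is hypothetical and not part of relax, and the record is assembled from the values of the first line of Example 1 so that every field lands in its documented columns.

```python
def column_slice(first, last=None):
    """Convert 1-based, inclusive PDB columns into a Python slice (hypothetical helper)."""
    if last is None:
        last = first
    return slice(first - 1, last)


# Build an 80 character HETATM record field by field (values from Example 1).
record = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s%2s" % (
    "HETATM", 8237, "MG", "", "MG", "A", 1001, "",
    13.872, -2.555, -29.045, 1.00, 27.36, "MG", "")

# Columns 7 - 11 correspond to record[6:11], columns 31 - 38 to record[30:38], and so on.
serial = int(record[column_slice(7, 11)])
res_name = record[column_slice(18, 20)].strip()
chain_id = record[column_slice(22)]
x = float(record[column_slice(31, 38)])
element = record[column_slice(77, 78)].strip()
```

Writing the slices this way keeps the code visually close to the record format table, at the cost of an extra function call per field.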


def hetnam(record):

    """Parse the HETNAM record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect4.html#HETNAM}.

    HETNAM
    ======

    Overview
    --------

    This record gives the chemical name of the compound with the given hetID.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "HETNAM"     |                                                    |
        |  9 - 10 | Continuation | continuation | Allows concatenation of multiple records.          |
        | 12 - 14 | LString(3)   | hetID        | Het identifier, right-justified.                   |
        | 16 - 70 | String       | text         | Chemical name.                                     |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    Each hetID is assigned a unique chemical name for the HETNAM record, see U{ftp://ftp.wwpdb.org/pub/pdb/data/monomers}.

    Other names for the group are given on HETSYN records.

    PDB entries follow IUPAC/IUB naming conventions to describe groups systematically.

    The special character "~" is used to indicate superscript in a heterogen name. For example: N6 will be listed in the HETNAM section as N~6~, with the ~ character indicating both the start and end of the superscript in the name, e.g.:

        - N-(BENZYLSULFONYL)SERYL-N~1~-{4-[AMINO(IMINO)METHYL]BENZYL}GLYCINAMIDE

    Continuation of chemical names onto subsequent records is allowed.

    Only one HETNAM record is included for a given hetID, even if the same hetID appears on more than one HET record.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    For each het group that appears in the entry, the corresponding HET, HETNAM, FORMUL, HETATM, and CONECT records must appear. The HETNAM record is generated automatically using the Chemical Component Dictionary and information from HETATM records.


    Relationships to Other Record Types
    -----------------------------------

    For each het group that appears in the entry, there must be corresponding HET, HETNAM, FORMUL, HETATM, and CONECT records. HETSYN and LINK records may also be created.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        HETNAM     NAG N-ACETYL-D-GLUCOSAMINE
        HETNAM     SAD BETA-METHYLENE SELENAZOLE-4-CARBOXAMIDE ADENINE
        HETNAM   2 SAD DINUCLEOTIDE

        HETNAM     UDP URIDINE-5'-DIPHOSPHATE

        HETNAM     UNX UNKNOWN ATOM OR ION
        HETNAM     UNL UNKNOWN LIGAND

        HETNAM     B3P 2-[3-(2-HYDROXY-1,1-DIHYDROXYMETHYL-ETHYLAMINO)-
        HETNAM   2 B3P PROPYLAMINO]-2-HYDROXYMETHYL-PROPANE-1,3-DIOL


    @param record:              The PDB HETNAM record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('hetnam')
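hetnam() is not implemented yet. As an illustration only, the record format table above could be turned into slices in the same style as hetatm(). The function name `parse_hetnam_sketch`, the returned field order, and the decision to convert the continuation field to an int are all assumptions made for this sketch and are not relax code.

```python
def parse_hetnam_sketch(record):
    """Split a HETNAM record into (record name, continuation, hetID, text).

    Hypothetical sketch: columns 1 - 6, 9 - 10, 12 - 14 and 16 - 70 of the
    record format table become the slices [0:6], [8:10], [11:14] and [15:70].
    """

    # Slice out the fields, mapping empty fields to None.
    fields = [record[0:6], record[8:10], record[11:14], record[15:70]]
    fields = [field.strip() or None for field in fields]

    # The continuation field is blank on the first record of a name.
    if fields[1]:
        fields[1] = int(fields[1])

    return tuple(fields)


# Two records taken from Example 1 above.
first = parse_hetnam_sketch("HETNAM     NAG N-ACETYL-D-GLUCOSAMINE")
second = parse_hetnam_sketch("HETNAM   2 B3P PROPYLAMINO]-2-HYDROXYMETHYL-PROPANE-1,3-DIOL")
```

Range slices are used for every field so that a record with stripped trailing whitespace yields None values instead of raising an IndexError.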


def model(record):

    """Parse the MODEL record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect9.html#MODEL}.

    MODEL
    =====

    Overview
    --------

    The MODEL record specifies the model serial number when multiple models of the same structure are presented in a single coordinate entry, as is often the case with structures determined by NMR.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "MODEL "     |                                                    |
        | 11 - 14 | Integer      | serial       | Model serial number.                               |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    This record is used only when more than one model appears in an entry. Generally, it is employed mainly for NMR structures. The chemical connectivity should be the same for each model. ATOM, HETATM, ANISOU, and TER records for each model structure are interspersed as needed between MODEL and ENDMDL records.

    The numbering of models is sequential, beginning with 1.

    All models in a deposition should be superimposed in an appropriate author determined manner and only one superposition method should be used. Structures from different experiments, or different domains of a structure should not be superimposed and deposited as models of a deposition.

    All models in an NMR ensemble should be homogeneous - each model should have the exact same atoms (hydrogen and heavy atoms), sequence and chemistry.

    All models in an NMR entry should have hydrogen atoms.

    Deposition of minimized average structure must be accompanied with ensemble and must be homogeneous with ensemble.

    A model cannot have more than 99,999 atoms. Where the entry does not contain an ensemble of models, then the entry cannot have more than 99,999 atoms. Entries that go beyond this atom limit must be split into multiple entries, each containing no more than the limits specified above.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    Entries with multiple models in the NUMMDL record are checked for corresponding pairs of MODEL/ENDMDL records, and for consecutively numbered models.


    Relationships to Other Record Types
    -----------------------------------

    Each MODEL must have a corresponding ENDMDL record.


    Examples
    --------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        MODEL        1
        ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
        ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
        ...
        ...
        ...
        ATOM    293 1HG  GLU A  18     -14.861  -4.847   0.361  1.00  0.00           H
        ATOM    294 2HG  GLU A  18     -13.518  -3.769   0.084  1.00  0.00           H
        TER     295      GLU A  18
        ENDMDL
        MODEL        2
        ATOM    296  N   ALA A   1      10.883   6.779  -6.464  1.00  0.00           N
        ATOM    297  CA  ALA A   1      11.451   6.531  -5.142  1.00  0.00           C
        ...
        ...
        ATOM    588 1HG  GLU A  18     -13.363  -4.163  -2.372  1.00  0.00           H
        ATOM    589 2HG  GLU A  18     -12.634  -3.023  -3.475  1.00  0.00           H
        TER     590      GLU A  18
        ENDMDL

    Example 2::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        MODEL        1
        ATOM      1  N  AALA A   1      72.883  57.697  56.410  0.50 83.80           N
        ATOM      2  CA AALA A   1      73.796  56.531  56.644  0.50 84.78           C
        ATOM      3  C  AALA A   1      74.549  56.551  57.997  0.50 85.05           C
        ATOM      4  O  AALA A   1      73.951  56.413  59.075  0.50 84.77           O
        ...
        ...
        ...
        HETATM37900  O  AHOH   490     -24.915 147.513  36.413  0.50 41.86           O
        HETATM37901  O  AHOH   491     -28.699 130.471  22.248  0.50 36.06           O
        HETATM37902  O  AHOH   492     -33.309 184.488  26.176  0.50 15.00           O
        ENDMDL
        MODEL        2
        ATOM      1  N  BALA A   1      72.883  57.697  56.410  0.50 83.80           N
        ATOM      2  CA BALA A   1      73.796  56.531  56.644  0.50 84.78           C
        ATOM      3  C  BALA A   1      74.549  56.551  57.997  0.50 85.05           C
        ATOM      4  O  BALA A   1      73.951  56.413  59.075  0.50 84.77           O
        ATOM      5  CB BALA A   1      74.804  56.369  55.453  0.50 84.29           C
        ATOM      6  N  BASP A   2      75.872  56.703  57.905  0.50 85.59           N
        ATOM      7  CA BASP A   2      76.801  56.651  59.048  0.50 85.67           C
        ATOM      8  C  BASP A   2      76.283  57.361  60.309  0.50 84.80           C
        ...


    @param record:              The PDB MODEL record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('model')
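model() currently always raises RelaxImplementError. Purely as a sketch of how the two-row record format table could be applied, the following hypothetical function (the name `parse_model_sketch` and its return value are assumptions, not relax code) extracts the serial number from columns 11 - 14.

```python
def parse_model_sketch(record):
    """Split a MODEL record into (record name, model serial number).

    Hypothetical sketch: columns 1 - 6 and 11 - 14 of the record format table
    become the slices [0:6] and [10:14].
    """

    # Slice out the fields, mapping empty fields to None.
    fields = [record[0:6].strip() or None, record[10:14].strip() or None]

    # Convert the serial number.
    if fields[1]:
        fields[1] = int(fields[1])

    return tuple(fields)


# Assemble a MODEL record so that the serial number is right-justified in columns 11 - 14.
record = "%-6s    %4d" % ("MODEL", 2)
parsed = parse_model_sketch(record)
```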


def remark(record):

    """Parse the REMARK record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/remarks.html}.

    REMARK
    ======

    Overview
    --------

    REMARK records present experimental details, annotations, comments, and information not included in other records. In a number of cases, REMARKs are used to expand the contents of other record types. A new level of structure is being used for some REMARK records. This is expected to facilitate searching and will assist in the conversion to a relational database.

    The very first line of every set of REMARK records is used as a spacer to aid in reading::

        ______________________________________________________________________________________________
        |         |             |             |                                                      |
        | Columns | Data type   | Field       | Definition                                           |
        |_________|_____________|_____________|______________________________________________________|
        |         |             |             |                                                      |
        |  1 -  6 | Record name | "REMARK"    |                                                      |
        |  8 - 10 | Integer     | remarkNum   | Remark number. It is not an error for remark n to    |
        |         |             |             | exist in an entry when remark n-1 does not.          |
        | 12 - 79 | LString     | empty       | Left as white space in first line of each new        |
        |         |             |             | remark.                                              |
        |_________|_____________|_____________|______________________________________________________|


    @param record:              The PDB REMARK record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('remark')
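remark() is likewise unimplemented. The sketch below only illustrates the column layout from the table above. `parse_remark_sketch` is a hypothetical name, and treating columns 12 - 79 as free text on the lines after the spacer line is an assumption of this sketch rather than something the table states.

```python
def parse_remark_sketch(record):
    """Split a REMARK record into (record name, remark number, text).

    Hypothetical sketch: columns 1 - 6, 8 - 10 and 12 - 79 become the slices
    [0:6], [7:10] and [11:79].  The text is None on the blank spacer line.
    """

    # Slice out the fields, mapping empty fields to None.
    fields = [record[0:6], record[7:10], record[11:79]]
    fields = [field.strip() or None for field in fields]

    # Convert the remark number.
    if fields[1]:
        fields[1] = int(fields[1])

    return tuple(fields)


# A spacer line followed by a line carrying made-up placeholder text.
spacer = parse_remark_sketch("%-6s %3d" % ("REMARK", 700))
content = parse_remark_sketch("%-6s %3d %s" % ("REMARK", 700, "PLACEHOLDER TEXT"))
```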


def sheet(record):

    """Parse the SHEET record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect5.html#SHEET}.

    SHEET
    =====

    Overview
    --------

    SHEET records are used to identify the position of sheets in the molecule. Sheets are both named and numbered. The residues where the sheet begins and ends are noted.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "SHEET "     |                                                    |
        |  8 - 10 | Integer      | strand       | Strand number which starts at 1 for each strand    |
        |         |              |              | within a sheet and increases by one.               |
        | 12 - 14 | LString(3)   | sheetID      | Sheet identifier.                                  |
        | 15 - 16 | Integer      | numStrands   | Number of strands in sheet.                        |
        | 18 - 20 | Residue name | initResName  | Residue name of initial residue.                   |
        | 22      | Character    | initChainID  | Chain identifier of initial residue in strand.     |
        | 23 - 26 | Integer      | initSeqNum   | Sequence number of initial residue in strand.      |
        | 27      | AChar        | initICode    | Insertion code of initial residue in strand.       |
        | 29 - 31 | Residue name | endResName   | Residue name of terminal residue.                  |
        | 33      | Character    | endChainID   | Chain identifier of terminal residue.              |
        | 34 - 37 | Integer      | endSeqNum    | Sequence number of terminal residue.               |
        | 38      | AChar        | endICode     | Insertion code of terminal residue.                |
        | 39 - 40 | Integer      | sense        | Sense of strand with respect to previous strand in |
        |         |              |              | the sheet. 0 if first strand, 1 if parallel, and   |
        |         |              |              | -1 if anti-parallel.                               |
        | 42 - 45 | Atom         | curAtom      | Registration. Atom name in current strand.         |
        | 46 - 48 | Residue name | curResName   | Registration. Residue name in current strand.      |
        | 50      | Character    | curChainId   | Registration. Chain identifier in current strand.  |
        | 51 - 54 | Integer      | curResSeq    | Registration. Residue sequence number in current   |
        |         |              |              | strand.                                            |
        | 55      | AChar        | curICode     | Registration. Insertion code in current strand.    |
        | 57 - 60 | Atom         | prevAtom     | Registration. Atom name in previous strand.        |
        | 61 - 63 | Residue name | prevResName  | Registration. Residue name in previous strand.     |
        | 65      | Character    | prevChainId  | Registration. Chain identifier in previous strand. |
        | 66 - 69 | Integer      | prevResSeq   | Registration. Residue sequence number in previous  |
        |         |              |              | strand.                                            |
        | 70      | AChar        | prevICode    | Registration. Insertion code in previous strand.   |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    The initial residue for a strand is its N-terminus. Strand registration information is provided in columns 39 - 70. Strands are listed starting with one edge of the sheet and continuing to the spatially adjacent strand.

    The sense in columns 39 - 40 indicates whether strand n is parallel (sense = 1) or anti-parallel (sense = -1) to strand n-1. Sense is equal to zero (0) for the first strand of a sheet.

    The registration (columns 42 - 70) of strand n to strand n-1 may be specified by one hydrogen bond between each such pair of strands. This is done by providing the hydrogen bonding between the current and previous strands. No register information should be provided for the first strand.

    Split strands, or strands with two or more runs of residues from discontinuous parts of the amino acid sequence, are explicitly listed. A detailed description can be included in REMARK 700.


    Relationships to Other Record Types
    -----------------------------------

    If the entry contains bifurcated sheets or beta-barrels, the relevant REMARK 700 records must be provided. See the REMARK section for details.


    Examples
    --------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        SHEET    1   A 5 THR A 107  ARG A 110  0
        SHEET    2   A 5 ILE A  96  THR A  99 -1  N  LYS A  98   O  THR A 107
        SHEET    3   A 5 ARG A  87  SER A  91 -1  N  LEU A  89   O  TYR A  97
        SHEET    4   A 5 TRP A  71  ASP A  75 -1  N  ALA A  74   O  ILE A  88
        SHEET    5   A 5 GLY A  52  PHE A  56 -1  N  PHE A  56   O  TRP A  71
        SHEET    1   B 5 THR B 107  ARG B 110  0
        SHEET    2   B 5 ILE B  96  THR B  99 -1  N  LYS B  98   O  THR B 107
        SHEET    3   B 5 ARG B  87  SER B  91 -1  N  LEU B  89   O  TYR B  97
        SHEET    4   B 5 TRP B  71  ASP B  75 -1  N  ALA B  74   O  ILE B  88
        SHEET    5   B 5 GLY B  52  ILE B  55 -1  N  ASP B  54   O  GLU B  73

    The sheet presented as BS1 below is an eight-stranded beta-barrel. This is represented by a nine-stranded sheet in which the first and last strands are identical::

        SHEET    1 BS1 9 VAL    13  ILE    17  0
        SHEET    2 BS1 9 ALA    70  ILE    73  1  O  TRP    72   N  ILE    17
        SHEET    3 BS1 9 LYS   127  PHE   132  1  O  ILE   129   N  ILE    73
        SHEET    4 BS1 9 GLY   221  ASP   225  1  O  GLY   221   N  ILE   130
        SHEET    5 BS1 9 VAL   248  GLU   253  1  O  PHE   249   N  ILE   222
        SHEET    6 BS1 9 LEU   276  ASP   278  1  N  LEU   277   O  GLY   252
        SHEET    7 BS1 9 TYR   310  THR   318  1  O  VAL   317   N  ASP   278
        SHEET    8 BS1 9 VAL   351  TYR   356  1  O  VAL   351   N  THR   318
        SHEET    9 BS1 9 VAL    13  ILE    17  1  N  VAL    14   O  PRO   352

    The sheet structure of this example is bifurcated. In order to represent this feature, two sheets are defined. Strands 2 and 3 of BS7 and BS8 are identical::

        SHEET    1 BS7 3 HIS   662  THR   665  0
        SHEET    2 BS7 3 LYS   639  LYS   648 -1  N  PHE   643   O  HIS   662
        SHEET    3 BS7 3 ASN   596  VAL   600 -1  N  TYR   598   O  ILE   646
        SHEET    1 BS8 3 ASN   653  TRP   656  0
        SHEET    2 BS8 3 LYS   639  LYS   648 -1  N  LYS   647   O  THR   655
        SHEET    3 BS8 3 ASN   596  VAL   600 -1  N  TYR   598   O  ILE   646


    @param record:  The PDB SHEET record.
    @type record:   str
    @return:        The record name, strand number, sheet identifier, number of strands in sheet, residue name of initial residue, chain identifier of initial residue in strand, sequence number of initial residue in strand, insertion code of initial residue in strand, residue name of terminal residue, chain identifier of terminal residue, sequence number of terminal residue, insertion code of terminal residue, sense of strand with respect to previous strand, atom name in current strand, residue name in current strand, chain identifier in current strand, residue sequence number in current strand, insertion code in current strand, atom name in previous strand, residue name in previous strand, chain identifier in previous strand, residue sequence number in previous strand, insertion code in previous strand.
    @rtype:         tuple of str, int, str, int, str, str, int, str, str, str, int, str, int, str, str, str, int, str, str, str, str, int, str
    """

    # Initialise.
    fields = []

    # Split up the record.
    fields.append(record[0:6])
    fields.append(record[7:10])
    fields.append(record[11:14])
    fields.append(record[14:16])
    fields.append(record[17:20])
    fields.append(record[21])
    fields.append(record[22:26])
    fields.append(record[26])
    fields.append(record[28:31])
    fields.append(record[32])
    fields.append(record[33:37])
    fields.append(record[37])
    fields.append(record[38:40])
    fields.append(record[41:45])
    fields.append(record[45:48])
    fields.append(record[49])
    fields.append(record[50:54])
    fields.append(record[54])
    fields.append(record[56:60])
    fields.append(record[60:63])
    fields.append(record[64])
    fields.append(record[65:69])
    fields.append(record[69])

    # Loop over the fields.
    for i in range(len(fields)):
        # Strip all whitespace.
        fields[i] = fields[i].strip()

        # Replace nothingness with None.
        if fields[i] == '':
            fields[i] = None

    # Convert strings to numbers.
    if fields[1]:
        fields[1] = int(fields[1])
    if fields[3]:
        fields[3] = int(fields[3])
    if fields[6]:
        fields[6] = int(fields[6])
    if fields[10]:
        fields[10] = int(fields[10])
    if fields[12]:
        fields[12] = int(fields[12])
    if fields[16]:
        fields[16] = int(fields[16])
    if fields[21]:
        fields[21] = int(fields[21])

    # Return the data.
    return tuple(fields)
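The single-character fields above are read by direct indexing, for example record[49] and record[69]. Indexing past the end of a Python string raises an IndexError, so a record whose trailing whitespace has been stripped would fail before any field is converted. The following is a minimal sketch of one possible guard, assuming the caller may receive such shortened records. `pad_record_sketch` is a hypothetical helper and not part of relax.

```python
def pad_record_sketch(record, width=80):
    """Right-pad a PDB record with spaces so that fixed-column indexing is safe."""

    # Drop any line ending, then pad out to the full record width.
    return record.rstrip("\r\n").ljust(width)


# The first strand of Example 1 with its trailing whitespace removed (40 characters).
short = "SHEET    1   A 5 THR A 107  ARG A 110  0"
padded = pad_record_sketch(short)

# Indexing the unpadded record at the prevICode column (70) fails.
try:
    short[69]
    index_error = False
except IndexError:
    index_error = True
```

An alternative with the same effect would be to replace each `record[n]` with the range slice `record[n:n+1]`, which returns an empty string instead of raising.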


def ter(record):

    """Parse the TER record.

    The following is the PDB v3.3 documentation U{http://www.wwpdb.org/documentation/file-format/format33/sect9.html#TER}.

    TER
    ===

    Overview
    --------

    The TER record indicates the end of a list of ATOM/HETATM records for a chain.


    Record Format
    -------------

    The format is::
        ______________________________________________________________________________________________
        |         |              |              |                                                    |
        | Columns | Data type    | Field        | Definition                                         |
        |_________|______________|______________|____________________________________________________|
        |         |              |              |                                                    |
        |  1 -  6 | Record name  | "TER   "     |                                                    |
        |  7 - 11 | Integer      | serial       | Serial number.                                     |
        | 18 - 20 | Residue name | resName      | Residue name.                                      |
        | 22      | Character    | chainID      | Chain identifier.                                  |
        | 23 - 26 | Integer      | resSeq       | Residue sequence number.                           |
        | 27      | AChar        | iCode        | Insertion code.                                    |
        |_________|______________|______________|____________________________________________________|


    Details
    -------

    Every chain of ATOM/HETATM records presented on SEQRES records is terminated with a TER record.

    The TER records occur in the coordinate section of the entry, and indicate the last residue presented for each polypeptide and/or nucleic acid chain for which there are determined coordinates. For proteins, the residue defined on the TER record is the carboxy-terminal residue; for nucleic acids it is the 3'-terminal residue.

    For a cyclic molecule, the choice of termini is arbitrary.

    Terminal oxygen atoms are presented as OXT for proteins, and as O5' or OP3 for nucleic acids. These atoms are present only if the last residue in the polymer is truly the last residue in the SEQRES.

    The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.


    Verification/Validation/Value Authority Control
    -----------------------------------------------

    TER must appear at the terminal carboxyl end or 3' end of a chain. For proteins, there is usually a terminal oxygen, labeled OXT. The validation program checks for the occurrence of TER and OXT records.


    Relationships to Other Record Types
    -----------------------------------

    The residue name appearing on the TER record must be the same as the residue name of the immediately preceding ATOM or non-water HETATM record.


    Example
    -------

    Example 1::

                 1         2         3         4         5         6         7         8
        12345678901234567890123456789012345678901234567890123456789012345678901234567890
        ATOM    601  N   LEU A  75     -17.070 -16.002   2.409  1.00 55.63           N
        ATOM    602  CA  LEU A  75     -16.343 -16.746   3.444  1.00 55.50           C
        ATOM    603  C   LEU A  75     -16.499 -18.263   3.300  1.00 55.55           C
        ATOM    604  O   LEU A  75     -16.645 -18.789   2.195  1.00 55.50           O
        ATOM    605  CB  LEU A  75     -16.776 -16.283   4.844  1.00 55.51           C
        TER     606      LEU A  75
        ...
        ATOM   1185  O   LEU B  75      26.292  -4.310  16.940  1.00 55.45           O
        ATOM   1186  CB  LEU B  75      23.881  -1.551  16.797  1.00 55.32           C
        TER    1187      LEU B  75
        HETATM 1188  H2  SRT A1076     -17.263  11.260  28.634  1.00 59.62           H
        HETATM 1189  HA  SRT A1076     -19.347  11.519  28.341  1.00 59.42           H
        HETATM 1190  H3  SRT A1076     -17.157  14.303  28.677  1.00 58.00           H
        HETATM 1191  HB  SRT A1076     -15.110  13.610  28.816  1.00 57.77           H
        HETATM 1192  O1  SRT A1076     -17.028  11.281  31.131  1.00 62.63           O

        ATOM    295  HB2 ALA A  18       4.601  -9.393   7.275  1.00  0.00           H
        ATOM    296  HB3 ALA A  18       3.340  -9.147   6.043  1.00  0.00           H
        TER     297      ALA A  18
        ENDMDL


    @param record:              The PDB TER record.
    @type record:               str
    @raises RelaxImplementError:    Always.
    """

    # Not implemented yet.
    raise RelaxImplementError('ter')
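ter() is not implemented yet. As a sketch only, the record format table above maps onto slices in the same way as for hetatm(). The name `parse_ter_sketch` and the returned field order are assumptions, not relax code. Range slices are used throughout because TER records are often shorter than 27 characters, and a bare "TER" line is assumed to be possible input.

```python
def parse_ter_sketch(record):
    """Split a TER record into (record name, serial, resName, chainID, resSeq, iCode).

    Hypothetical sketch: columns 1 - 6, 7 - 11, 18 - 20, 22, 23 - 26 and 27
    become the slices [0:6], [6:11], [17:20], [21:22], [22:26] and [26:27].
    """

    # Slice out the fields, mapping empty fields to None.
    fields = [record[0:6], record[6:11], record[17:20], record[21:22], record[22:26], record[26:27]]
    fields = [field.strip() or None for field in fields]

    # Convert the serial number and residue sequence number.
    for i in (1, 4):
        if fields[i]:
            fields[i] = int(fields[i])

    return tuple(fields)


# The first TER record of Example 1, assembled column by column.
record = "%-6s%5d      %3s %1s%4d%1s" % ("TER", 606, "LEU", "A", 75, "")
parsed = parse_ter_sketch(record)
bare = parse_ter_sketch("TER")
```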


Source Code for Module lib.structure.pdb_read