"""Module containing the 'spectrum' user function class."""
__docformat__ = 'plaintext'


from base_class import User_fn_class, _build_doc
import arg_check
from generic_fns import spectrum


    """Class for supporting the input of spectral data."""



    baseplane_rmsd._doc_title = "Set the baseplane RMSD of a given spin in a spectrum for error analysis."
    baseplane_rmsd._doc_title_short = "Baseplane RMSD setting."
    baseplane_rmsd._doc_args = [
        ["error", "The baseplane RMSD error value."],
        ["spectrum_id", "The spectrum ID string."],
        ["spin_id", "The spin ID string."]
    ]
    baseplane_rmsd._doc_desc = """
The spectrum ID identifies the spectrum associated with the error and must correspond to a previously loaded set of intensities. If the spin ID is unset, then the error value for all spins will be set to the supplied value.
"""
    _build_doc(baseplane_rmsd)


    def delete(self, spectrum_id=None):
        # Function intro text.
        if self._exec_info.intro:
            text = self._exec_info.ps3 + "spectrum.delete("
            text = text + "spectrum_id=" + repr(spectrum_id) + ")"
            print(text)

        # Argument checks.
        arg_check.is_str(spectrum_id, 'spectrum ID string')

        # Execute the functional code.
        spectrum.delete(spectrum_id=spectrum_id)


    delete._doc_title = "Delete the spectral data corresponding to the spectrum ID string."
    delete._doc_title_short = "Spectral data deletion."
    delete._doc_args = [
        ["spectrum_id", "The unique spectrum ID string."]
    ]
    delete._doc_desc = """
The spectral data corresponding to the given spectrum ID string will be removed from the current data pipe.
"""
    delete._doc_examples = """
To delete the peak height data corresponding to the ID 'R1 ncyc5', type:

relax> spectrum.delete('R1 ncyc5')
"""
    _build_doc(delete)




    error_analysis._doc_title = "Perform an error analysis for peak intensities."
    error_analysis._doc_title_short = "Peak intensity error analysis."
    error_analysis._doc_desc = """
This user function must only be called after all peak intensities have been loaded and all other necessary spectral information set. This includes the baseplane RMSD and the number of points used in volume integration, both of which are only used if spectra have not been replicated.

Six different types of error analysis are supported depending on whether peak heights or volumes are supplied, whether noise is determined from replicated spectra or the RMSD of the baseplane noise, and whether all spectra or only a subset have been duplicated. These are:

____________________________________________________________________________________________
|          |                                        |                                      |
| Int type | Noise source                           | Error scope                          |
|__________|________________________________________|______________________________________|
|          |                                        |                                      |
| Heights  | RMSD baseplane                         | One sigma per peak per spectrum      |
|          |                                        |                                      |
| Heights  | Partial duplicate + variance averaging | One sigma for all peaks, all spectra |
|          |                                        |                                      |
| Heights  | All replicated + variance averaging    | One sigma per replicated spectra set |
|          |                                        |                                      |
| Volumes  | RMSD baseplane                         | One sigma per peak per spectrum      |
|          |                                        |                                      |
| Volumes  | Partial duplicate + variance averaging | One sigma for all peaks, all spectra |
|          |                                        |                                      |
| Volumes  | All replicated + variance averaging    | One sigma per replicated spectra set |
|__________|________________________________________|______________________________________|
"""
    error_analysis._doc_additional = [
        ["Peak heights with baseplane noise RMSD", """
When none of the spectra have been replicated, then the peak height errors are calculated using the RMSD of the baseplane noise, the value of which is set by the spectrum.baseplane_rmsd() user function. This results in a different error per peak per spectrum. The standard deviation error measure for the peak height, sigma_I, is set to the RMSD value."""],
        ["Peak heights with partially replicated spectra", """
When spectra are replicated, the variance for a single spin at a single replicated spectra set is calculated by the formula

-----

sigma^2 = sum({Ii - Iav}^2) / (n - 1) ,

-----

where sigma^2 is the variance, sigma is the standard deviation, n is the size of the replicated spectra set with i being the corresponding index, Ii is the peak intensity for spectrum i, and Iav is the mean over all spectra, i.e. the sum of all peak intensities divided by n.

As n in the above equation is always very low, since normally only a couple of spectra are collected per replicated spectra set, the variance of all spins is averaged for a single replicated spectra set. Although this results in all spins having the same error, the accuracy of the error estimate is significantly improved.

If, in addition to the replicated spectra, peak intensities consisting of only a single spectrum have been loaded, i.e. not all spectra are replicated, then the variances of all replicated spectra sets will be averaged. This averaged value will be used for the entire experiment, so that there will be only a single error value for all spins and for all spectra."""],
        ["Peak heights with all spectra replicated", """
If all spectra are collected in duplicate (triplicate or a higher number of replicates is also supported), then each replicated spectra set will have its own error estimate. The error for a single peak is calculated as for partially replicated spectra, and these errors are again averaged to give a single error per replicated spectra set. However, as all replicated spectra sets have their own error estimate, variance averaging across all spectra sets will not be performed."""],
        ["Peak volumes with baseplane noise RMSD", """
The method of error analysis when no spectra have been replicated and peak volumes are used is highly dependent on the integration method. Many methods simply sum the intensities of the points within a fixed region, either a box or an oval. The number of points used, N, must be specified by the spectrum.integration_points() user function of this class. The error is then simply given by the sum of variances:

-----

sigma_vol^2 = sigma_i^2 * N,

-----

where sigma_vol is the standard deviation of the volume, sigma_i is the standard deviation of a single point, assumed to be equal to the RMSD of the baseplane noise, and N is the total number of points used in the summation integration method. For a box integration method, this converts to the Nicholson, Kay, Baldisseri, Arango, Young, Bax, and Torchia (1992) Biochemistry, 31: 5253-5263 equation:

-----

sigma_vol = sigma_i * sqrt(n*m),

-----

where n and m are the dimensions of the box. Note that a number of programs, for example peakint (http://hugin.ethz.ch/wuthrich/software/xeasy/xeasy_m15.html), do not use all points within the box. If the number N cannot be determined, this category of error analysis is not possible.

Also note that for non-point summation methods, for example when line shape fitting is used to determine peak volumes, the equations above cannot be used, so again this category of error analysis is not possible. This is the case for one of the three integration methods used by Sparky (http://www.cgl.ucsf.edu/home/sparky/manual/peaks.html#Integration). Likewise, if more elaborate techniques are used, for example the deconvolution of overlapping peaks performed by Cara (http://www.cara.ethz.ch/Wiki/Integration), this error analysis is again impossible."""],
        ["Peak volumes with partially replicated spectra", """
When peak volumes are measured by any integration method and a few of the spectra are replicated, then the intensity errors are calculated identically as described in the 'Peak heights with partially replicated spectra' section above."""],
        ["Peak volumes with all spectra replicated", """
With all spectra replicated and again using any integration methodology, the intensity errors can be calculated as described in the 'Peak heights with all spectra replicated' section above.
"""]
    ]
    _build_doc(error_analysis)


        # Function intro text.
        if self._exec_info.intro:
            text = self._exec_info.ps3 + "spectrum.integration_points("
            text = text + "N=" + repr(N)
            text = text + ", spectrum_id=" + repr(spectrum_id)
            text = text + ", spin_id=" + repr(spin_id) + ")"
            print(text)

        # Argument checks.
        arg_check.is_int(N, 'number of summed points')
        arg_check.is_str(spectrum_id, 'spectrum ID string')
        arg_check.is_str(spin_id, 'spin ID string', can_be_none=True)

        # Execute the functional code.
        spectrum.integration_points(N=N, spectrum_id=spectrum_id, spin_id=spin_id)


    integration_points._doc_title = "Set the number of summed points used in volume integration of a given spin in a spectrum."
    integration_points._doc_title_short = "Number of integration points."
    integration_points._doc_args = [
        ["N", "The number of points used by the summation volume integration method."],
        ["spectrum_id", "The spectrum ID string."],
        ["spin_id", "The spin ID string."]
    ]
    integration_points._doc_desc = """
For a complete description of the supported integration methods and of the number of points N used by each integration technique, please see the spectrum.error_analysis user function documentation.

The spectrum ID identifies the spectrum associated with the value of N and must correspond to a previously loaded set of intensities. If the spin ID is unset, then the number of summed points for all spins will be set to the supplied value.
"""
    _build_doc(integration_points)


    def read_intensities(self, file=None, dir=None, spectrum_id=None, heteronuc='N', proton='HN', int_method='height', int_col=None, spin_id_col=None, mol_name_col=None, res_num_col=None, res_name_col=None, spin_num_col=None, spin_name_col=None, sep=None, spin_id=None, ncproc=None):
        # Function intro text.
        if self._exec_info.intro:
            text = self._exec_info.ps3 + "spectrum.read_intensities("
            text = text + "file=" + repr(file)
            text = text + ", dir=" + repr(dir)
            text = text + ", spectrum_id=" + repr(spectrum_id)
            text = text + ", heteronuc=" + repr(heteronuc)
            text = text + ", proton=" + repr(proton)
            text = text + ", int_method=" + repr(int_method)
            text = text + ", int_col=" + repr(int_col)
            text = text + ", spin_id_col=" + repr(spin_id_col)
            text = text + ", mol_name_col=" + repr(mol_name_col)
            text = text + ", res_num_col=" + repr(res_num_col)
            text = text + ", res_name_col=" + repr(res_name_col)
            text = text + ", spin_num_col=" + repr(spin_num_col)
            text = text + ", spin_name_col=" + repr(spin_name_col)
            text = text + ", sep=" + repr(sep)
            text = text + ", spin_id=" + repr(spin_id)
            text = text + ", ncproc=" + repr(ncproc) + ")"
            print(text)

        # Argument checks.
        arg_check.is_str(file, 'file name')
        arg_check.is_str(dir, 'directory name', can_be_none=True)
        arg_check.is_str_or_str_list(spectrum_id, 'spectrum ID string')
        arg_check.is_str(heteronuc, 'heteronucleus name')
        arg_check.is_str(proton, 'proton name')
        arg_check.is_str(int_method, 'integration method')
        arg_check.is_int_or_int_list(int_col, 'intensity column', can_be_none=True)
        arg_check.is_int(spin_id_col, 'spin ID string column', can_be_none=True)
        arg_check.is_int(mol_name_col, 'molecule name column', can_be_none=True)
        arg_check.is_int(res_num_col, 'residue number column', can_be_none=True)
        arg_check.is_int(res_name_col, 'residue name column', can_be_none=True)
        arg_check.is_int(spin_num_col, 'spin number column', can_be_none=True)
        arg_check.is_int(spin_name_col, 'spin name column', can_be_none=True)
        arg_check.is_str(sep, 'column separator', can_be_none=True)
        arg_check.is_str(spin_id, 'spin ID string', can_be_none=True)
        arg_check.is_int(ncproc, 'Bruker ncproc parameter', can_be_none=True)

        # Execute the functional code.
        spectrum.read(file=file, dir=dir, spectrum_id=spectrum_id, heteronuc=heteronuc, proton=proton, int_method=int_method, int_col=int_col, spin_id_col=spin_id_col, mol_name_col=mol_name_col, res_num_col=res_num_col, res_name_col=res_name_col, spin_num_col=spin_num_col, spin_name_col=spin_name_col, sep=sep, spin_id=spin_id, ncproc=ncproc)


    read_intensities._doc_title = "Read peak intensities from a file."
    read_intensities._doc_title_short = "Peak intensity reading."
    read_intensities._doc_args = [
        ["file", "The name of the file containing the intensity data."],
        ["dir", "The directory where the file is located."],
        ["spectrum_id", "The unique spectrum ID string."],
        ["heteronuc", "The name of the heteronucleus as specified in the peak intensity file."],
        ["proton", "The name of the proton as specified in the peak intensity file."],
        ["int_method", "The integration method."],
        ["int_col", "The optional column containing the peak intensity data (used by the generic intensity file format, or if the intensities are in a non-standard column)."],
        ["spin_id_col", "The spin ID string column used by the generic intensity file format (an alternative to the mol, res, and spin name and number columns)."],
        ["mol_name_col", "The molecule name column used by the generic intensity file format (alternative to the spin ID column)."],
        ["res_num_col", "The residue number column used by the generic intensity file format (alternative to the spin ID column)."],
        ["res_name_col", "The residue name column used by the generic intensity file format (alternative to the spin ID column)."],
        ["spin_num_col", "The spin number column used by the generic intensity file format (alternative to the spin ID column)."],
        ["spin_name_col", "The spin name column used by the generic intensity file format (alternative to the spin ID column)."],
        ["sep", "The column separator used by the generic intensity format (the default is white space)."],
        ["spin_id", "The spin ID string used by the generic intensity file format to restrict the loading of data to certain spin subsets."],
        ["ncproc", "The Bruker specific FID intensity scaling factor."]
    ]
    read_intensities._doc_desc = """
The peak intensity can either be from peak heights or peak volumes.

The spectrum ID is a label which is subsequently utilised by other user functions. If this identifier matches that of a previously loaded set of intensities, then this indicates a replicated spectrum.

The heteronucleus and proton should be set respectively to the name of the heteronucleus and proton in the file. Only those lines which match these labels will be used.

The integration method is required for the subsequent error analysis. When peak heights are measured, this should be set to 'height'. Volume integration methods vary, and hence two values are accepted. If the volume integration involves pure point summation, with no deconvolution algorithms or other methods affecting peak heights, then the value should be set to 'point sum'. For all other volume integration methods, e.g. line shape fitting, the value should be set to 'other'.

If a series of intensities extracted from Bruker FID files processed in Topspin or XWinNMR are to be compared, the ncproc parameter may need to be supplied. This is because the FID is stored using an integer representation and is scaled using ncproc to avoid numerical truncation artifacts. If two spectra have significantly different maximal intensities, then their ncproc values will differ. The intensity scaling is binary, i.e. 2**ncproc. Therefore if spectrum A has an ncproc of 6 and spectrum B a value of 7, then a reference intensity in B will be double that of A. Internally, relax stores the intensities scaled by 2**ncproc.
"""
    read_intensities._doc_additional = [
        ["File formats", """
The format of the peak list or intensity file will be automatically determined.

Sparky peak list: The file should be a Sparky peak list saved after typing the command 'lt'. The default is to assume that columns 0, 1, 2, and 3 (1st, 2nd, 3rd, and 4th) contain the Sparky assignment, w1, w2, and peak intensity data respectively. The frequency data w1 and w2 are ignored, while the peak intensity data can either be the peak height or volume displayed by changing the window options. If the peak intensity data is not within column 3, set the integration column to the appropriate number (column numbering starts from 0 rather than 1).

XEasy peak list: The file should be the saved XEasy text window output of the list peak entries command, 'tw' followed by 'le'. As the columns are fixed, the peak intensity column is hardwired to number 10 (the 11th column) which contains either the peak height or peak volume data. Because the columns are fixed, the integration column number will be ignored.

NMRView: The file should be an NMRView peak list. The default is to use column 16 (which contains peak heights) for peak intensities. To use peak volumes (or evolumes), int_col must be set to 15.

Generic intensity file: This is a generic format which can be created by scripting to support otherwise unsupported peak lists. It should contain in the first few columns enough information to identify the spin. This can include columns for the molecule name, residue number, residue name, spin number, and spin name. Alternatively a spin ID string column can be used. The peak intensities can be placed in another column specified by the integration column number. Intensities from multiple spectra can be placed into different columns, and these can then be specified simultaneously by setting the integration column value to a list of columns. This list must be matched by setting the spectrum ID to a list of the same length. If columns are delimited by a character other than whitespace, this can be specified with the column separator. The spin ID can be used to restrict the loading to specific spin subsets.
"""]
    ]
    read_intensities._doc_examples = """
To read the reference and saturated spectra peak heights from the Sparky formatted files
'ref.list' and 'sat.list', type:

relax> spectrum.read_intensities(file='ref.list', spectrum_id='ref')
relax> spectrum.read_intensities(file='sat.list', spectrum_id='sat')

To read the reference and saturated spectra peak heights from the XEasy formatted files
'ref.text' and 'sat.text', type:

relax> spectrum.read_intensities(file='ref.text', spectrum_id='ref')
relax> spectrum.read_intensities(file='sat.text', spectrum_id='sat')
"""
    _build_doc(read_intensities)



    replicated._doc_title = "Specify which spectra are replicates of each other."
    replicated._doc_title_short = "Replicate spectra."
    replicated._doc_args = [
        ["spectrum_ids", "The list of replicated spectra ID strings."]
    ]
    replicated._doc_desc = """
This is used to identify which of the loaded spectra are replicates of each other. Specifying the replicates is essential for error analysis if the baseplane RMSD has not been supplied.
"""
    replicated._doc_examples = """
To specify that the NOE spectra labelled 'ref1', 'ref2', and 'ref3' are the same spectrum
replicated, type one of:

relax> spectrum.replicated(['ref1', 'ref2', 'ref3'])
relax> spectrum.replicated(spectrum_ids=['ref1', 'ref2', 'ref3'])

To specify that the two R2 spectra 'ncyc2' and 'ncyc2b' are the same time point, type:

relax> spectrum.replicated(['ncyc2', 'ncyc2b'])
"""
    _build_doc(replicated)

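
# Illustrative sketch, not part of relax: the user functions above echo their
# own call by concatenating repr() strings argument by argument.  The
# hypothetical helper below expresses the same idea once: it joins
# "name=repr(value)" pairs in the order given.  The prompt string is passed in
# by the caller; no particular value of self._exec_info.ps3 is assumed.

def _sketch_format_call(prompt, name, args):
    """Return prompt + name + '(key=repr(value), ...)'.

    Hypothetical helper, not a relax API.  args is a sequence of
    (argument name, value) pairs, kept in order so that the echoed text
    follows the function signature.
    """
    parts = ["%s=%r" % (key, value) for key, value in args]
    return prompt + name + "(" + ", ".join(parts) + ")"

# Example, using a made-up prompt string "relax> ":
#     _sketch_format_call("relax> ", "spectrum.delete", [("spectrum_id", 'R1 ncyc5')])
#     returns "relax> spectrum.delete(spectrum_id='R1 ncyc5')"
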
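
# Illustrative sketch, not part of relax: one way to map the inputs described
# in the spectrum.error_analysis documentation onto the six rows of its table.
# The arguments (the int_method value accepted by spectrum.read_intensities,
# the total number of spectra, and the number of spectra that belong to a
# replicated set) and the returned tuple of table labels are assumptions.

def _sketch_error_category(int_method, n_spectra, n_replicated):
    """Return (intensity type, noise source, error scope) as in the table.

    Hypothetical helper, not a relax API.
    """
    if int_method not in ('height', 'point sum', 'other'):
        raise ValueError("Unknown integration method %r." % int_method)
    if not 0 <= n_replicated <= n_spectra:
        raise ValueError("The replicated spectra count must lie between 0 and the number of spectra.")

    int_type = 'Heights' if int_method == 'height' else 'Volumes'

    # No replicates: the baseplane RMSD is used, which is only valid for peak
    # heights or pure point-summation volumes.
    if n_replicated == 0:
        if int_method == 'other':
            raise ValueError("Baseplane RMSD errors are not possible for this integration method.")
        return (int_type, 'RMSD baseplane', 'One sigma per peak per spectrum')

    # Some, but not all, spectra replicated: one averaged sigma for everything.
    if n_replicated < n_spectra:
        return (int_type, 'Partial duplicate + variance averaging', 'One sigma for all peaks, all spectra')

    # Every spectrum replicated: one sigma per replicated spectra set.
    return (int_type, 'All replicated + variance averaging', 'One sigma per replicated spectra set')
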
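
# Illustrative sketch, not part of relax: the variance formula from the
# 'Peak heights with partially replicated spectra' documentation section,
#     sigma^2 = sum({Ii - Iav}^2) / (n - 1),
# evaluated per spin for one replicated spectra set and then averaged over all
# spins.  The data layout (a dict mapping each spin ID string to its list of
# replicate intensities) and the function names are assumptions.

def _sketch_replicate_variance(intensities):
    """Return the sample variance of one spin's replicated intensities.

    Hypothetical helper, not a relax API.
    """
    n = len(intensities)
    if n < 2:
        raise ValueError("At least two replicated intensities are required.")
    i_av = sum(intensities) / float(n)
    return sum((i - i_av) ** 2 for i in intensities) / (n - 1)


def _sketch_averaged_set_variance(set_data):
    """Average the per-spin variances over all spins of one replicated set."""
    variances = [_sketch_replicate_variance(ints) for ints in set_data.values()]
    if not variances:
        raise ValueError("The replicated spectra set contains no spins.")
    return sum(variances) / len(variances)

# Example: two spins measured in duplicate, with per-spin variances 8.0 and 2.0:
#     _sketch_averaged_set_variance({':1@N': [100.0, 104.0], ':2@N': [50.0, 52.0]})
#     returns 5.0
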
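
# Illustrative sketch, not part of relax: the difference between the partially
# and fully replicated cases described in the error analysis documentation.
# set_variances is a hypothetical dict mapping each replicated spectra set
# label to its spin-averaged variance.  With all spectra replicated, every set
# keeps its own sigma; otherwise the set variances (not the standard
# deviations) are averaged into a single sigma for all spins and spectra.
import math


def _sketch_replicated_sigmas(set_variances, all_replicated):
    """Return {set label: sigma} if all spectra are replicated, else one sigma.

    Hypothetical helper, not a relax API.
    """
    if not set_variances:
        raise ValueError("No replicated spectra sets were supplied.")
    if all_replicated:
        return dict((label, math.sqrt(var)) for label, var in set_variances.items())
    return math.sqrt(sum(set_variances.values()) / float(len(set_variances)))

# Example:
#     _sketch_replicated_sigmas({'ref': 4.0, 'sat': 16.0}, True)   returns {'ref': 2.0, 'sat': 4.0}
#     _sketch_replicated_sigmas({'ref': 4.0, 'sat': 16.0}, False)  returns sqrt(10.0)
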
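
# Illustrative sketch, not part of relax: the point-summation volume error from
# the 'Peak volumes with baseplane noise RMSD' documentation section.  Summing
# N points, each with standard deviation sigma_i (the baseplane RMSD), gives
# sigma_vol^2 = sigma_i^2 * N; for an n by m box, N = n*m, which is the
# Nicholson et al. (1992) form sigma_vol = sigma_i * sqrt(n*m).  The function
# names are hypothetical.
import math


def _sketch_point_sum_volume_error(rmsd, n_points):
    """Return sigma_vol = sigma_i * sqrt(N) for a pure point-summation volume.

    Hypothetical helper, not a relax API.
    """
    if n_points < 1:
        raise ValueError("At least one summed point is required.")
    return rmsd * math.sqrt(n_points)


def _sketch_box_volume_error(rmsd, n, m):
    """Return the volume error for an n by m box, i.e. N = n*m summed points."""
    return _sketch_point_sum_volume_error(rmsd, n * m)

# Example: a 3 x 3 box with a baseplane RMSD of 2.0:
#     _sketch_box_volume_error(2.0, 3, 3)  returns 6.0
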
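
# Illustrative sketch, not part of relax: the binary ncproc scaling described
# in the spectrum.read_intensities documentation, assuming "scaled by
# 2**ncproc" means that the raw Bruker intensity is multiplied by 2**ncproc.
# Under that reading, the same raw value with ncproc = 7 ends up twice as large
# as with ncproc = 6.  The function name is hypothetical.

def _sketch_scale_by_ncproc(raw_intensity, ncproc):
    """Return the raw intensity multiplied by 2**ncproc.

    Hypothetical helper, not a relax API.  A float base is used so that
    negative ncproc values also work.
    """
    return raw_intensity * 2.0 ** ncproc

# Example:
#     _sketch_scale_by_ncproc(1000.0, 7) / _sketch_scale_by_ncproc(1000.0, 6)  returns 2.0
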
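
# Illustrative sketch, not part of relax: reading the generic intensity file
# format with a spin ID string column and one or more intensity columns, each
# paired with a spectrum ID as described for int_col and spectrum_id.  Column
# numbers start from 0.  The function name, the spin ID column default of 0 and
# the skipping of blank and '#' comment lines are assumptions; the reader that
# relax actually uses (reached through spectrum.read() in generic_fns) may
# behave differently.

def _sketch_read_generic(lines, spectrum_id, int_col, spin_id_col=0, sep=None):
    """Return {spectrum ID: {spin ID: intensity}} from the given text lines.

    Hypothetical helper, not a relax API.
    """
    # Accept either single values or matching lists, as the user function does.
    if not isinstance(int_col, list):
        int_col = [int_col]
    if not isinstance(spectrum_id, list):
        spectrum_id = [spectrum_id]
    if len(int_col) != len(spectrum_id):
        raise ValueError("The int_col and spectrum_id lists must be of the same length.")

    data = dict((sid, {}) for sid in spectrum_id)
    for line in lines:
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        # sep=None splits on any run of whitespace, the documented default.
        row = line.split(sep)
        spin = row[spin_id_col].strip()
        for col, sid in zip(int_col, spectrum_id):
            data[sid][spin] = float(row[col])
    return data

# Example with two spectra stored in columns 1 and 2:
#     _sketch_read_generic([":1@N 100.0 90.0", ":2@N 50.0 40.0"], ['ref', 'sat'], [1, 2])
#     returns {'ref': {':1@N': 100.0, ':2@N': 50.0}, 'sat': {':1@N': 90.0, ':2@N': 40.0}}
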