mailRe: [sr #3154] Implementation of Baldwin (2014) B14 model - 2-site exact solution model for all time scales


Others Months | Index by Date | Thread Index
>>   [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Header


Content

Posted by Andrew Baldwin on May 05, 2014 - 12:36:
Hi Edward (and chaps!),

Just a quick one: the pints shall indeed be paid. Any more errors found in the paper = more pints to the finder. There's probably another week or so to go. Half pints for errors pointed out after the deadline.

Nikolai and I have actually already had a good discussion on this. We're both comfortable with the result I believe. The 2001 formula at double precision will shake a little sometimes (in hard to predict places) when exposed to relatively exotic parameters (see figure I sent a few emails ago). Note: the current supp figure on the web page is NOT the one that will be in the final paper. The final one is slightly different, showing in addition that it is relatively large deltaOmegas (experimentally rare) that are required to trip the code.

With regard to the error in the CR equation, simply check your own outputs against an exact result (eg Nikolai and Martin's), and you'll see that it can be improved by the change of definition. You're welcome to use the code I just sent you to see the error. This is also the same code base that Nikolai and I used for our earlier discussion, alluded to above. You are welcome to use my paper as a reference for the correct form should you wish (once the typos are expunged...).

If you want to preserve the integrity of the CR equation, then perhaps you should also change your definition of tau_cpmg to match theirs, as they state in their intro? I notice that you don't do this, so it seems you're already taking some (really utterly reasonable!) liberties with the original work. As a philosophical point though, I would have thought that users of your code would prefer to get a correct answer, over one that has historical meaning? Actually, Rex Richards is a fellow of my college here. He's really quite elderly now but I'll ask him what he thinks on this and get back to you!

Best,

Andy.


On 05/05/2014 11:06, Edward d'Auvergne wrote:
Hi Andy and Troels,

Please see below.  Note I have CCed Nikolai, Martin, and Flemming as
we are talking about their work and they may have an interest in the
discussions.  This is a public mailing list, so any messages sent are
permanently archived and can never be deleted from the web.  This
thread, including all messages, is archived at
http://thread.gmane.org/gmane.science.nmr.relax.devel/5410 (as well as
3 other places on the web).  The implementation of your analytic model
Andy is discussed in
http://thread.gmane.org/gmane.science.nmr.relax.devel/5414.


On 4 May 2014 22:59, Andrew Baldwin <andrew.baldwin@xxxxxxxxxxxxx> wrote:
Dear Troels,

The existence of typos is exceedingly probable! The editors have added more
errors in the transcrpition, so it'll be curious to see how accurate the
final draft will be. Your comments come just in time for alterations.

So I'll pay a pint of beer for every identified error. I didn't use that
particular supplementary section to directly calculate anything, so this is
the place in the manu where errors are most likely to occur. I do note the
irony that this is probably one of few bits in the article that people are
likely to read.

1) Yep - that's a typo. alpha- should definitely be KEG-KGE (score 1 pint).

2) Yep! zs should be zetas (score 1 pint).
As this is open for the world to see, and permanently archived, you'll
have to keep your word ;)  Hopefully the editor will still accept
these important corrections.


3) This one is more interesting. Carver and Richard's didn't define a
deltaR2.
This is not quite correct.  Thought they didn't have a deltaR2
parameter, they did use the notation 1/T2A - 1/T2B (equation A8 of the
original paper as Troels mentioned).  I have respected that notation
in relax:

http://www.nmr-relax.com/manual/full_CR72_2_site_CPMG_model.html

Also see section 10.3.3 in the PDF
(http://download.gna.org/relax/manual/relax.pdf).


To my knowledge, I think Nikolai was the first to do that in his
2002 paper on reconstructing invisible states.
For reference here, the paper is J. Am. Chem. Soc. 2002, 124,
12352-12360 (http://dx.doi.org/10.1021/ja0207089).  The notation
defined here is:

   Delta_R = RB - RA.

This is due to the separation of components in equation 2 - the RA
part, as well as site A chemical shift, shifting to the first
component and then being subtracted from the diagonal in the second.
Because of this equation 2, Nikolai's definition of Delta_R is
opposite from the Carver and Richards 1/T2A - 1/T2B component.  But if
site B was not "invisible" and you could see both exchanging peaks,
then you could have an equivalent of equation 2 where R2B and omega_B
are in the first component and then Delta_R' would be defined as RA -
RB.  You could then study the Delta_R and Delta_R' together.

Note that at no point in relax do we use Nikolai's Delta_R definition.
  In Nikolai and Martin's code - the Maple expansion - which is in
relax, the R20A == R20B assumption is made (see
http://www.nmr-relax.com/manual/NS_2_site_expanded_CPMG_model.html).
And in the other numeric models you will see that we never use a
difference:

http://www.nmr-relax.com/api/3.1/lib.dispersion.ns_matrices-module.html
http://www.nmr-relax.com/api/3.1/lib.dispersion.ns_matrices-pysrc.html#rcpmg_2d
http://www.nmr-relax.com/api/3.1/lib.dispersion.ns_matrices-pysrc.html#rcpmg_3d
http://www.nmr-relax.com/api/3.1/lib.dispersion.ns_matrices-pysrc.html#rr1rho_3d_3site
http://www.nmr-relax.com/api/3.1/lib.dispersion.ns_matrices-pysrc.html#rr1rho_3d


So here, I think you'll find
that my definition is the correct one. There is a typo in the Carver
Richard's equation.
Why is this typo in equation A8 not mentioned in the literature?
There should also be a published re-derivation of these equations
showing where this mistake occurred.  This should stand out like a
sore thumb!


I've just done a brief lit review, and as far as I can
see, this mistake has propagated into every transcription of the equation
since.
Well if there is a mistake in the original publication, that is to be
expected.  A re-derivation showing the mistake is also to be expected.


I should note that in the manu in this section. Likely this has gone
un-noticed as differences in R2 are difficult to measure by CPMG, instead
R2A=R2B being assumed. Though thanks to Flemming, it is now possible to
break this assumption:

J Am Chem Soc. 2009 Sep 9;131(35):12745-54. doi: 10.1021/ja903897e.
The last column of Table 2 of the Carver and Richards' paper shows
that they themselves have analysed synthetic cases where R20A != R20B.
  Note that in this paper from Flemming, that he uses the notation
R2,int(A) -  R2,int(B).  But the numerical solution in equations 4 and
5 do not use this difference (well partial solution as these are the
differentials).  Anyway, the original Carver and Richards model allows
the R20A == R20B assumption to be broken.


Also, in the Ishima and Torchia paper looking at the effects of relaxation
differences:

Journal of biomolecular NMR. 2006 34(4):209-19
In this paper, they optimise the original Carver and Richards model.
But they do not mention the typo in the original paper.  Here their
definition of Delta_R2 is actually the systematic error in the CPMG
experiment (see equation 1), not the R20 rate difference between the
two states.  Also if you look at the first paragraph in the section
labelled as "Data fitting using Carver–Richards equation", you can see
that they only optimised the parameters R20, delta_omega, pA and tex.
So as far as I was aware, they have used R20A == R20B throughout.
Hence this does not demonstrate a mistake in the Carver and Richards
equations.


I understand their derivations started from Nikolai's formula, which is
exact, so the error wouldn't come to light there either, necessarily.
I don't see how these publications support the fact that the original
Carver and Richards equation is wrong.  Maybe there are better
references?


To prove the point try this: numerically evaluate R2,effs over a big grid of
parameters, and compare say Nikolai's formula to your implementation of
Carver Richard's with a +deltaR2 and a -deltaR2 in the alpha- parameter.
Which agrees better? You'll see that dR2=R2E-R2G being positive wins.
This can be performed in relax by using the 'NS CPMG 2-site 3D full'
model to simulate the R2eff data, and the 'CR72 full' model with and
without the 'typo correction' to fit it.  Rather than an extensive
grid search, a single point where R20A and R20B are quite different
should be a sufficient demonstration.


The attached should also help. It's some test code I used for the paper that
has been briefly sanitised. In it, the numerical result, Carver Richards,
Meiboom, Nikolai's and my formula are directly compared for speed (roughly)
and precision.
Could you please attach files to https://gna.org/support/?3155 or the
other support requests (https://gna.org/support/?group=relax) rather
than sending it to public mailing lists.  Cheers!


The code added for Nikolai's formula at higher precision was contributed by
Nikolai after the two of us discussed some very odd results that can come
from his algorithm. Turns out evaluating his algorithm at double precision
(roughly 9 decimal places, default in python and matlab) is insuffficient.
Which optimisation algorithm did you use and what function tolerance,
X tolerance (minimum step size), and maximum number of iterations did
you use?  These settings are often dominant to the equations of the
base model.  Do you have a script or parameter set which demonstrates
the failure of Nikolai's Maple expansion?  I would like to get this
into relax to blast it with the Nelder-Mead simplex algorithm which
trounces Levenberg-Marquardt in accuracy and speed.  There is also 3D
space mapping functionality (the dx.map user function) which can be
used to directly show the change in the optimisation space due to
precision, etc.


About 23 decimal places is necessary for exactness. This is what is
discussed in the supplementary info in the recent paper. In short, on my
laptop:

formula / relative speed / biggest single point error in s-1 over grid
Mine                                         1       7.2E-10
Meiboom                                 0.3         9E7
Carver Richards                       0.53       8
Nikolai (double precision 9dp)  3.1        204
Nikolai (long double 18dp)       420       3E-2
Nikolai (23 dp)                         420       2E-7
Numerical                                118        0

The error in Carver Richard's with the alpha- sign made consistent with the
Carver Richard's paper gives an error of 4989 s-1 in the same test. Ie, it's
a bad idea.

Note that the precise calculations in python at arbitrary precision are slow
not because of the algorithm, but because of the mpmath module, which deals
with C code via strings. Accurate but slow. This code has a small amount of
additional overhead that shouldn't strictly be included in the time. I've no
doubt also that algorithms here could be sped up further by giving them a
bit of love. Those prefaces aside, I'm surprised that your benchmarking last
Friday showed Nikolai's to be so much slower than mine? In my hands
(attached) at double precision, Nikolai's formula is only 3x slower than
mine using numpy. Though this is not adequate precision for mistake free
calculations.
It's ok.  In relax, the models are implemented as the original authors
intended.  If there is a mistake in the original model, then that
mistake is replicated.  But, importantly, well documented.  The
subsequent paper which fixes the model will then be used to implement
a second model with the fix.  This is to allow relax users to test
historical results and to compare them to all other models and fixed
models.  It also means that history is not rewritten and that original
equations are permanently preserved.  Nikolai's code was implemented
as 64-bit - due to it being Matlab code - so that's what we will
preserve in relax.

For reference, Troels' post with the timings is at
http://thread.gmane.org/gmane.science.nmr.relax.devel/5410/focus=5515.
  The time for your model, Andy, was 6.001s whereas Nikolai's Maple
expansion (the "NS CPMG 2-site expanded" model in relax) was 8.735s.
I have made sure that Nikolai's code is as optimised as possible in
relax.  For example in the script you attached, you have a lot of
integers:

         t3 = mp.mpc(0,1)/2;
         t4 = t3*dw;
         t5 = Ka/2;
         t6 = Ra/2;
         t7 = Kb/2;
         t8 = Rb/2;
         t9 = -2*(mp.mpc(0,1));
         t14 = 2*(mp.mpc(0,1));
         t20 = 2*Kb*Ka;
         t22 = 2*Ka*Rb;
         t24 = 2*Ra*Kb;
         t26 = 2*Ra*Rb;
         t28 = 2*Kb*Rb;
         t30 = 2*Ka*Ra;

If you use 2.0 instead of 2, then Python type conversions are avoided,
speeding up the code.  As far as I'm aware, such a trick is not
required in Matlab.  There are many other small optimisations I have
made.  For your model labelled as "B14" in relax, these optimisation
are not present yet.  But the factor of 3x rather than current 1.5x
will probably be replicated once the code is optimised.


Perhaps more physically compelling, to get my equation to reduce to Carver
Richard's, alpha- has to be defined as I show it. Note that the origin of
this parameter is its role in the Eigenvalue of the 2x2 matrix. This is the
one advantage of a derivation - you can see where the parameters come from
and get a feel for what their meaning is.

So this typo, and the fact that Carver and Richards inadvertently change
their definition of tau_cpmg half way through the paper, are both errors in
the original manuscript. Though given that proof reading and type-setting in
1972 would have been very very painful, that there are this few errors is
still actually quite remarkable. I notice in your CR72 code, you are
currently using the incorrect definition, so I would suggest you change
that.
We should show this to be the case in relax's implementation before
changing the Carver and Richards formula.  Yes, I have used the
original definition.  But I have not encountered literature, apart
from your paper Andy (http://dx.doi.org/10.1007%2Fs10858-012-9694-6),
that says that it is incorrect.  Therefore I would really also like to
have a direct literature quote + derivation/demonstration of the
Carver and Richards typo before we change it.  And a internal
demonstration of inconsistency between the 'NS CPMG 2-site 3D full'
and 'CR72 full' models in relax.  Once a strong proof of the Carver
and Richards mistake is demonstrated, I would then insist that the
corrected model be called something different and that we preserve the
original equations as the 'CR72 full' model.  The Carver and Richards
model is so historically important for the dispersion field that we
must preserve it as is.

Cheers,

Edward




Related Messages


Powered by MHonArc, Updated Mon May 05 14:00:10 2014