Author: tlinnet Date: Wed Jan 7 20:25:09 2015 New Revision: 27155 URL: http://svn.gna.org/viewcvs/relax?rev=27155&view=rev Log: Implemented ordinary_least_squares function the repeated auto-analysis. Inspection of statistics books, shows that several authors does not recommend using regression through the origin (RTO).
From Joseph G. Eisenhauer: Regression through the Origin
- RTO residuals will usually have a nonzero mean, because forcing the regression line through the origin is generally inconsistent with the best fit. - R Square measures (for RTO) the proportion of the variability in the dependent variable "about the origin" explained by regression. This CANNOT be compared to R Square for models which include an intercept.
From "EXPERIMENTAL DESIGN AND DATA ANALYSIS FOR BIOLOGISTS", G. P. QUINN, MICHAEL J. KEOUGH
- minimum observed xi rarely extends to zero, and forcing our regression line through the origin not only involves extrapolating the regression line outside our data range but also assuming the relationship is linear outside this range (Cade & Terrell 1997, Neter et al. 1996). - We recommend that it is better to have a model that fits the observed data well than one that goes through the origin but provides a worse fit to the observed data. - residuals from the no-intercept model no longer sum to zero. - usual partition of SSTotal into SSRegression and SSResidual doesn’t work. Modified: trunk/auto_analyses/relax_disp_repeat_cpmg.py Modified: trunk/auto_analyses/relax_disp_repeat_cpmg.py URL: http://svn.gna.org/viewcvs/relax/trunk/auto_analyses/relax_disp_repeat_cpmg.py?rev=27155&r1=27154&r2=27155&view=diff ============================================================================== --- trunk/auto_analyses/relax_disp_repeat_cpmg.py (original) +++ trunk/auto_analyses/relax_disp_repeat_cpmg.py Wed Jan 7 20:25:09 2015 @@ -61,6 +61,45 @@ # Define sfrq key to dic. DIC_KEY_FORMAT = "%.8f" +# Define function for the Ordinary least squares of y=A + Bx +def ordinary_least_squares(x=None, y=None): + """Calculate the linear correlation 'B', the intercept 'A' for the function y=A + Bx. + + @keyword x: The data for the X-axis. + @type x: float or numpy array. + @keyword y: The data for the Y-axis. + @type y: float or numpy array. + @return: The intercept A, the standard deviation for A, the slope B, the standard deviation for B, standard deviation of the residuals, the linear correlation coefficient + @rtype: float, float, float, float, float, float + """ + + # Get the mean. + x_m = mean(x) + y_m = mean(y) + + # Get number of observations + N = len(y) + + # Calculate the denominator + denom = N * sum(x**2) - (sum(x))**2 + + # Solve by Ordinary linear least squares + A = ( sum(x**2) * sum(y) - sum(x) * sum(x*y) ) / denom + B = (N * sum(x*y) - sum(x) * sum(y) ) / denom + + # Calculate standard deviation of the residuals + std_y = sqrt( (1. / (N-2) ) * sum( (y - A - B*x)**2 ) ) + + # Calculate uncertainty of Constants + # This require than uncertainty in x is negligible + std_A = std_y * sqrt( sum(x**2) / denom ) + std_B = std_y * sqrt( N / denom ) + + # Linear correlation coefficient + r_xy = sum( (x - x_m)*(y - y_m) ) / sqrt( sum((x - x_m)**2) * sum((y - y_m)**2) ) + + return A, std_A, B, std_B, std_y, r_xy + class Relax_disp_rep: