


The Correlation Coefficient: Definition
Bruce Ratner, Ph.D.

The correlation coefficient, denoted by r, is a measure of the strength of the straight-line, or linear, relationship between two variables. The correlation coefficient takes on values ranging from -1 to +1, including the endpoint values -1 and +1 (*). The following points are the accepted guidelines for interpreting the correlation coefficient:
• 0 indicates no linear relationship.
• +1 indicates a perfect positive linear relationship: as one variable increases in its values, the other variable also increases in its values via an exact linear rule.
• -1 indicates a perfect negative linear relationship: as one variable increases in its values, the other variable decreases in its values via an exact linear rule.
• Values between 0 and 0.3 (0 and -0.3) indicate a weak positive (negative) linear relationship via a shaky linear rule.
• Values between 0.3 and 0.7 (-0.3 and -0.7) indicate a moderate positive (negative) linear relationship via a fuzzy-firm linear rule.
• Values between 0.7 and 1.0 (-0.7 and -1.0) indicate a strong positive (negative) linear relationship via a firm linear rule.
• The value of r squared, expressed as a percentage, is typically taken as “the percent of variation in one variable explained by the other variable,” or “the percent of variation shared between the two variables.”
• Linearity Assumption. The correlation coefficient requires that the underlying relationship between the two variables under consideration be linear. If the relationship is known to be linear, or the observed pattern between the two variables appears to be linear, then the correlation coefficient provides a reliable measure of the strength of the linear relationship. If the relationship is known to be nonlinear, or the observed pattern appears to be nonlinear, then the correlation coefficient is not useful, or at best of questionable value.

The calculation of the correlation coefficient for two variables, say X and Y, is simple to understand. Let zX and zY be the standardized versions of X and Y, respectively; that is, zX and zY are both re-expressed to have means equal to zero and standard deviations (std) equal to one. The re-expressions used to obtain the standardized scores are given in equations (3.1) and (3.2):
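$$z_{X_i} = \frac{X_i - \operatorname{mean}(X)}{\operatorname{std}(X)} \tag{3.1}$$
$$z_{Y_i} = \frac{Y_i - \operatorname{mean}(Y)}{\operatorname{std}(Y)} \tag{3.2}$$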
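A minimal Python sketch of equations (3.1) and (3.2). The function name `standardize` and the sample data are illustrative assumptions. The sketch also assumes that std is the sample standard deviation (`statistics.stdev`, divisor n - 1), which is the choice that makes the n - 1 denominator in equation (3.3) yield r.

```python
import statistics


def standardize(values):
    """Z-scores per equations (3.1)/(3.2): (value - mean) / std.

    Hypothetical helper. Assumption: std is the sample standard
    deviation (n - 1 divisor).
    """
    m = statistics.mean(values)
    s = statistics.stdev(values)  # raises StatisticsError for fewer than 2 values
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - m) / s for v in values]


x = [2, 4, 6, 8, 10]
zx = standardize(x)
print(zx)
# The re-expressed scores have mean 0 and std 1 (up to floating-point rounding).
print(statistics.mean(zx), statistics.stdev(zx))
```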
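The correlation coefficient is defined as the mean product of the paired standardized scores (zXi, zYi), as expressed in equation (3.3). Here the "mean" divides the sum of products by n - 1 rather than n, which matches computing std(X) and std(Y) with the n - 1 divisor.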
$$r_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} z_{X_i}\, z_{Y_i} \tag{3.3}$$
where n is the sample size.

For more information about this article, call Bruce Ratner at 516.791.3544 or 1-800-DM-STAT-1, or email br@dmstat1.com.

