How Do You Know if a Coefficient Is Statistically Significant
Linear Regression and Correlation
Testing the Significance of the Correlation Coefficient
OpenStaxCollege
The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.
We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.
The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.
- The symbol for the population correlation coefficient is ρ, the Greek letter "rho."
- ρ = population correlation coefficient (unknown)
- r = sample correlation coefficient (known; calculated from sample data)
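As a rough illustration of how r is "calculated from sample data", the sketch below computes the sample correlation coefficient with the usual sums-of-squares formula. The function name `pearson_r` and the five data points are assumptions made up for this example; they do not come from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(x, y):.4f} from n = {len(x)} points")
```

Both numbers it reports, r and n, are needed for the significance test; r alone is not enough.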
The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n.
If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."
Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
What the conclusion means: There is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population.
If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".
Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
What the conclusion means: There is not a significant linear relationship between x and y. Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
Note
- If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
- If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
- If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.
PERFORMING THE HYPOTHESIS TEST
- Null Hypothesis: H0 : ρ = 0
- Alternate Hypothesis: Ha : ρ ≠ 0
WHAT THE HYPOTHESES MEAN IN WORDS:
- Null Hypothesis H0 : The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
- Alternate Hypothesis Ha : The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.
DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.
- Method 1: Using the p-value
- Method 2: Using a table of critical values
In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.
Note
Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)
METHOD 1: Using a p-value to make a decision
To calculate the p-value using LinRegTTest:
On the LinRegTTest input screen, on the line prompt for β or ρ, highlight "≠ 0"
The output screen shows the p-value on the line that reads "p =".
(Most computer statistical software can calculate the p-value.)
If the p-value is less than the significance level (α = 0.05):
- Decision: Reject the null hypothesis.
- Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero."
If the p-value is NOT less than the significance level (α = 0.05):
- Decision: DO NOT REJECT the null hypothesis.
- Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero."
Calculation Notes:
- You will use technology to calculate the p-value. The following describes the calculations to compute the test statistic and the p-value:
- The p-value is calculated using a t-distribution with n – 2 degrees of freedom.
- The formula for the test statistic is \(t=\frac{r\sqrt{n-2}}{\sqrt{1-{r}^{2}}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
- The p-value is the combined area in both tails.
An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
THIRD-EXAM vs FINAL-EXAM EXAMPLE: p-value method
- Consider the third exam/final exam example.
- The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.
- Can the regression line be used for prediction? Given a third exam score (x value), can we use the line to predict the final exam score (predicted y value)?
H0 : ρ = 0
Ha : ρ ≠ 0
α = 0.05
- The p-value is 0.026 (from LinRegTTest on your calculator or from computer software).
- The p-value, 0.026, is less than the significance level of α = 0.05.
- Decision: Reject the Null Hypothesis H0
- Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.
Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
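For readers without a TI calculator, the calculation notes for Method 1 can be sketched in plain Python. This is only an illustration under stated assumptions: the function names (`t_statistic`, `t_pdf`, `two_tailed_p`) are made up for this example, and the tail area is approximated by Simpson's rule on the t density rather than by a statistics library. The inputs r = 0.6631 and n = 11 are the third-exam/final-exam values quoted above.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r*sqrt(n-2)/sqrt(1-r^2) for H0: rho = 0 (requires |r| < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails beyond |t|; steps must be even (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_half)

# Values from the third-exam/final-exam example.
r, n, alpha = 0.6631, 11, 0.05
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.3f}")
print("Reject H0" if p < alpha else "Do not reject H0")
```

The p-value printed here should agree with the LinRegTTest value to the precision the example reports. In practice a statistics package would replace the hand-rolled integration.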
METHOD 2: Using a table of Critical Values to make a decision
The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of \(r\) is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.
Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are –0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.
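The text does not say how the table's critical values are obtained. One plausible reading, assuming the table is built from the same t-test used in Method 1, is to solve the test-statistic formula for r. The two-tailed 5% critical value of t for 8 degrees of freedom, about 2.306, is taken from a standard t-table and is not given in this text.

```latex
\[
t=\frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
\quad\Longrightarrow\quad
r_{\text{crit}}=\frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2}+df}},
\qquad df=n-2
\]
\[
df=8,\quad t_{\text{crit}}\approx 2.306:\qquad
r_{\text{crit}}\approx\frac{2.306}{\sqrt{2.306^{2}+8}}
=\frac{2.306}{\sqrt{13.318}}\approx 0.632
\]
```

Under that assumption the result matches the ±0.632 quoted for df = 8, which is consistent with the statement that the two methods give the same result.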
Try It
For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?
If the scatter plot looks linear then, yes, the line can be used for prediction, because r > the positive critical value.
Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.
Try It
For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction, because r < the positive critical value.
Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.
Try It
For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?
Yes, the line can be used for prediction, because r < the negative critical value.
THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method
Consider the third exam/final exam example.
The line of best fit is: ŷ = –173.51+4.83x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score (x value), can we use the line to predict the final exam score (predicted y value)?
- H0 : ρ = 0
- Ha : ρ ≠ 0
- α = 0.05
- Use the "95% Critical Value" table for r with df = n – 2 = 11 – 2 = 9.
- The critical values are –0.602 and +0.602
- Since 0.6631 > 0.602, r is significant.
- Decision: Reject the null hypothesis.
- Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.
Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.
- r = –0.567 and the sample size, n, is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
- r = 0.708 and the sample size, n, is 9. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
- r = 0.134 and the sample size, n, is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
- r = 0 and the sample size, n, is 5. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.
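The critical-value comparisons worked through in this section all follow the same rule, which can be sketched as a small lookup. The dictionary below holds only the critical values quoted in this text (the full table is not reproduced here), and the names `CRITICAL_R_95` and `is_significant` are assumptions made up for this illustration.

```python
# Critical values of r at alpha = 0.05, keyed by df = n - 2.
# Only the entries quoted in this section; a full table has many more rows.
CRITICAL_R_95 = {
    4: 0.811, 6: 0.707, 7: 0.666, 8: 0.632,
    9: 0.602, 10: 0.576, 12: 0.532, 17: 0.456,
}

def is_significant(r, n, table=CRITICAL_R_95):
    """True if r falls outside the interval (-critical value, +critical value)."""
    df = n - 2
    if df not in table:
        raise KeyError(f"no critical value stored for df = {df}")
    return abs(r) > table[df]

# Cases taken from the worked examples above: (r, n).
for r, n in [(0.801, 10), (-0.624, 14), (0.776, 6), (0.6631, 11), (0.134, 14)]:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r:+.4f}, n = {n:2d}, df = {n - 2:2d}: {verdict}")
```

Comparing |r| with the positive critical value is the same as checking whether r lies between the negative and positive critical values, because the two are symmetric about zero.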
Try It
For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction no matter what the sample size is.
Assumptions in Testing the Significance of the Correlation Coefficient
Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.
The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.
The assumptions underlying the test of significance are:
- There is a linear relationship in the population that models the average value of y for varying values of x. In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
- The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. The first assumption implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
- The standard deviations of the population y values about the line are equal for each value of x. In other words, each of these normal distributions of y values has the same shape and spread about the line.
- The residual errors are mutually independent (no pattern).
- The data are produced from a well-designed, random sample or randomized experiment.
Chapter Review
Linear regression is a procedure for fitting a straight line of the form ŷ = a + bx to data. The conditions for regression are:
- Linear In the population, there is a linear relationship that models the average value of y for different values of x.
- Independent The residuals are assumed to be independent.
- Normal The y values are distributed normally for any value of x.
- Equal variance The standard deviation of the y values is equal for each x value.
- Random The data are produced from a well-designed random sample or randomized experiment.
The slope b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y, σ, use the standard deviation of the residuals, s. \(s=\sqrt{\frac{SSE}{n-2}}\). The variable ρ (rho) is the population correlation coefficient. To test the null hypothesis H0: ρ = hypothesized value, use a linear regression t-test. The most common null hypothesis is H0: ρ = 0 which indicates there is no linear relationship between x and y in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STATS TESTS LinRegTTest).
Formula Review
Least Squares Line or Line of Best Fit:
\(\hat{y}=a+bx\)
where
a = y-intercept
b = slope
Standard deviation of the residuals:
\(s=\sqrt{\frac{SSE}{n-2}}\)
where
SSE = sum of squared errors
n = the number of data points
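The two formulas in this review can be tried out with a short sketch. It assumes the standard least-squares estimates b = Sxy/Sxx and a = ȳ – b·x̄, which this section does not derive; the function names and the five data points are hypothetical and serve only to illustrate.

```python
import math

def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residual_std(xs, ys):
    """Return (SSE, s) where s = sqrt(SSE / (n - 2)); requires n > 2."""
    a, b = least_squares(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse, math.sqrt(sse / (len(xs) - 2))

# Hypothetical toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
sse, s = residual_std(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {sse:.2f}, s = {s:.4f}")
```

Dividing by n – 2 rather than n reflects the two estimated quantities, a and b, and matches the n – 2 degrees of freedom used in the t-test.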
When testing the significance of the correlation coefficient, what is the null hypothesis?
When testing the significance of the correlation coefficient, what is the alternative hypothesis?
Ha : ρ ≠ 0
If the level of significance is 0.05 and the p-value is 0.04, what conclusion can you draw?
Homework
If the level of significance is 0.05 and the p-value is 0.06, what conclusion can you draw?
We do not reject the null hypothesis. There is not sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.
If there are 15 data points in a set of data, what is the number of degrees of freedom?
Source: http://pressbooks-dev.oer.hawaii.edu/introductorystatistics/chapter/testing-the-significance-of-the-correlation-coefficient/