# 1.6.10.E: Examining the Evidence using Graphs and Statistics (Exercises)

1. Fans of professional sports teams expect the owners of the team to spend the necessary money to get the players who will help them win a championship. The payrolls of various professional sports teams in the US were divided into thirds, and the number of championships won by teams in each third was compared. Make a complete bar graph of this data. (data from unpublished student statistics class project)
| Payroll Ranking | Number of Championships |
| --- | --- |
| Lowest Third | 2 |
| Middle Third | 7 |
| Highest Third | 11 |
2. According to National Geographic, in 1903 there were 307 varieties of corn seed sold by seed houses. In 1983, there were 12 varieties sold by seed houses, the rest no longer being used. Find the sample proportion of varieties of corn seed that were still available in 1983 compared to 1903. Make a complete pie chart. (ngm.nationalgeographic.com/20...ariety-graphic viewed 9/9/13)

Do you think it is good or bad that there are fewer varieties? Why?
3. Between 2010 and 2014, two dams on the Elwha River in the Olympic National Park near Port Angeles, WA were removed, allowing salmon to spawn for the first time in that river in 100 years. Assume the weights of 10 Chinook salmon that returned were recorded. These weights are shown in the table below.
 41 48 40 43 45 39 35 47 41 51

The purpose of this problem is to find all the statistics using the formulas by hand. Show all work.

Find the mean, variance and standard deviation, and the 5 box plot numbers for this data.
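Once the hand calculations are done, a short script can serve as a check. The following is a minimal sketch, assuming the sample formulas (dividing by n − 1 for the variance). It reports only the minimum, median and maximum of the box plot numbers, because quartile conventions differ between textbooks and calculators.

```python
import math

# Weights of the 10 Chinook salmon from the table above.
weights = [41, 48, 40, 43, 45, 39, 35, 47, 41, 51]

n = len(weights)
mean = sum(weights) / n

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
squared_deviations = [(w - mean) ** 2 for w in weights]
variance = sum(squared_deviations) / (n - 1)
std_dev = math.sqrt(variance)

# Median of the sorted data (the middle box plot number).
ordered = sorted(weights)
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

print(f"mean = {mean}, variance = {variance:.3f}, sd = {std_dev:.3f}")
print(f"min = {ordered[0]}, median = {median}, max = {ordered[-1]}")
```

The script is meant only to confirm the by-hand work; the exercise still asks for every step to be shown.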

4. Students in developmental math classes such as intermediate algebra are expected to know their math facts quickly. Automaticity, or math fact fluency, is the ability to recall math facts without having to make the calculations. The benefit of quickly knowing the math facts is that the working memory of the brain is not filled with the effort to make the calculations so that it can focus on the higher level thinking required for the algebra. Intermediate algebra students were given an automaticity test in which they had to solve as many one-step linear equations as possible in one minute. All addition, subtraction and multiplication equations used numbers between –10 and 10 while division equations had answers in that range. All answers were integers. The table below gives the number of problems completed successfully in one minute.
 11 30 28 31 23 27 29 9 38 18 19 17 26 12 10 19 20 17 15 23 22 34 23 36 10

Make a frequency distribution and histogram using a starting value of 5 and a class width of 5. Label the graph completely.
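The tallying step can be sketched in code as a check on a hand-built frequency distribution. This sketch assumes each class includes its lower boundary and excludes its upper boundary (5 to 9, 10 to 14, and so on for integer scores); the function name `frequency_distribution` is illustrative only.

```python
# Automaticity scores from the table above.
scores = [11, 30, 28, 31, 23, 27, 29, 9, 38, 18, 19, 17, 26,
          12, 10, 19, 20, 17, 15, 23, 22, 34, 23, 36, 10]

start, width = 5, 5

def frequency_distribution(data, start, width):
    """Count values per class; assumes every value is at least `start`."""
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    for value in data:
        counts[(value - start) // width] += 1
    return counts

counts = frequency_distribution(scores, start, width)
for i, count in enumerate(counts):
    low = start + i * width
    # For integer data, the class [low, low + width) covers low .. low + width - 1.
    print(f"{low}-{low + width - 1}: {count} {'#' * count}")
```

The row of `#` marks gives a rough text histogram; the graph for the exercise still needs a title, axis labels and a scale.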

Make a box plot.

5. The objective of the automaticity experiments is to determine if there is a relationship between a student’s math fact fluency and their final grade for the quarter. The table below contains the bivariate data for 6 of the students.
| | Automaticity Score | Final Grade |
| --- | --- | --- |
| Student 1 | 19 | 4 |
| Student 2 | 31 | 2.9 |
| Student 3 | 16 | 1.4 |
| Student 4 | 19 | 4 |
| Student 5 | 20 | 2.3 |
| Student 6 | 16 | 1.3 |

Make a scatter plot for this data.

Find the correlation using the formulas. You can use your calculator for the basic functions but not for simply finding correlation, other than to check your answer. Show all your work.
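To check a hand-computed correlation, the calculation can be sketched as below. This assumes the sums-of-squares form of the formula, r = SSxy / √(SSxx · SSyy); the textbook may present an algebraically equivalent version, and the function name `pearson_r` is illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Bivariate data for the six students in the table above.
automaticity = [19, 31, 16, 19, 20, 16]
final_grade = [4, 2.9, 1.4, 4, 2.3, 1.3]

r = pearson_r(automaticity, final_grade)
print(f"r = {r:.4f}")
```

Printing the intermediate sums (`ss_xy`, `ss_xx`, `ss_yy`) is a convenient way to locate an arithmetic slip in the written work.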

6. Automaticity is one area to investigate when a college attempts to improve success rates for developmental math classes. If it is a factor in success, then the college will develop a method for helping students improve their automaticity. An experiment was conducted in which intermediate algebra students were given a computerized automaticity test that required them to solve a mixture of one-step linear equations that required adding, subtracting or multiplying numbers between -10 and 10 and had solutions between those same values for the division problems. Examples include -3x = 12 and x + 5 = -3. A student’s score was the maximum number of problems they could answer correctly in one minute. Students had to get an answer correct before moving on to a new problem. One goal was to see if the average number of problems answered correctly in one minute was greater for students who passed the class than for those who didn’t pass.

The hypotheses that will be tested are:

\(H_0: \mu_{\text{pass}} = \mu_{\text{fail}}\)
\(H_1: \mu_{\text{pass}} > \mu_{\text{fail}}\)
\(\alpha = 0.05\)

a. Complete the design layout table.

| Research Design Table | |
| --- | --- |
| Research Question: | |
| Type of Research | Observational Study / Observational Experiment / Manipulative Experiment |
| What is the response variable? | |
| What is the parameter that will be calculated? | Mean / Proportion / Correlation |
| List potential latent variables. | |
| Grouping/explanatory Variable 1 (if present) | Levels: |

b. If two intermediate algebra classes were to be randomly selected from 12 classes being offered, with the classes being numbered 1 to 12, which two classes would be selected if the calculator was seeded with 27 or row 10 was used in the table of random digits?

c. What type of sampling method is used when a class is selected and everyone in the class participates in the research?

d. The data that will be gathered is the number of problems answered correctly in one minute. Are these data quantitative discrete, quantitative continuous or categorical?

The automaticity scores for the students who failed the class are shown in the table below.

 16 15 14 15 13 16 22 30 20 8 13 14 16 16 16 16 6 27 9

The automaticity scores for the students who passed the class are shown in the table below.

 20 19 33 15 20 14 9 11 12 17 8 20 31 38 29 9 22 31 31 30 9 22 31 31 30 15 10 10 23 22 7 11 23 17 19 20 20 18 9 27 25 15 18 9 27 25 15 23 28 11 20 13 36 34

e. Make a frequency distribution, double bar histogram and side-by-side box plots to show a graphical comparison of these two sets of scores.

f. Which graph is more effective in helping you see the difference between the data sets?

g. Find the mean, variance and standard deviation for both sets of data separately. You may use the statistical functions of your calculator.

h. The p-value of the statistical test that compares the two means is 0.0395. Write a concluding sentence in the style used in scholarly journals (like you were taught in Chapter 1).

i. Based on the results of this analysis and the decision rule in the story, will the college develop a program to help improve automaticity?

7. Why Statistical Reasoning Is Important for a Biology Student and Professional. Developed in collaboration with Elysia Mbuja and Robert Thissen, Biology Department. This topic is discussed in BIOL 160, General Biology.

To explore the scientific method, students will study the effect of alcohol on a Daphnia. Daphnia, living water fleas, are used because they are almost transparent and the beating heart can be seen. The theory to be tested is whether alcohol slows the heart rate of Daphnia. To conduct this test, a Daphnia will be placed in a drop of water on a microscope. The number of heartbeats in 15 seconds will be counted. The water will be removed from the slide and a drop of 8% alcohol will be placed on the Daphnia. After 1 minute, the heartbeats will be counted again. If the heartbeats are lower, it cannot be concluded that the reason is because of the alcohol. It could simply be the reaction to a drop of fluid being placed on the Daphnia or the effect of being on a slide under a light. Therefore, after the Daphnia is allowed to recover, it is returned to the slide following the exact same procedure except a drop of water is used instead of alcohol.

\(H_0: \mu_{\text{alcohol}} = \mu_{\text{water}}\)
\(H_1: \mu_{\text{alcohol}} < \mu_{\text{water}}\)
\(\alpha = 0.05\)

a. Complete the experiment design table.

| Research Design Table | |
| --- | --- |
| Research Question: | |
| Type of Research | Observational Study / Observational Experiment / Manipulative Experiment |
| What is the response variable? | |
| What is the parameter that will be calculated? | Mean / Proportion / Correlation |
| List potential latent variables. | |
| Grouping/explanatory Variable 1 (if present) | Levels: |

b. Make an appropriate graph to compare the two sets of data. The data in the shaded cells is authentic. It comes from a BIOL 160 class.

c. Show the relevant statistics for the two sets of data.
| | Heart Rate after Alcohol | Heart Rate after Water |
| --- | --- | --- |
| Mean | | |
| Standard Deviation | | |
| Median | | |

d. The p-value from the t-test for 2 independent populations is 1.28E-5. Write a concluding sentence.

e. What is the effect of alcohol on the heart rate of a Daphnia? Do you think it will have the same effect on a human?

## Health benefits of physical activity: the evidence

The primary purpose of this narrative review was to evaluate the current literature and to provide further insight into the role physical inactivity plays in the development of chronic disease and premature death. We confirm that there is irrefutable evidence of the effectiveness of regular physical activity in the primary and secondary prevention of several chronic diseases (e.g., cardiovascular disease, diabetes, cancer, hypertension, obesity, depression and osteoporosis) and premature death. We also reveal that the current Health Canada physical activity guidelines are sufficient to elicit health benefits, especially in previously sedentary people. There appears to be a linear relation between physical activity and health status, such that a further increase in physical activity and fitness will lead to additional improvements in health status.

Physical inactivity is a modifiable risk factor for cardiovascular disease and a widening variety of other chronic diseases, including diabetes mellitus, cancer (colon and breast), obesity, hypertension, bone and joint diseases (osteoporosis and osteoarthritis), and depression. 1–14 The prevalence of physical inactivity (among 51% of adult Canadians) is higher than that of all other modifiable risk factors. 15 In this article we review the current evidence relating to physical activity in the primary and secondary prevention of premature death from any cause, cardiovascular disease, diabetes, some cancers and osteoporosis. We also discuss the evidence relating to physical fitness and musculoskeletal fitness and briefly describe the independent effects of frequency and intensity of physical activity. (A glossary of terms related to the topic appears in Appendix 1). In a companion paper, to be published in the Mar. 28 issue, we will review how to evaluate the health-related physical fitness and activity levels of patients and will provide exercise recommendations for health.

Several authors have attempted to summarize the evidence in systematic reviews and meta-analyses. These evaluations are often overlapping (reviewing the same evidence). Some of the most commonly cited cohorts have been described in different studies over time as more data accumulate (see Appendix 2, available online at www.cmaj.ca/cgi/content/full/174/6/801/DC1). In this review, we searched the literature using the key words “physical activity,” “health,” “health status,” “fitness,” “exercise,” “chronic disease,” “mortality” and disease-specific terms (e.g., “cardiovascular disease,” “cancer,” “diabetes” and “osteoporosis”). Using our best judgment, we selected individual studies that were frequently included in systematic reviews, consensus statements and meta-analyses and considered them as examples of the best evidence available. We also have included important new findings regarding the relation between physical activity and fitness and all-cause and cardiovascular-related mortality.

## 1.6.10.E: Examining the Evidence using Graphs and Statistics (Exercises)

The concept of an interaction can be a difficult one for students new to the field of psychology research, yet interactions are an often-occurring and important aspect of behavioral science. The following lesson will introduce the concept of a statistical interaction, provide examples of interactions, and show you how to detect an interaction.

### What is an interaction?

When two or more independent variables are involved in a research design, there is more to consider than simply the "main effect" of each of the independent variables (also termed "factors"). That is, the effect of one independent variable on the dependent variable of interest may not be the same at all levels of the other independent variable. Another way to put this is that the effect of one independent variable may depend on the level of the other independent variable.

In order to find an interaction, you must have a factorial design, in which the two (or more) independent variables are "crossed" with one another so that there are observations at every combination of levels of the two independent variables.

For example, if you were interested in the effects of practice and stress level on memory task performance, you might decide to employ a factorial design. You manipulate practice by having participants read a list of words either once or five times. You also manipulate stress level by having two conditions: in one (low stress), participants are told that the number of words that they recall is unimportant, and in the other (high stress), participants are told that most people can recall all words in the list, and that they are expected to be able to do so as well. Your dependent variable is the number of words recalled from the 30-word list.

In this design, you would need to have participants in each of the four cells of the design: low stress and one practice, low stress and five practices, high stress and one practice, and high stress and five practices. Let's say here that you had 25 participants in each of these four cells.

Now, if the two factors in the study (practice and stress) interact, this means that the effect of one factor depends on the level of the other factor. Let's insert some data to see if there is an interaction in this study.

| | One practice | Five practices | Marginal mean |
| --- | --- | --- | --- |
| Low stress | 8 | 24 | 16 |
| High stress | 4 | 6 | 5 |
| Marginal mean | 6 | 15 | 10.5 (grand mean) |

The table above indicates the cell means, as well as the marginal means and the grand mean, for the study. For example, the mean number of words recalled under the low stress, one practice condition is 8. This is a cell mean. However, the mean number of words recalled under all low stress conditions (regardless of practice) is 16. This is a marginal mean.

So, do we have evidence of an interaction in this study? One way to answer this question is to begin by describing the main effects: if we need to qualify our statements about the main effects by saying "it depends," then we have evidence that there may be an interaction. It appears that there may be a main effect of stress. High stress conditions result in recall of fewer words than low stress conditions. It also appears that there is a main effect of practice: five practices results in better recall of the words than just one practice. However, the effect of the practice variable depends on the level of stress (and vice versa): under low stress conditions, practice seems to have a substantial positive effect (an average of 8 words recalled with one practice and 24 words recalled with five practices), but under high stress conditions, practice has only a small effect (4 versus 6 words under the two practice conditions, respectively).

Therefore, we have evidence of an interaction in this study. Of course, you will need to carry out the appropriate statistical test before you can conclude that your evidence is strong enough to support the claim that there is an interaction in the population. You may want to know if there are other ways to detect this interaction besides examining the cell means.
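The "it depends" reasoning can also be expressed as a small calculation. The sketch below uses the four cell means quoted in the text and assumes equal cell sizes (25 participants per cell), so marginal means are simple averages of cell means. The names `marginal_mean` and `interaction_contrast` are illustrative only, and a nonzero contrast is descriptive evidence rather than a significance test.

```python
# Cell means quoted in the text: words recalled from the 30-word list.
cell_means = {
    ("low stress", "one practice"): 8,
    ("low stress", "five practices"): 24,
    ("high stress", "one practice"): 4,
    ("high stress", "five practices"): 6,
}

def marginal_mean(level):
    """Average the cell means sharing one factor level (equal cell sizes assumed)."""
    values = [mean for key, mean in cell_means.items() if level in key]
    return sum(values) / len(values)

# Simple effect of practice at each level of stress.
practice_effect_low = (cell_means[("low stress", "five practices")]
                       - cell_means[("low stress", "one practice")])
practice_effect_high = (cell_means[("high stress", "five practices")]
                        - cell_means[("high stress", "one practice")])

# If the factors did not interact, the two simple effects would be equal
# and this difference of differences would be zero.
interaction_contrast = practice_effect_low - practice_effect_high

print("marginal mean, low stress:", marginal_mean("low stress"))
print("practice effect under low stress:", practice_effect_low)
print("practice effect under high stress:", practice_effect_high)
print("interaction contrast:", interaction_contrast)
```

Here the practice effect is 16 words under low stress but only 2 words under high stress, which is the pattern the text describes in words.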

### Using graphs to detect possible interactions

Visually inspecting the data using bar graphs or line graphs is another way of looking for evidence of an interaction. Each of the graphs below (Plots 1-8) depicts a different situation with regard to the main effects of the two independent variables and their interaction. You can visualize the main effects and interaction effects (if there are any) in both line graphs and bar graphs.

Depict the data from the sample problem (the effects of practice and stress on word recall) as both a bar graph and a line graph. Is there evidence of a possible interaction? How do you know?

## Testing the Significance of the Correlation Coefficient

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.

We perform a hypothesis test of the “significance of the correlation coefficient” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.

• The symbol for the population correlation coefficient is ρ, the Greek letter “rho.”
• ρ = population correlation coefficient (unknown)
• r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is “close to zero” or “significantly different from zero”. We decide this based on the sample correlation coefficient r and the sample size n.

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
• What the conclusion means: There is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is “not significant”.

• Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero.”
• What the conclusion means: There is not a significant linear relationship between x and y. Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
• If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
• If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
• If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

### PERFORMING THE HYPOTHESIS TEST

WHAT THE HYPOTHESES MEAN IN WORDS:

• Null Hypothesis H0: The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
• Alternate Hypothesis Ha: The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

• Method 1: Using the p-value
• Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

#### METHOD 1: Using a p-value to make a decision

To calculate the p-value using LinRegTTEST:
On the LinRegTTEST input screen, on the line prompt for β or ρ, highlight “≠ 0”.
The output screen shows the p-value on the line that reads “p =”.
(Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level (α = 0.05):

• Decision: Reject the null hypothesis.
• Conclusion: “There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.”

If the p-value is NOT less than the significance level (α = 0.05):

• Decision: DO NOT REJECT the null hypothesis.
• Conclusion: “There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.”

You will use technology to calculate the p-value. The following describes the calculations to compute the test statistic and the p-value:

• The p-value is calculated using a t-distribution with n – 2 degrees of freedom.
• The formula for the test statistic is \(t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}\). The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
• The p-value is the combined area in both tails.

An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
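For readers without a calculator at hand, the same computation can be sketched in code. This is a minimal sketch that uses only the standard library: it computes t = r√(n − 2)/√(1 − r²) and then approximates the two-tailed area by integrating the t-density numerically with Simpson's rule. The function names are illustrative only, and statistical software would normally use a built-in t-distribution instead.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails; `steps` must be even for Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1 - central_area

# Values from the third exam/final exam example: r = 0.6631, n = 11.
r, n = 0.6631, 11
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, p = {p:.3f}")
```

The call `two_tailed_p(t, n - 2)` plays the same role as the calculator command 2*tcdf(abs(t),10^99, n-2).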

• Consider the third exam/final exam example.
• The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.
• Can the regression line be used for prediction? Given a third exam score (x value), can we use the line to predict the final exam score (predicted y value)?
• The p-value is 0.026 (from LinRegTTest on your calculator or from computer software).
• The p-value, 0.026, is less than the significance level of α = 0.05.
• Decision: Reject the Null Hypothesis H0
• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

#### METHOD 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table can be used to give you a good idea of whether the computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not between the positive and negative critical values, then the correlation coefficient is significant. If r is significant, then you may want to use the line for prediction.

Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 = 10 – 2 = 8. The critical values associated with df = 8 are -0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.
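The decision rule in the three worked examples above amounts to comparing |r| with the critical value for df = n − 2. A minimal sketch follows; the dictionary holds only the critical values quoted in those examples rather than the full table, and the names `CRITICAL_R_95` and `is_significant` are illustrative.

```python
# 95% critical values quoted in the worked examples, keyed by df = n - 2.
CRITICAL_R_95 = {4: 0.811, 8: 0.632, 12: 0.532}

def is_significant(r, n, table=CRITICAL_R_95):
    """Return True when r lies outside the interval (-critical, +critical)."""
    critical = table[n - 2]
    return abs(r) > critical

# (r, n) pairs from the worked examples.
examples = [(0.801, 10), (-0.624, 14), (0.776, 6)]
for r, n in examples:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}, df = {n - 2}: {verdict}")
```

Using `abs(r)` handles negative correlations with a single comparison, which matches checking whether r falls outside the two critical values on a number line.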

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

### THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: ŷ = –173.51+4.83x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score (x value), can we use the line to predict the final exam score (predicted y value)?

• H0: ρ = 0
• Ha: ρ ≠ 0
• α = 0.05
• Use the “95% Critical Value” table for r with df = n – 2 = 11 – 2 = 9.
• The critical values are –0.602 and +0.602
• Since 0.6631 > 0.602, r is significant.
• Decision: Reject the null hypothesis.
• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

1. r = –0.567 and the sample size, n, is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
2. r = 0.708 and the sample size, n, is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
3. r = 0.134 and the sample size, n, is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
4. r = 0 and the sample size, n, is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

### Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

1. There is a linear relationship in the population that models the average value of y for varying values of x. In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
2. The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
3. The standard deviations of the population y values about the line are equal for each value of x. In other words, each of these normal distributions of y values has the same shape and spread about the line.
4. The residual errors are mutually independent (no pattern).
5. The data are produced from a well-designed, random sample or randomized experiment.

### Chapter Review

Linear regression is a procedure for fitting a straight line of the form ŷ = a + bx to data. The conditions for regression are:

• Linear: In the population, there is a linear relationship that models the average value of y for different values of x.
• Independent: The residuals are assumed to be independent.
• Normal: The y values are distributed normally for any value of x.
• Equal variance: The standard deviation of the y values is equal for each x value.
• Random: The data are produced from a well-designed random sample or randomized experiment.

The slope b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y, σ, use the standard deviation of the residuals, s: \(s = \sqrt{\dfrac{SSE}{n-2}}\). The variable ρ (rho) is the population correlation coefficient. To test the null hypothesis H0: ρ = hypothesized value, use a linear regression t-test. The most common null hypothesis is H0: ρ = 0 which indicates there is no linear relationship between x and y in the population. The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform this test (STATS TESTS LinRegTTest).

### Formula Review

Least Squares Line or Line of Best Fit: \(\hat{y} = a + bx\)

Standard deviation of the residuals: \(s = \sqrt{\dfrac{SSE}{n-2}}\)

where

SSE = sum of squared errors

n = the number of data points
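The two formulas can be tied together in a short calculation. The sketch below fits ŷ = a + bx by least squares and then computes s = √(SSE/(n − 2)) from the residuals. The four data points are hypothetical, chosen only to keep the arithmetic easy to follow, and the function names are illustrative.

```python
import math

def least_squares(xs, ys):
    """Return intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def residual_std(xs, ys, a, b):
    """Standard deviation of the residuals, s = sqrt(SSE / (n - 2))."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

a, b = least_squares(xs, ys)
s = residual_std(xs, ys, a, b)
print(f"y-hat = {a:.2f} + {b:.2f}x, s = {s:.4f}")
```

The divisor is n − 2 rather than n − 1 because two quantities, a and b, are estimated from the data before the residuals are computed.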

When testing the significance of the correlation coefficient, what is the null hypothesis?

When testing the significance of the correlation coefficient, what is the alternative hypothesis?

If the level of significance is 0.05 and the p-value is 0.04, what conclusion can you draw?

## RESULTS

### Descriptive statistics

Means and SD for doping attitudes and doping knowledge are presented in table 2. Of the 200 coaches who completed baseline measures, 85 coaches completed follow-up measures (see figure 1).

Table 2. Means and SD for control and experimental arms at baseline and follow-up

### Primary outcome: changes in doping knowledge

A time × arm mixed ANOVA was conducted to examine the impact of the experimental condition on doping knowledge from baseline to follow-up (see figure 2). A significant interaction effect was found between time and condition (F(1,83) = 147.59, p < 0.001, ηp² = 0.64). Dependent samples t-tests revealed a significant increase in doping knowledge in the experimental arm (t(19) = 7.90, p < 0.001) but not in the control arm (t(64) = 0.76, p = 0.79).

Figure 2. Graph of the change in doping knowledge at baseline (time 1) and follow-up (time 2) for control and experimental arms.

### Secondary outcome: changes in doping attitudes

A time × arm mixed ANOVA was conducted to examine the impact of the experimental condition on doping attitudes from baseline to follow-up (see figure 3). A significant interaction effect was found between time and condition (F(1,83) = 15.56, p < 0.001, ηp² = 0.16). Dependent samples t-tests revealed a significant decrease in favourable attitudes towards doping in the experimental arm (t(19) = −4.70, p < 0.001) but not in the control arm (t(64) = −0.18, p = 0.79). However, the experimental arm had significantly more favourable attitudes towards doping at baseline than the control arm (t(83) = −4.46, p = 0.02).

Figure 3. Graph of the change in doping attitudes at baseline (time 1) and follow-up (time 2) for control and experimental arms.
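The reported effect sizes can be cross-checked against the F statistics, because partial eta squared satisfies ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the two interaction effects reported above; the function name is illustrative and this is a consistency check, not a reanalysis of the data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# Time x arm interactions reported above, each with df = (1, 83).
knowledge = partial_eta_squared(147.59, 1, 83)
attitudes = partial_eta_squared(15.56, 1, 83)

print(f"doping knowledge: {knowledge:.2f}")  # prints 0.64
print(f"doping attitudes: {attitudes:.2f}")  # prints 0.16
```

Both values agree with the reported ηp² of 0.64 and 0.16.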

## 4 Discussion

### 4.1 Summary

The majority of the literature finds an inverse association of stress and PA behaviors. The current search uncovered 168 studies reported in the English language exploring these relationships in humans. This demonstrates a high level of interest in the topic for the last two decades, with an apparent acceleration in research production in the area. The literature provided ample support for an association between stress and PA (79.8 %), and of the studies identified, 72.8 % supported the hypotheses that higher stress is associated with lesser exercise and/or PA. Prospective studies with objective markers of stress, one indicator of study quality, nearly unanimously agreed (six of seven studies, 85.7 %) with this conclusion. Studies examining older adults (>50 years), cohorts with men and women, and larger sample sizes (n > 100) as well as studies of higher quality (≥7 on a 9-point scale) were more likely to show an inverse association. Other factors, such as whether a study’s subject pool comprised employees or a clinical population, did not clearly differentiate the literature finding inverse relationships between stress and PA and the literature finding a null association. Interestingly, 17.2 % of prospective studies found evidence that stress was predictive of greater PA and exercise behavior, and qualitative studies were particularly equivocal in regards to the valence of the association. While these findings cannot be labeled definitively as anomalies, it is clear that stress exerts a generally negative influence on PA.

The review of the literature found many life events and transitions that resulted in changed PA [3, 260, 262]. This specific area of inquiry has garnered substantial interest, with two review articles already published identifying specific life events that relate to perturbations in PA [322, 323]. One recent review determined that five life changes were associated with change in PA: employment status, residence, relationships, family structure, and physical status [322]. Marriage and remarriage are often, but not always, associated with declines in fitness while divorce is associated with gains in fitness, at least in men [266, 324]. Chronic disease diagnosis can be very stressful [325] and a vast literature connects the diagnosis of cancer [182, 243, 264, 315, 326-328] and HIV [329] with changes in PA. However, only a few studies gauge how mental stress associated with these conditions relates to changes in PA [182, 264], and none were able to objectively capture PA before a diagnosis. Another criticism of this approach is that many of the above events may be interpreted as being positive in nature. However, from a classic life stress perspective, any type of event or transition that causes dramatic changes to one’s life can result in concomitant changes in behavior and health [330]. Alternatively, being inundated with minor nuisances may also weaken one’s attempts for healthy behavior—perhaps to a similar degree as the experience of a small number of major life events [17, 189]. A familiar example includes holiday periods, when many people exercise less and eat more [331]. Given that most humans experience change frequently, clarification is needed to discern the specific conditions under which an event or series of events may perturb PA.

As might be expected, not all studies found an association between stress and PA. However, several studies suggest that the association may be indirect or masked by factors that moderate the relationship, such as exercise stage of change [17, 332, 333]. For instance, Lutz et al. [17] found that women in the habit of exercising, in other words, at a higher stage of change, exercised more during times of stress. Conversely, infrequent exercisers were less active during periods of strain. This finding was supported by Seigel et al. [183], who found that young women who increased activity with stress were more avid exercisers. One’s stage of change for exercise, however, is not itself related to indicators of stress [243, 334]. Budden and Sagarin [210] found no association between exercise and occupational stress, but did find that stress related to perceived behavioral control for exercise, which in turn predicted exercise intention. Intention was predictive of actual exercise behavior. Payne et al. [333] found a similar pattern of results in a group of 286 British employees. Clearly, the influence of stress varies by individual attributes, which in some cases may obscure simple associations between stress and PA.

### 4.2 Clinical Implications

Stress interferes with the engagement of activity for the majority of people, which has important theoretical, practical, and clinical significance for professionals in the health and exercise fields. This is especially true given that the experience of stress (a) is widely prevalent, (b) has repercussions for a wide range of health issues, and (c) is reported as a growing problem in developed countries around the globe [18]. On the second assertion, it is well-known that a link exists between stress and the development of depression, cardiovascular disease, and many other health endpoints [50]. Convincing evidence is emerging that such links are moderated by PA [49, 53], with some data indicating that the connection is contingent on changes to this behavior [212]. With all of these facts in mind, health policies should include provisions for integrated prevention and treatment of chronic stress and its behavioral and medical sequelae. Before this progress can materialize, however, the well-identified associations between stress and health-promoting behavior must be more recognized within the community of PA researchers, practitioners, and other advocates.

At this time, action must be taken to advance PA interventions by interweaving effective stress management techniques. Simply arousing knowledge of stress is not sufficient [335]. First, practitioners should measure objective and subjective measures of stress for each individual. This effort will help to identify those at risk for the effects of stress. Working with an interdisciplinary team, such as psychologists and therapists, will help to promote careful interpretation of these data and will provide the resources to more carefully attune to the client’s stressors and associated constraints, barriers, and needs [336]. Furthermore, practitioners should be mindful of stress vulnerability across stages of change and refine prescriptions accordingly to magnify adherence and to prevent relapse and dropout [184, 189]. For people contemplating a new exercise regimen, stress may interfere with attempts to initiate PA, and this may translate to an inability to reach healthful levels of exercise [184, 189]. On the other hand, those habituated to exercise exhibit resilience in the face of stress [17, 183]. In addition to exercise habits, it is worthwhile to identify individuals’ coping style. Some people use exercise to deal with stress (exercise approach) while others become distracted and succumb to the lure of less healthful behaviors (exercise avoidance). This emphasizes further that prescriptions should be tailored to the individual [60]. Stress differentially impacts various populations and interventions must be modified accordingly [232, 337]. As an example, Urizar et al. [89] suggest that specific coping strategies should be addressed for mothers based on family constraints, including social support, problem solving, reframing cognitions, and strategies to balance motherhood with the need to care for oneself.
Relapse prevention counseling is an example of a technique that incorporates stress management [331, 338] and is a recommended intervention for stressed populations [184].

The content of these programs should be comprehensive. Identifying high-risk situations ahead of time is an important strategy [331, 339], and those who can predict stressors are typically better able to diminish losses potentially associated with them [340]. Teaching stressed individuals the importance of exercise as a method to emotionally cope, plus the problem-focused skills to cope with stress aside from exercise, is a dual priority [119, 341]. As exercise is a complex behavior for the newly active, requiring much planning, resources should be put in place to assist the stressed individual with the creation of primary and contingency plans. On this note, interventions that are more flexible and ‘user-friendly’ are necessary to help clients re-engage with stress-derailed PA regimens [154]. Much has been made of the stress-impulsivity connection and, consequently, a full complement of self-regulation strategies would likely be useful [129, 282, 342]. Simply continuing to exercise on a regular basis is a method to build self-control [88], and it is difficult to obviate well-established and reinforcing habits. Lastly, and perhaps most important, there is evidence that combining an exercise intervention with stress management can result in increased exercise during times of stress or prevent relapse [149, 279, 343]. Such practice has been successfully employed with alcohol and other drug treatments [344, 345]. Mindfulness-based stress reduction (MBSR) is a highly effective technique to promote stress reduction, and enhancing aspects of this program, such as mindful walking, may be an ideal avenue for intervention [346]. In summary, creating interventions to target stress and coping skills may help to facilitate greater PA and, ultimately, improved health outcomes.

### 4.3 Exercise as a Stressor

From a practical standpoint, exercise and the associated actions required to accomplish it may simply be burdens or minor stressors themselves. For many people, structured exercise is highly inconvenient (“one more thing to do” [189, 347]) during periods of greater strain [348]. As an example, women who work long hours feel unable to exercise due to many demands on their time, interference from family obligations, and other barriers [196]. Similarly, teenagers in the midst of household conflict find it difficult to plan for sports participation [171]. It has been noted that planning for exercise but then missing it due to stress-related circumstances may degrade exercise self-efficacy and add further frustration and dissatisfaction [159]. Langlie [349] found that during times of stress, individuals feel a lack of control and perceive maintaining health behaviors as costly. Consequently, for those who view exercise as a disruption, an inconvenience or another demand on their time, it is not a stretch to predict that exercise will decrease with stress. This may be particularly true when starting a new exercise routine [204, 347]. Indeed, Holmes and Rahe [330] suggest that any perturbation of one’s normal daily routine constitutes a stressor. Several studies have considered the potential social stress of PA participation [350-354]. For instance, inactive people are more sensitive to criticism of their bodyweight and fitness, more readily embarrassed, and may derive less affective pleasure and reinforcement from exercise [355], all of which may result in exercise avoidance, particularly when already in a state of mental stress. The perceived threats of comparison and competition, as well as the anticipation of an exhaustive effort may be much less tolerated under these conditions [122, 356]. All of these sources of additional stress should be considered in intervention design. 
Unfortunately, making one’s PA routine more convenient, such as exercising at home, does not necessarily mean that it will result in better adherence to exercise regimens. For instance, King and associates [184, 204] found that life events equally degraded adherence to a home-based or class-based exercise program.

The above discussion should impress upon the clinician and researcher that exercise is itself a mental [85, 356-361] and physical stressor [362-366]. In short, the stress of exercise may in some circumstances interact with psychological stress to dampen PA behavior. Indeed, exercise might be typified as a self-inflicted stressor, often intentionally undertaken with a goal of attaining health and fitness. While such experiences are generally considered adaptive, not all outcomes are positive in nature. From a physical standpoint, for instance, there is always risk of injury [309, 367], which is magnified under conditions of stress [368] and may result in missed exercise participation. Exercise undertaken in unaccustomed volumes can elevate glucocorticoids and stunt physical processes, such as neurogenesis [369]. Ultimately, at very high levels exercise may result in deleterious outcomes, such as unexplained underperformance syndrome. This outcome may be exacerbated by the experience of mental stressors and, likewise, may result in additional sensations of stress [370]. Indeed, increased exercise over a period of days or weeks can contribute to negative shifts in one’s mood [371] and increased perceived stress [372]. A recent study found that poor muscular recovery was associated with self-reports of chronic stress [29]. As sensations related to muscle damage likely result in impaired PA [373], it is possible that stress may affect exercise behavior by magnifying unpleasant sensations associated with exercise.

### 4.4 Significance of Literature Finding a Positive Association Between Stress and PA

Findings that stress may elicit increases in PA behavior should not be considered happenstance and may explain studies with null findings [17]. Castro and associates [145] found that women who were anxious at baseline had better adherence to an exercise program over 12 months, and a similar result was found for colorectal cancer patients [264]. Johnson-Kozlow et al. [279] implemented an exercise intervention for a group of students in which stress management was a central feature. It should not be surprising then that with burgeoning stress men increased PA in this study. Health behaviors, such as exercise or recreational park use, may actually improve after a major life event, such as the death of a spouse with Alzheimer’s, simply because barriers for behavior are removed [374, 375]. Moreover, such observations are consistent with theories that predict changes in behavior in either direction with stress [183, 330, 376, 377]. For instance, resiliency researchers have long stressed that adversity may spur some individuals to higher levels of functioning [376, 377]. Seigel et al. [183] suggest a nomenclature for these disparate responses, referring to increased PA with stress as behavioral activation and weakened PA as behavioral inhibition, responses that appear to vary by traits of the individual. The rebound hypothesis of stress and PA proposed by Griffin et al. [191] posits that stress can result in a degraded PA response followed within days or weeks by a compensatory uptick in PA. Specifically, these researchers speculate that people may overdo healthy behaviors, such as exercise, to compensate for poor attention to health during the stressful period.

In the face of stress, one may elect to obviate feelings of displeasure by engaging in exercise, a form of emotion-focused coping [62, 168, 378]. Indeed, exercise may result in enhanced feelings of pleasure and is widely accepted as a tool for stress management [118, 201, 379-381]. Stetson et al. [189] found that 69 % of their sample of women exercised to relieve stress. Qualitative research indicates that individuals will use low to moderate intensity exercise (i.e., walking) as a method to regulate emotions [173, 293]. Interestingly, despite the expectation that PA will lessen displeasure, exercise enjoyment appears to be affected during weeks of stress [189]. Nevertheless, people who believe that exercise is a useful method for stress reduction are more likely to engage in a moderate or greater level of exercise [225, 318]. Those who exercise to cope with stress report higher exercise behavior than those who do not cope by exercising [188]. Stress management as a motive for exercise has been found for several populations [178, 382-386]. However, a large sample of highly active fitness enthusiasts reported that stress management ranked far below other sources of motivation, such as exercise enjoyment [387].

These issues decry the general lack of understanding of the relationship between coping with stress and PA. Exercise behavior declines on days when individuals use more emotion-focused coping [201], but in general the use of positive coping behaviors is related to greater PA [250, 300]. The general coping style of the individual may account for these differences, as people with rigid coping styles tend to increase PA behavior with increased stress [280], although this finding is challenged by other data [150]. Moos and Schaefer [388] state that “Among self-efficacious individuals, engaging in PA can be described as a task-oriented way of dealing with stressful events using a behavioral-approach coping style. Alternatively, engaging in PA may be used to avoid life stressors among less self-efficacious individuals.” This suggests that exercise may serve to both deal with and steer away from stress, and the strategy utilized may vary by one’s self-efficacy for exercise. This may be particularly salient for those who are exercise dependent [389, 390] and for those who compensate for stress-induced overeating by exercising [183, 391-393]. These phenomena add an extra layer of complexity to any analysis of stress and exercise and may account for weak relationships observed by many studies.

### 4.5 Limitations of the Literature: Methodological Considerations

Several limitations in the stress literature have been discerned by this review, particularly as identified by the quality assessment rating (Electronic Supplementary Material, Appendix 1). The most obvious is the limited amount of experimental evidence. Control groups should be utilized, as changes in PA are frequently due to other factors, such as a change in seasons [331, 394]. Examination and holiday stressors coincide with more adverse weather in many latitudes, which is perhaps the greatest limitation in this area of research. Cross-sectional studies cannot provide indication of the direction of influence. Does stress impact exercise directly, or do inactive individuals self-select more stressful environments [170]? Such a possibility implies that other factors may be responsible for the association. Nevertheless, more than 50 studies in this review utilized a prospective design, which allays some concern.

Apart from issues of design, there are also issues with measurement. First, stress may impact the recall of exercise behavior as opposed to exercise behavior itself, with activity being over- or understated [395]. Objective measures of PA, therefore, are greatly needed, and only a few cross-sectional studies have employed such markers [249, 258]. Furthermore, most subjective measures do not capture the full complexity of the behavior, including occupational and commuting activity [308]. To illustrate this point, Fredman et al. [254] found that caregivers have greater self-reported total PA than non-caregivers but lower leisure time PA. Moreover, many papers do not inquire about exercise intensity, although it is equivocal as to whether intensity is impacted to the same degree as frequency or duration [17, 229, 241, 251]. It is possible that an individual may shift intensity as the priority for fitness, typically achieved with greater exercise effort, gives way to a greater emphasis on stress management [173]. When athletes are specifically asked what mental factors prevent them from giving 100 % effort in practice, they typically list life events, school demands, and other stressors [396]. Lastly, it is unfortunate that nearly 50 % of prospective studies did not utilize pre-tested PA/exercise measures, with some relying on simple dichotomous measures of exercise behavior [210, 219, 263].

The measurement of stress appears to play an important role in the stress–exercise literature. Measures of stress varied greatly in the studies reviewed, which parallels the multiplicity of stress definitions employed. Studies in this analysis were divided nearly evenly on whether they focused on subjective (i.e., perceived) or objective (e.g., life events, daily hassles) measures of stress, and several studies have also specifically focused on chronically stressed populations [173, 186, 190, 196, 251]. Studies employing measures of life stress sometimes include both positive and negative life events with no differentiation [280], whereas others have focused exclusively on negative experiences [184]. Any challenging experience will tax the human organism at varying degrees, but many studies have favored a summation of life events without considering the weighted impact or magnitude of each individual event [25, 184]. Exercise has been observed to serve as coping during transient stressors [168, 397, 398] and even when experiencing a major life event [175, 184]. Other dimensions of the stress process may also be salient, such as the predictability of the event or an individual’s perceived ability to cope with the stressor [36]. One must also consider the type (e.g., social, financial) and controllability of stress, all of which may influence whether exercise is utilized as a coping device. On days when stress is perceived as controllable, exercise increases [201]. Animal models demonstrate that different types of stressors (i.e., social defeat vs. open field stress) result in either habituation or non-habituation of PA [399]. Indeed, social stress resulted in a significant decline in PA amongst children in the only experimental study to date [193]. Lastly, it is important to note that no research specifically focused on cumulative adversity, a construct associated with many health behaviors [125, 128].

A tertiary area of concern lies in temporal aspects of stress research. From a measurement perspective, assessments of stress and PA are often mismatched, with one measure inquiring about stress over a given period (e.g., the last month, Perceived Stress Scale [PSS]) and the other inquiring about PA over a different period of time (e.g., the last year, Modifiable Activity Questionnaire [MAQ]) [124, 191, 259, 286, 295]. Prospective studies, while an improvement over cross-sectional ones, do not always gauge stress and PA at each time point [268, 279]. This is important to determine bi-directional associations of stress and PA. Diary studies have provided considerable improvement in this respect, while also being less affected by stress-related memory deficits [17, 189]. Most research has failed to look at relationships in both a concurrent/contemporaneous and time-lagged manner [245]. While it is possible that stress has a weak relationship with PA at any given point of time, a much stronger relationship likely exists between stress and (a) PA at a future time, (b) PA change scores [17, 124, 189], and/or (c) more qualitative measures including exercise adoption, maintenance [184, 199] and intervention adherence. The Physical Activity Maintenance (PAM) model [199] argues that stress most relates to relapse, and a plethora of evidence looking at other health behaviors would support this notion [129]. A cross-lagged analysis would help to determine which direction of influence is stronger between stress and PA, but only one report has undertaken such an analysis [20].

Sample characteristics are germane to the study of stress. It is frequently difficult to recruit truly stressed subjects for research studies, which results in a response or selection bias [400]. Consequently, a constrained range or low level of stress scores (i.e., not enough variability in stress) may obscure any true effect [191, 275, 303]. Those who drop out of studies tend to have higher stress and anxiety, which could also mask any potential effects [188]. Several studies finding an inverse trend of a stress–PA association have been underpowered [277], while others are overpowered, detecting trivial associations [260, 268, 270, 272, 316]. Studies with large samples of inactive participants (or conversely all active subjects) may not have enough variability in exercise measures to detect an effect [273].

Finally, it should be noted that this review has limitations. Only three databases were searched. Moreover, the search in PubMed was truncated and did not extend before the year 2000. However, these are not likely substantive issues considering (a) the numerous studies discovered, (b) the retrieval of few unique investigations in successive database searches, and (c) the linear distribution of papers across time (Fig. 2). Additionally, this is the first review of its kind; therefore, this analysis adds considerable insight into an area that has produced a large quantity of data. Despite this abundance, the current body of work has not been featured well in reviews summarizing psychosocial influences on PA, necessitating the current report [153-160].

### 4.6 Future Directions

Possibilities abound for future research in this area. Currently, evidence demonstrating the efficacy of an exercise–stress management intervention is scant. Nevertheless, initial reports are promising [192]. Interventions could be optimized if stress–PA relationships could be titrated. For instance, Oman and King [184] discerned that an increase in major life events, specifically from three to four, did not result in a proportional decline in exercise adherence. This type of research represents an important area of future inquiry and could be coordinated to additionally identify the factors that potentially protect one from, or make one vulnerable to, the effects of stress. Risk factors might include race/ethnicity, family background or individual characteristics, such as lifetime adversity and disadvantaged experiences [34, 35]. These latter two constructs are also indicators of stress, which serve as a reminder that stress instrumentation could be enhanced in future research by incorporating a lifespan perspective. Triangulating self-report measures with participant interviews and corroborating evidence from persons close to study participants would provide a strong advancement to stress measurement [401].

Apart from one experiment [193], there has been a lack of studies manipulating stress to assess the effect of such experiences on PA behaviors. It must be noted, however, that experimental exposure to stress is difficult, if not unethical, to implement. Measuring PA opportunistically during periods of objectively rated low and high stress, such as final examinations or other naturalistic stressors, provides stronger evidence [185, 187, 192]. The model demonstrated by Stults-Kolehmainen and Bartholomew [29], in which populations are screened for both very low and very high levels of chronic perceived stress, is an example of a quasi-experimental design that could be employed. Ecological Momentary Assessment (EMA) is one technique to measure stress and PA in real time, resulting in less vulnerability to stress-related failures in the recall of behavior and emotion [154, 265, 402]. Prospective studies should sample more frequently to minimize the effects of stress on memory and cognition, factors that in themselves may moderate the stress and exercise relationship [403].

These investigations may help to describe shifts in the relationship as individuals progress from sedentary behavior to exercise adoption, maintenance, and periods of relapse. The area of exercise habituation seems very promising [17, 183], as it is likely that novice exercisers are more susceptible to the effects of impulses, lack self-control, and are not resilient to the physical, emotional, and social stressors of exercise itself [351]. Furthermore, as individuals habituate to exercise there are likely concomitant changes in fitness, a potential moderator with minimal emphasis thus far [229]. Other moderators may be genetic (i.e., polymorphisms in genes regulating energy expenditure), physiological (e.g., adrenal sensitivity, muscle activation), health-related (e.g., illness, symptoms), personality-related (e.g., conscientiousness, neuroticism, perfectionism, type B, sensation-seeking [141, 142, 269, 404-407]), social/environmental [232], and related to coping style, though few studies have measured the extent to which individuals use exercise to cope with stress. Researchers may look to the nutrition literature as a similar bifurcation occurs when individuals are exposed to stressors: either more consumption or less or even fasting [168, 408]. This work has revealed mechanisms underlying the stress and caloric intake relationship, such as cortisol reactivity [134, 409-411]. Experimental models in this area are more sophisticated, which points to a need in the current literature reviewed. Hopefully this progress will help to determine the individual factors that may hasten declines in health-promoting behavior when stressed or, in a few cases, spur more activity.

The above discussion underscores the central need for additional models and a theoretical framework that describe the non-linear, bi-directional and dynamic nature of stress and PA relationships [20, 290]. At this time, theoretical models of stress and behavior are largely lacking or are specialized to particular contexts (e.g., worksites, urban life) [170, 200]. Links between stress, coping style, perceptions of energy and fatigue, energy expenditure (including spontaneous PA and non-exercise activity thermogenesis [NEAT]) and metabolism, amongst other factors (e.g., conscientiousness) should be integrated into conceptual models explaining obesity and physical health. Models specifically examining recovery from stressors [29, 170, 282] and sedentary behavior [170, 173, 193, 195, 209] would be useful, as stress is linked to these outcomes. Finally, it should be noted that psychosocial stress and exercise interact during PA itself, a third area of inquiry that will likely inform the complex confounding of these two factors [350, 412, 413].

## Examining the 'CSI-Effect' in the Cases of Circumstantial Evidence and Eyewitness Testimony: Multivariate and Path Analyses

As part of a larger investigation of the changing nature of juror behavior in the context of technology development, this study examined important questions unanswered by previous studies on the “CSI-effect.” In answering such questions, the present study applied multivariate and path analyses for the first time. The results showed that (a) watching CSI dramas had no independent effect on jurors' verdicts, (b) the exposure to CSI dramas did not interact with individual characteristics, (c) different individual characteristics were significantly associated with different types of evidence, and (d) CSI watching had no direct effect on jurors' decisions, and it had an indirect effect on conviction in the case of circumstantial evidence only as it raised expectations about scientific evidence, but it produced no indirect effect in the case of eyewitness testimony only. Finally, implications of the present study as well as for future research on the “CSI-effect” on jurors are discussed.

Keywords: CSI, forensic evidence, jury

### Young S. Kim

#### Eastern Michigan University

Eastern Michigan University
Ypsilanti, MI 48197
United States

## Using Tableau to visualize data and drive decision-making

This case emphasizes the importance of data analysis through the usage of data visualization software to help you gain an understanding of data and how it can be transformed into information that can enhance the decision-making process. In the data visualization software, Tableau, you will be asked to connect to an Access data file to analyze six months of sales transaction data of a small start-up ice cream manufacturer. Consistent with AACSB Accounting Standard A7, the case focuses on familiarization with data visualization software to “convey data, results, and insights” (AACSB, 2013) and apply higher-order thinking. Upon familiarization with the data and data visualization software, you will be required to perform an exploratory analysis to identify key trends in the data to prepare and report that information to enhance the business decision-making process. This case is intended to be utilized in an undergraduate accounting information systems course, an introductory managerial course, or a course focusing on data analytics as a basic introduction to data visualization software.

## 12.4 Testing the Significance of the Correlation Coefficient

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.

We perform a hypothesis test of the "significance of the correlation coefficient" to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.

• The symbol for the population correlation coefficient is ρ, the Greek letter "rho."
• ρ = population correlation coefficient (unknown)
• r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero". We decide this based on the sample correlation coefficient r and the sample size n.

If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant."

• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
• What the conclusion means: There is a significant linear relationship between x and y. We can use the regression line to model the linear relationship between x and y in the population.

If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant".

• Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
• What the conclusion means: There is not a significant linear relationship between x and y. Therefore, we CANNOT use the regression line to model a linear relationship between x and y in the population.
• If r is significant and the scatter plot shows a linear trend, the line can be used to predict the value of y for values of x that are within the domain of observed x values.
• If r is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.
• If r is significant and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed x values in the data.

### PERFORMING THE HYPOTHESIS TEST

WHAT THE HYPOTHESES MEAN IN WORDS:

• Null Hypothesis H0: The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
• Alternate Hypothesis Ha: The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.

DRAWING A CONCLUSION: There are two methods of making the decision. The two methods are equivalent and give the same result.

• Method 1: Using the p-value
• Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level of 5%, α = 0.05.

Using the p-value method, you could choose any appropriate significance level you want; you are not limited to using α = 0.05. But the table of critical values provided in this textbook assumes that we are using a significance level of 5%, α = 0.05. (If we wanted to use a different significance level than 5% with the critical value method, we would need different tables of critical values that are not provided in this textbook.)

### Using the TI-83, 83+, 84, 84+ Calculator

To calculate the p-value using LinRegTTEST:
On the LinRegTTEST input screen, on the line prompt for β or ρ, highlight "≠ 0"
The output screen shows the p-value on the line that reads "p =".
(Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level (α = 0.05):

• Decision: Reject the null hypothesis.
• Conclusion: "There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero."

If the p-value is NOT less than the significance level (α = 0.05):

• Decision: DO NOT REJECT the null hypothesis.
• Conclusion: "There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero."
• You will use technology to calculate the p-value. The following describes the calculations to compute the test statistics and the p-value:
• The p-value is calculated using a t-distribution with n - 2 degrees of freedom.
• The formula for the test statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$. The value of the test statistic, t, is shown in the computer or calculator output along with the p-value. The test statistic t has the same sign as the correlation coefficient r.
• The p-value is the combined area in both tails.

An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.

• Consider the third exam/final exam example.
• The line of best fit is: ŷ = -173.51 + 4.83x with r = 0.6631 and there are n = 11 data points.
• Can the regression line be used for prediction? Given a third exam score (x value), can we use the line to predict the final exam score (predicted y value)?
• The p-value is 0.026 (from LinRegTTest on your calculator or from computer software).
• The p-value, 0.026, is less than the significance level of α = 0.05.
• Decision: Reject the Null Hypothesis H0
• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
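As a check on the p-value quoted above, the calculation can be sketched in Python using only the standard library. The function names `t_statistic`, `t_pdf`, and `two_tailed_p_value` are illustrative and not part of any calculator or package. The tail area is approximated with Simpson's rule, so this is a rough numerical sketch of the quantity LinRegTTest reports, not a description of how the calculator computes it.

```python
import math


def t_statistic(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t, df, steps=10000):
    """Combined area in both tails beyond |t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    # Simpson's rule for the area between 0 and |t|
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The distribution is symmetric, so both tails together are 1 - 2 * (area from 0 to |t|)
    return 1 - 2 * central_half


# Third exam/final exam example: r = 0.6631 with n = 11 data points
r, n = 0.6631, 11
t = t_statistic(r, n)
p = two_tailed_p_value(t, n - 2)
print(f"t = {t:.3f}, p-value = {p:.3f}")
```

The printed p-value should agree, to about three decimal places, with the 0.026 reported above, and it can then be compared with α = 0.05 in the same way.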

### Example 12.7

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The critical values associated with df = 8 are -0.632 and +0.632. If r < negative critical value or r > positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the line may be used for prediction. If you view this example on a number line, it will help you.

### Try It 12.7

For a given line of best fit, you computed that r = 0.6501 using n = 12 data points and the critical value is 0.576. Can the line be used for prediction? Why or why not?

### Example 12.8

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12. The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is significant and the line can be used for prediction.

### Try It 12.8

For a given line of best fit, you compute that r = 0.5204 using n = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?

### Example 12.9

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not significant, and the line should not be used for prediction.
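The decision rule used in Examples 12.7 through 12.9 can be written as a short check. This is a minimal sketch: the function name `is_significant` is hypothetical, and the critical values are typed in from the examples rather than looked up from the table.

```python
def is_significant(r, critical_value):
    """True when r lies outside the interval from -critical_value to +critical_value."""
    return abs(r) > abs(critical_value)


# (label, r, positive critical value) taken from Examples 12.7, 12.8, and 12.9
examples = [
    ("Example 12.7", 0.801, 0.632),
    ("Example 12.8", -0.624, 0.532),
    ("Example 12.9", 0.776, 0.811),
]

for label, r, critical_value in examples:
    verdict = "significant" if is_significant(r, critical_value) else "not significant"
    print(f"{label}: r = {r}, critical values = ±{critical_value} -> r is {verdict}")
```

Taking the absolute value of r handles the negative and positive critical values with a single comparison, which is the same as placing r on the number line and checking whether it falls outside the two critical values.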

### Try It 12.9

For a given line of best fit, you compute that r = –0.7204 using n = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?

### THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method

Consider the third exam/final exam example. The line of best fit is: ŷ = –173.51+4.83x with r = 0.6631 and there are n = 11 data points. Can the regression line be used for prediction? Given a third-exam score (x value), can we use the line to predict the final exam score (predicted y value)?

• H0: ρ = 0
• Ha: ρ ≠ 0
• α = 0.05
• Use the "95% Critical Value" table for r with df = n – 2 = 11 – 2 = 9.
• The critical values are –0.602 and +0.602
• Since 0.6631 > 0.602, r is significant.
• Decision: Reject the null hypothesis.
• Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (x) and the final exam score (y) because the correlation coefficient is significantly different from zero.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.
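The statement that the two methods give the same result can be checked by solving the test-statistic formula for r. The sketch below assumes that the two-tailed critical t value for df = 9 at α = 0.05 is about 2.262, a standard t-table value that is not given in this section.

```latex
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  \;\Longrightarrow\;
  t^{2}\,(1-r^{2}) = r^{2}\,(n-2)
  \;\Longrightarrow\;
  r^{2} = \frac{t^{2}}{t^{2} + (n-2)} \\[4pt]
r_{\text{crit}} &= \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2} + (n-2)}}
  = \frac{2.262}{\sqrt{2.262^{2} + 9}}
  \approx \frac{2.262}{3.757}
  \approx 0.602
\end{aligned}
```

Under that assumption, the result matches the critical value of 0.602 used for df = 9, so comparing r with ±0.602 amounts to comparing the test statistic t with ±2.262.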

### Example 12.10

Suppose you computed the following correlation coefficients. Using the table at the end of the chapter, determine if r is significant and the line of best fit associated with each r can be used to predict a y value. If it helps, draw a number line.

1. r = –0.567 and the sample size, n, is 19. The df = n – 2 = 17. The critical value is –0.456. –0.567 < –0.456 so r is significant.
2. r = 0.708 and the sample size, n, is nine. The df = n – 2 = 7. The critical value is 0.666. 0.708 > 0.666 so r is significant.
3. r = 0.134 and the sample size, n, is 14. The df = 14 – 2 = 12. The critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is not significant.
4. r = 0 and the sample size, n, is five. No matter what the dfs are, r = 0 is between the two critical values so r is not significant.

### Try It 12.10

For a given line of best fit, you compute that r = 0 using n = 100 data points. Can the line be used for prediction? Why or why not?

### Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

## Using Residuals to Identify a Line of Good Fit

Mentor: In order to see whether a line is a good fit or a bad fit for a set of data we can examine the residuals of that line.

Student: Why are the residuals related to determining if the line is a good fit?

Mentor: Well, the residuals express the difference between the values predicted by the line and the actual data, so the values of the residuals will show how well the line represents the data.

Student: OK, well what do I look for when I'm examining the residuals?

Mentor: Well, if the line is a good fit for the data then the residual plot will be random. However, if the line is a bad fit for the data then the plot of the residuals will have a pattern.

Student: How would data that forms a pattern look compared to random data?

Mentor: Well, let's take a look at a set of data with a good fit and a set of data with a bad fit to see the difference. First, let's look at the residuals of a line that is a good fit for a data set. Using the Regression Activity, graph the data points: <(1, 3) (2, 4) (3, 3) (4, 7) (5, 6) (6, 6) (7, 7) (8, 9)>. Now, select Display line of best fit and select Show Residuals. Now you can see the Residual Plot of all of the residuals found when the predicted values of the line of best fit are subtracted from the actual values.

Student: The residuals appear randomly placed along the graph. I can see how this would be a random pattern of residuals. What would a residual plot look like for a line that was a bad fit for the data?

Mentor: Well, let's look at another graph. Using the Regression Activity, plot the following points: <(4, -11), (3, -6), (2, -3), (1, -2), (0, -3), (-1, -6), (-2, -11)>. These points lie on the graph of the quadratic equation y = -x^2 + 2x - 3. Now, select Line of Best Fit to plot a line to fit the data. Now select Show Residuals in order to view the residual plot that you want to examine.

Student: Hey, the residuals form a pattern! They are definitely not randomly scattered, but instead they are making a curve. This line was not a good fit. Will there be times when I won't be able to tell if the residuals form a pattern or not?

Mentor: Sometimes you will not have enough residuals to be able to see a definite pattern in the plot, but in most cases you will be able to look at the residual plot and, using this criterion, determine whether the line is a good fit or a bad fit for the data.

Student: I noticed that the residual values (the values under Line of best fit) seem to have a sum of about 0. Does the sum of these residuals help determine whether a line is a good fit for the data or not?

Mentor: The sum of the residuals does not necessarily determine anything. The line of best fit will often have a sum of about 0 because it takes all of the data points into account, and therefore it will be a bit too far above some data points and a bit too far below others. Therefore, in the case of the line of best fit, the positive error will often balance out the negative error so that the sum of the residuals is approximately 0. However, this does not mean that the line is a good fit for the data; it only means that the line is equally above and below the actual data.

Student: OK, now I know that in order to find out if a line is a good fit for a set of data, I can look at the residual plot, and if the residuals form a pattern, then the line is not a good fit.
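The residuals for the two data sets in the dialogue can also be computed directly. The sketch below assumes that the Regression Activity's line of best fit is the ordinary least-squares line, which the dialogue does not state. The function names `least_squares_line` and `residuals` are illustrative.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(points):
    """Actual y minus predicted y for each point, listed in order of increasing x."""
    slope, intercept = least_squares_line(points)
    return [y - (slope * x + intercept) for x, y in sorted(points)]


good_fit = [(1, 3), (2, 4), (3, 3), (4, 7), (5, 6), (6, 6), (7, 7), (8, 9)]
bad_fit = [(4, -11), (3, -6), (2, -3), (1, -2), (0, -3), (-1, -6), (-2, -11)]

for label, points in [("good fit", good_fit), ("bad fit", bad_fit)]:
    res = residuals(points)
    print(label, [round(e, 2) for e in res], "sum of residuals:", round(sum(res), 6))
```

Under this assumption, both sums come out at essentially 0, yet the residuals for the second data set run from negative to positive and back to negative as x increases, tracing the curve the Student noticed, while those for the first data set change sign with no obvious order.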