Chapter 7 - Scatterplots, Association, and Correlation
Tuesday, 20 September 2022
Scatterplot
A scatterplot displays a set of ordered pairs. It is a common way of displaying two-variable data.
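Scatterplots can show different directions and varying degrees of linearity.
- Random scatter - no pattern can be determined in the data
- Positive relationship - points trend upward from left to right (positive slope)
- Negative relationship - points trend downward from left to right (negative slope)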
Important Ideas
Association
- linear
- non-linear
Correlation - only use correlation if there is a linear association
- positive
- negative
- constant
Strength - how "strong" or tight the linear correlation is
- strong
- moderate to strong
- moderate
- weak to moderate
- weak
Unusual Features
- outlier point
- influential point
- high-leverage point
Using your calculator to create scatterplots
Enter the data into the calculator's lists (customarily, X goes in $L_1$ and Y in $L_2$).
We refer to X as the explanatory variable and Y as the response variable.
Correlation
- Only applies to quantitative variables
- Check the scatterplot first: does it look "approximately linear"?
- Report correlation values both with and without outliers to see if they make a big difference
$r$ = correlation coefficient
- $-1 \le r \le 1$
- An r-value of $\pm 1$ means the points fall exactly on a line
- An r-value of $0$ means there is no linear association (random scatter is one example)
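The properties above can be checked with a small sketch. It assumes the usual textbook definition of $r$ as the sum of the products of z-scores divided by $n - 1$. The function name `pearson_r` and the data are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply the z-scores pairwise, add them up, divide by n - 1.
    z_products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)

xs = [-2, -1, 0, 1, 2]  # hypothetical x-values

r_up = pearson_r(xs, [3 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [-3 * x + 1 for x in xs])  # points on a falling line
r_curve = pearson_r(xs, [x ** 2 for x in xs])     # points on a parabola

print(round(r_up, 6), round(r_down, 6), round(r_curve, 6))
```

The two lines give values of $1$ and $-1$ (up to rounding). The parabola gives a value of essentially $0$ even though the pattern is obvious, which is why correlation should only be used when the association is linear.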
Interpreting Computer Output Example
| Variable | Coefficient |
| --- | --- |
| Intercept | 5773.27 |
| Wins | 517.609 |
For each additional win, the predicted attendance increases by about 517.609 people.
The predicted attendance is 5773.27 people if the team wins 0 games.
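The two interpretations can be checked with a short sketch. It assumes the output describes a linear model of the form predicted attendance = intercept + slope × wins. The function name `predicted_attendance` is hypothetical.

```python
# Coefficients copied from the computer output.
INTERCEPT = 5773.27  # predicted attendance at 0 wins
SLOPE = 517.609      # predicted change in attendance per additional win

def predicted_attendance(wins):
    """Predicted attendance under the assumed linear model."""
    return INTERCEPT + SLOPE * wins

at_zero = predicted_attendance(0)
at_ten = predicted_attendance(10)
one_more_win = predicted_attendance(11) - predicted_attendance(10)

print(f"0 wins:  {at_zero:.2f}")
print(f"10 wins: {at_ten:.2f}")
print(f"Change for one more win: {one_more_win:.3f}")
```

Plugging in 0 wins returns the intercept, and the difference between any two consecutive win totals is the slope.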