anscombe

Submitted by pmagunia on February 26, 2017 - 11:28 AM
Attachment Size
dataset-68467.csv 364 bytes
Documentation

Anscombe's Quartet of ‘Identical’ Simple Linear Regressions

Four x-y datasets which have the same traditional statistical properties (mean, variance, correlation, regression line, etc.), yet are quite different.

Usage

anscombe

Format

A data frame with 11 observations on 8 variables.

x1 == x2 == x3 the integers 4:14, specially arranged
x4 values 8 and 19
y1, y2, y3, y4 numbers in (3, 12.5) with mean 7.5 and sdev 2.03

Source

Tufte, Edward R. (1989) The Visual Display of Quantitative Information, 13–14. Graphics Press.

References

Anscombe, Francis J. (1973) Graphs in statistical analysis. American Statistician, 27, 17–21.

Examples

require(stats); require(graphics)
summary(anscombe)

##-- now some "magic" to do the 4 regressions in a loop:
ff <- y ~ x
mods <- setNames(as.list(1:4), paste0("lm", 1:4))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  ## or   ff[[2]] <- as.name(paste0("y", i))
  ##      ff[[3]] <- as.name(paste0("x", i))
  mods[[i]] <- lmi <- lm(ff, data = anscombe)
  print(anova(lmi))
}

## See how close they are (numerically!)
sapply(mods, coef)
lapply(mods, function(fm) coef(summary(fm)))

## Now, do what you should have done in the first place: PLOTS
op <- par(mfrow = c(2, 2), mar = 0.1+c(4,4,1,1), oma =  c(0, 0, 2, 0))
for(i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
  plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
       xlim = c(3, 19), ylim = c(3, 13))
  abline(mods[[i]], col = "blue")
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
par(op)

From Around the Site...

Title Authored on Content type
R Dataset / Package mosaicData / Birthdays March 9, 2018 - 1:06 PM Dataset
R Dataset / Package car / Friendly March 9, 2018 - 1:06 PM Dataset
R Dataset / Package DAAG / jobs March 9, 2018 - 1:06 PM Dataset
R Dataset / Package DAAG / possum March 9, 2018 - 1:06 PM Dataset
R Dataset / Package Stat2Data / MentalHealth March 9, 2018 - 1:06 PM Dataset