Data Visualization

R Libraries

  • {dplyr} for wrangling data frames
  • The Grammar of Graphics: {ggplot2}
  • Libraries that extend {ggplot2}, plus others

The Grammar of Graphics Library: {ggplot2}

5 components of {ggplot2}

  1. Layer containing geometric elements and data
  2. Scales that map values in the data space to values in aesthetic space
  3. Coordinate System for mapping coordinates to the graphic plane
  4. Facet for arranging the data into a grid
  5. Theme for the non-data elements (e.g., fonts, background, grid lines, axes)
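
To see where each component lives in code, here is a minimal sketch that touches all five in one plot. It redefines a small DATA frame with the same values used later so the block runs on its own; the colors, axis limits, and facet variable are arbitrary choices for illustration.

```r
library(ggplot2)

# illustrative data, same values as the DATA frame introduced below
DATA <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

p <- ggplot(DATA, mapping = aes(x = var1, y = var2, color = group)) +
  geom_point(size = 3) +                        # 1. layer: data + geom
  scale_color_manual(                           # 2. scale: group -> color
    values = c(A = "steelblue", B = "darkorange")
  ) +
  coord_cartesian(xlim = c(0, 5)) +             # 3. coordinate system
  facet_wrap(facets = vars(group)) +            # 4. facet: one panel per group
  theme_minimal()                               # 5. theme
p
```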

Layers Contain

  1. Data (e.g., vector or data frame)
  2. Mapping (e.g., aesthetics corresponding to data)
  3. Statistical Transformation (e.g., sums, means, model fits, etc.)
  4. Geometric object (geom) controlling the type of visualization
  5. Position Adjustment (e.g., stacking, dodging, or jittering overlapping elements)
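
A geom_*() function is essentially a convenience wrapper around ggplot2's layer(). Writing layer() out by hand, as in this sketch, turns each of the five parts of a layer into an explicit argument (the data values are the same illustrative ones used below):

```r
library(ggplot2)

DATA <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(2, 5, 3, 8))

# roughly what geom_point() builds for you
p <- ggplot() +
  layer(
    data = DATA,                         # 1. data
    mapping = aes(x = var1, y = var2),   # 2. mapping
    stat = "identity",                   # 3. statistical transformation
    geom = "point",                      # 4. geometric object
    position = "identity",               # 5. position adjustment
    params = list(na.rm = FALSE)
  )
p
```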

Data

DATA <- data.frame(
 var1 = c(1, 2, 3, 4), 
 var2 = c(2, 5, 3, 8), 
 var3 = c(10, 15, 32, 28), 
 group = c("A", "A", "B", "B")
)

Data

DATA
  var1 var2 var3 group
1    1    2   10     A
2    2    5   15     A
3    3    3   32     B
4    4    8   28     B

Initialize the Plot Object

library(ggplot2)
DATA |>
  ggplot()

Aesthetics

  • the visual properties used to represent the data in a plot
  • x, y, color, fill, size, shape, linetype, linewidth, alpha (transparency), etc.
  • different geoms accept different aesthetics (e.g., lines have color but not fill)
  • can be set to a constant (e.g., blue) or mapped to a variable in the data (e.g., blue for group A, red for group B)

Mapping Data and Aesthetics

Mapping

  • specified by arguments to aes()
  • some geoms need at least an x or a y (e.g., geom_histogram())
  • some geoms need both x and y (e.g., geom_point(), geom_col(), etc.)

Setting vs. Mapping

  • mapping: specified by arguments to aes()
  • setting: specified as arguments to geom_*(), outside of aes()
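
The difference shows up in the data ggplot2 computes for each layer, which layer_data() returns. In this sketch (p0, p_set, and p_map are just illustrative names), the set color is the same literal value for every point, while the mapped color is chosen per group by a scale:

```r
library(ggplot2)

DATA <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

p0 <- ggplot(DATA, mapping = aes(x = var1, y = var2))

# setting: a constant passed to the geom, outside aes(); no legend
p_set <- p0 + geom_point(color = "red")

# mapping: a variable inside aes(); a scale picks one color per group and adds a legend
p_map <- p0 + geom_point(mapping = aes(color = group))

unique(layer_data(p_set)$colour)  # one literal color
unique(layer_data(p_map)$colour)  # one color per level of group
```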

Mapping Data and Aesthetics (Cont.)

DATA |>
  ggplot(mapping = aes(x = var1))

Plot Geometries

  • x or y: geom_histogram(), geom_density(), geom_bar(), etc.
  • x & y: geom_point(), geom_col(), geom_line(), etc.

Adding Plot Geometries

  • add a geom to the plot object with +, not with |>
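
The reason is that |> inserts the plot as the first argument of the next function, and the first argument of geom_point() is mapping, which must come from aes(). A small sketch, assuming R 4.1 or later for the native pipe:

```r
library(ggplot2)

DATA <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(2, 5, 3, 8))

# works: `+` adds the point layer to the plot object
p_ok <- ggplot(DATA, mapping = aes(x = var1, y = var2)) +
  geom_point()

# fails: `|>` turns this into geom_point(<plot>), so the plot object
# lands in geom_point()'s first argument, mapping, which must come from aes()
p_bad <- try(
  ggplot(DATA, mapping = aes(x = var1, y = var2)) |> geom_point(),
  silent = TRUE
)
inherits(p_bad, "try-error")  # TRUE
```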

Adding a Geometry: geom_histogram()

geom_histogram(
  mapping = NULL,
  data = NULL,
  stat = "bin",
  position = "stack",
  ...
  )

Adding a Geometry: geom_histogram()

DATA |>
  ggplot(mapping = aes(x = var1)) +
  geom_histogram()
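
Because geom_histogram() uses stat = "bin", the rows you supply are replaced by one row per bin before anything is drawn. Without a bins or binwidth argument it defaults to 30 bins and prints a message suggesting you pick a better value. This sketch sets binwidth explicitly (1 is an arbitrary choice for this tiny data set) and inspects the binned data with layer_data():

```r
library(ggplot2)

DATA <- data.frame(var1 = c(1, 2, 3, 4))

# binwidth = 1 makes every bin one unit wide
p <- ggplot(DATA, mapping = aes(x = var1)) +
  geom_histogram(binwidth = 1, color = "white")

# the "bin" stat produces one row per bin with a count column
counts <- layer_data(p)$count
sum(counts)  # 4: every observation lands in exactly one bin
```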

Adding a Geometry: geom_point()

geom_point(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...
  )

Adding a Geometry: geom_point()

DATA |>
  ggplot(mapping = aes(x = var1,
                       y = var2
                       )
         ) +
  geom_point()

Setting an Aesthetic: Color

DATA |>
  ggplot(mapping = aes(x = var1,
                       y = var2
                       )
         ) +
  geom_point(color = "red")

Mapping an Aesthetic: Color

DATA |>
  ggplot(mapping = aes(x = var1, 
                       y = var2
                       )
         ) +
  geom_point(mapping = aes(color = group))

Errors with Mapping and Setting

DATA |>
  ggplot(mapping = aes(x = var1, 
                       y = var2
                       )
         ) +
  geom_point(mapping = aes(color = "green"))
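
Here "green" sits inside aes(), so ggplot2 treats it as a data value: a one-level category called green, colored by the default discrete scale (not green) and listed in a legend. The sketch below compares the two forms with layer_data(); p_mapped and p_set are illustrative names:

```r
library(ggplot2)

DATA <- data.frame(var1 = c(1, 2, 3, 4), var2 = c(2, 5, 3, 8))

# inside aes(): "green" is mapped as a category, so the scale chooses the color
p_mapped <- ggplot(DATA, mapping = aes(x = var1, y = var2)) +
  geom_point(mapping = aes(color = "green"))

# outside aes(): "green" is set as the literal color
p_set <- ggplot(DATA, mapping = aes(x = var1, y = var2)) +
  geom_point(color = "green")

unique(layer_data(p_mapped)$colour)  # first color of the default discrete palette
unique(layer_data(p_set)$colour)     # "green"
```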

Adding a Geometry: geom_col()

geom_col(
  mapping = NULL,
  data = NULL,
  position = "stack",
  ...
  )

Adding a Geometry: geom_col()

DATA |>
  ggplot(mapping = aes(x = var1, 
                       y = var2
                       )
         ) +
  geom_col()

Adding a Geometry: geom_col() (Cont.)

DATA |>
  ggplot(mapping = aes(x = group, 
                       y = var2
                       )
         ) +
  geom_col()

Notice anything odd?

Adding a Geometry: geom_col() (Cont.)

Set fill and outline colors to make the stacked rows visible.

DATA |>
  ggplot(mapping = aes(x = group, 
                       y = var2
                       )
         ) +
  geom_col(fill = "yellow", color = "blue")

Remember the Data?

DATA
  var1 var2 var3 group
1    1    2   10     A
2    2    5   15     A
3    3    3   32     B
4    4    8   28     B
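  • every layer applies a statistical transformation (stat)
  • it can be "identity" (what you see is what you get)
  • it can compute a statistic (e.g., a count or a mean)
  • geom_col() uses stat "identity", but its default position = "stack" piles up rows that share an x value, so each bar shows a sum (A: 2 + 5 = 7; B: 3 + 8 = 11)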
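
One way to confirm what the bar chart of group vs. var2 is showing is to inspect the stacked data ggplot2 computed. In this sketch, layer_data() returns a ymax for every stacked row, and the tallest one per bar equals the sum of that group's rows:

```r
library(ggplot2)

DATA <- data.frame(
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

p_stack <- ggplot(DATA, mapping = aes(x = group, y = var2)) +
  geom_col()

# one row per original row, with stacked ymin/ymax positions
ld <- layer_data(p_stack)

# tallest ymax per bar (x is 1 for A, 2 for B)
tapply(ld$ymax, as.numeric(ld$x), max)  # 7 and 11
```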

Change the Data Frame (e.g., summarize)

library(dplyr)
DATA |>
  # group the rows by `group`, then compute the mean of var2 per group
  group_by(group) |>
  summarize(var2 = mean(var2, na.rm = TRUE))
# A tibble: 2 × 2
  group  var2
  <chr> <dbl>
1 A       3.5
2 B       5.5

Plot that New Data Frame

DATA |>
  # group the rows by `group`, then compute the mean of var2 per group
  group_by(group) |>
  summarize(var2 = mean(var2, na.rm = TRUE)) |>
  
  # then plot
  ggplot(mapping = aes(x = group, 
                       y = var2)
         ) +
  geom_col()
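
Summarizing first is one option; another is to let the layer's statistical transformation compute the mean. A sketch using stat_summary() with geom = "col" (the fun argument assumes ggplot2 3.3.0 or later):

```r
library(ggplot2)

DATA <- data.frame(
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

# the stat computes one mean of var2 per x value; geom_col draws it
p <- ggplot(DATA, mapping = aes(x = group, y = var2)) +
  stat_summary(fun = mean, geom = "col")

layer_data(p)$y  # one mean per group
```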

Adding a Geometry: geom_boxplot()

geom_boxplot(
  mapping = NULL,
  data = NULL,
  stat = "boxplot",
  position = "dodge2",
  ...
  )

Adding a Geometry: geom_boxplot()

DATA |>
  ggplot(mapping = aes(x = group, 
                       y = var2
                       )
         ) +
  geom_boxplot(mapping = aes(fill = group),
               show.legend = FALSE
               )
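
geom_boxplot() uses stat = "boxplot", which replaces the raw rows with a summary per group (whisker ends, quartiles, median). With only two values per group here the boxes are not very informative, but layer_data() shows what was computed:

```r
library(ggplot2)

DATA <- data.frame(
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

p <- ggplot(DATA, mapping = aes(x = group, y = var2)) +
  geom_boxplot(mapping = aes(fill = group), show.legend = FALSE)

# one row per group: lower whisker, lower hinge, median, upper hinge, upper whisker
layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
```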

Adding Multiple Geometries

Adding Geometries as Plot Layers

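  • add each layer to the plot with +
  • give a layer its own data with data = (otherwise it uses the plot's data)
  • map aesthetics per layer with mapping = aes() (by default a layer also inherits the plot's mappings)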
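
Each layer can bring its own data. In this sketch the first layer draws the raw rows and the second draws group means computed with {dplyr}; MEANS is a hypothetical helper data frame, and both layers inherit x and y from the mapping in ggplot():

```r
library(ggplot2)
library(dplyr)

DATA <- data.frame(
  var2 = c(2, 5, 3, 8),
  group = c("A", "A", "B", "B")
)

# hypothetical helper data: one mean of var2 per group
MEANS <- DATA |>
  group_by(group) |>
  summarize(var2 = mean(var2))

p <- ggplot(mapping = aes(x = group, y = var2)) +
  geom_point(data = DATA, color = "grey50") +         # layer 1: raw rows
  geom_point(data = MEANS, color = "red", size = 4)   # layer 2: group means

nrow(layer_data(p, i = 1))  # 4 raw points
nrow(layer_data(p, i = 2))  # 2 means
```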

Make Some New Data

set.seed(123)  # make the random draws reproducible
DATA2 <- 
  data.frame(
    var1 = rnorm(n = 100, mean = 20, sd = 2),
    var2 = rnorm(n = 100, mean = 55, sd = 3),
    group1 = rep(c("A", "B"), 50),
    group2 = rep(c("A", "A", "B", "B"), 25)
  )

Adding Geometry Layers: geom_boxplot() + geom_point()

DATA |>
  ggplot(mapping = aes(x = group, 
                       y = var2
                       )
         ) +
  # add a boxplot layer; remove legend
  geom_boxplot(color = "black",
               fill = "white",
               show.legend = FALSE,
               notch = TRUE
               ) +
  # add a point layer with jittered points
  geom_point(position = position_jitter(width = .2),
             alpha = .7
             )

Small Multiples

Small Multiples: Replicating Plots for Subgroups

  • Sometimes you need to create the same plot for each level of another variable
  • Bar plots for each month; scatter plot for each city; etc.

Adding a Facet

  • every plot has a facet; the default, facet_null(), draws a single panel
  • to split the plot into panels, add a facet with +
  • facet_wrap() or facet_grid()

Adding a Facet Based on One Variable

DATA2 |>
  ggplot(mapping = aes(x = var1, 
                       y = var2)
         ) +
  geom_point() +
  # facet by one variable
  facet_wrap(facets = vars(group1))

Adding a Facet Based on Two Variables
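
One way to facet by two variables is facet_grid(), which places one variable on the rows and the other on the columns. A minimal sketch, assuming the DATA2 frame from above (regenerated here with a fixed seed so the block runs on its own):

```r
library(ggplot2)

set.seed(123)  # reproducible random data
DATA2 <- data.frame(
  var1 = rnorm(n = 100, mean = 20, sd = 2),
  var2 = rnorm(n = 100, mean = 55, sd = 3),
  group1 = rep(c("A", "B"), 50),
  group2 = rep(c("A", "A", "B", "B"), 25)
)

p <- DATA2 |>
  ggplot(mapping = aes(x = var1, y = var2)) +
  geom_point() +
  # rows by group1, columns by group2: a 2 x 2 grid of panels
  facet_grid(rows = vars(group1), cols = vars(group2))
p
```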

Adding a Theme

  • every plot has a theme; the default is theme_grey()
  • to change the theme, add a theme with +

Adding a Theme

DATA2 |>
  ggplot(mapping = aes(x = var1, 
                       y = var2)
         ) +
  geom_point() +
  facet_wrap(facets = vars(group1)) +
  # change the theme
  theme_minimal()
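
A complete theme such as theme_minimal() can be followed by theme() to override individual elements; because later additions take precedence, the order matters. A sketch with arbitrary element choices (base_size, minor grid lines, strip text) for illustration:

```r
library(ggplot2)

set.seed(123)  # reproducible random data
DATA2 <- data.frame(
  var1 = rnorm(n = 100, mean = 20, sd = 2),
  var2 = rnorm(n = 100, mean = 55, sd = 3),
  group1 = rep(c("A", "B"), 50)
)

p <- DATA2 |>
  ggplot(mapping = aes(x = var1, y = var2)) +
  geom_point() +
  facet_wrap(facets = vars(group1)) +
  # complete theme first, with a larger base font size
  theme_minimal(base_size = 14) +
  # then override individual elements
  theme(
    panel.grid.minor = element_blank(),
    strip.text = element_text(face = "bold")
  )
p
```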
