Data visualisation with ggplot2

Eirini Zormpa

The RSA

Last time you learned how to:

  • Subset columns or rows with select or filter and create new columns with mutate.
  • Link the output of one function to the input of another function with the ‘pipe’ operator %>%.
  • Use summarise, group_by, and count to split a data frame into groups of observations, apply summary statistics for each group, and then combine the results.
  • Export a data frame to a .csv or .tsv file.

Learning objectives

  • Produce scatter plots, boxplots, and time-series plots
  • Set universal plot settings
  • Describe what faceting is and apply faceting
  • Modify the aesthetics of an existing ggplot plot (including axis labels and colour)
  • Build complex and customised plots from data in a data frame

Why ggplot2?

ggplot2

ggplot2 is a package (included in the tidyverse) for creating highly customisable plots that are built step-by-step by adding layers.

The separation of a plot into layers allows a high degree of flexibility with minimal effort.

ggplot2 layers

A fuzzy monster in a beret and scarf, critiquing their own column graph on a canvas in front of them while other assistant monsters (also in berets) carry over boxes full of elements that can be used to customize a graph (like themes and geometric shapes). In the background is a wall with framed data visualizations. Stylized text reads “ggplot2: build a data masterpiece.”





<DATA> %>%
  ggplot(aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>() +
  <CUSTOMISATION>
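
As a rough illustration, the template above could be filled in as follows. This is only a sketch: toy_census is a made-up data frame (not the workshop data) whose column names mirror those used in the exercises, and magrittr is loaded only to provide the %>% pipe (that line is not needed if the tidyverse is already attached).

```r
library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Made-up toy data (an assumption for illustration, not the real census data)
toy_census <- data.frame(
  household_size = c(1, 2, 3, 4, 5, 6),
  cars           = c(0, 1, 1, 2, 2, 3)
)

p <- toy_census %>%                                 # <DATA>
  ggplot(aes(x = household_size, y = cars)) +       # <MAPPINGS>
  geom_point() +                                    # <GEOM_FUNCTION>
  labs(x = "Household size", y = "Number of cars")  # <CUSTOMISATION>

p  # printing the object draws the plot
```

Each line after ggplot() adds one layer with +, so a layer can be swapped or removed without touching the others.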

Data visualisation crash-course

Aesthetics

Whenever we visualise data, we take data values and convert them in a systematic and logical way into the visual elements that make up the final graphic. […] All data visualisations map data values into quantifiable features of the resulting graphic. We refer to these features as aesthetics.

Commonly-used aesthetics

  • position (x and y coordinates)
  • colour
  • size
  • shape
  • line type
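
A minimal sketch of how these aesthetics might be used in ggplot2. The data frame toy_census and its values are made up for illustration. Aesthetics placed inside aes() are mapped to a column of the data, while those placed outside aes() are set to one fixed value for every point.

```r
library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Made-up toy data (an assumption for illustration)
toy_census <- data.frame(
  household_size = c(1, 2, 3, 4, 5, 6),
  cars           = c(0, 1, 1, 2, 2, 3),
  dwelling_type  = c("flat", "house", "flat", "house", "flat", "house")
)

p_aes <- toy_census %>%
  # position: x and y are mapped to two numeric columns
  ggplot(aes(x = household_size, y = cars)) +
  # colour and shape are mapped to dwelling_type (inside aes());
  # size is set to a constant for all points (outside aes())
  geom_point(aes(colour = dwelling_type, shape = dwelling_type),
             size = 3)

p_aes
```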

Find the green dot lvl. 1️⃣

Find the green dot lvl. 2️⃣

Find the green dot lvl. 3️⃣

Colour considerations

In the previous game, people with the most common type of colour-blindness would have struggled to perceive the colour distinction 😱

A comparison of the visible color spectrum in common types of color blindness. For people with Deuteranomaly, affecting 2.7% of the population, red and green are difficult to distinguish from one another.

Viridis palettes

Are colourblind-friendly…

… and they’re very pretty 😍
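
As a hedged sketch, a viridis palette can also be applied to a continuous variable with scale_colour_viridis_c(), and the option argument selects one of the palette variants (here "magma"). The toy_census data frame is made up for illustration.

```r
library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Made-up toy data (an assumption for illustration)
toy_census <- data.frame(
  household_size = c(1, 2, 3, 4, 5, 6),
  cars           = c(0, 1, 1, 2, 2, 3),
  bedrooms       = c(1, 2, 2, 3, 4, 5)
)

p_viridis <- toy_census %>%
  ggplot(aes(x = household_size, y = cars)) +
  # colour is mapped to a numeric column, so a continuous (_c) scale is used
  geom_point(aes(colour = bedrooms), size = 3) +
  scale_colour_viridis_c(option = "magma")

p_viridis
```

For a categorical variable such as dwelling_type, the discrete counterpart scale_colour_viridis_d() would be the one to reach for.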

Data visualisation exercises

Exercise 4.1

5 mins

Use what you just learned to create a scatter plot of cars by household_size with the dwelling_type showing in different colours.

Exercise 4.1 solution

census_viz_data %>% 
  ggplot(aes(x = household_size, y = cars)) +
  geom_jitter(aes(colour = dwelling_type),
              alpha = 0.3,
              width = 0.3,
              height = 0.3)

Exercise 4.1 solution (viridisLite)

census_viz_data %>% 
  ggplot(aes(x = household_size, y = cars)) +
  geom_jitter(aes(colour = dwelling_type),
              alpha = 0.3,
              width = 0.3,
              height = 0.3) +
  scale_colour_viridis_d()

Exercise 4.2

10 mins

Replace the box plot with a violin plot; see geom_violin().

Exercise 4.2 solution

census_viz_data %>% 
  ggplot(aes(x = dwelling_type, y = bedrooms)) +
  geom_violin(alpha = 0) +
  geom_jitter(alpha = 0.3,
              colour = "tomato",
              width = 0.3,
              height = 0.3)

Exercise 4.3

5 mins

Build the previous plot again and experiment with at least two themes.

Which do you like best?

theme_minimal
theme_void
theme_classic

theme_dark
theme_grey
theme_light

Exercise 4.3: My preference

I prefer the white background of theme_minimal and I like that it retains the major grid, though that’s slightly controversial.

I also like that it gets rid of the black box around the plot.
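
One way this could look in code, as a sketch on made-up data (toy_census is hypothetical): a theme can be added to a single plot with +, or set as the default for every later plot with theme_set(), which is one way to apply universal plot settings.

```r
library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Made-up toy data (an assumption for illustration)
toy_census <- data.frame(
  dwelling_type = rep(c("flat", "house"), each = 4),
  bedrooms      = c(1, 1, 2, 3, 2, 3, 4, 5)
)

p_base <- toy_census %>%
  ggplot(aes(x = dwelling_type, y = bedrooms)) +
  geom_violin(alpha = 0) +
  geom_jitter(alpha = 0.3, colour = "tomato", width = 0.3, height = 0.3)

# Option 1: change the theme of one plot only
p_minimal <- p_base + theme_minimal()

# Option 2: change the default theme for all plots created afterwards.
# theme_set() returns the previous default, so it can be restored later.
old_theme <- theme_set(theme_minimal())
# ... plots drawn here use theme_minimal() unless told otherwise ...
theme_set(old_theme)  # restore the earlier default
```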

This is just the beginning!

ggplot2 and compatible packages give you a huge amount of flexibility to create exactly the graph you want!

You can explore packages that let you play around with:

  • beautiful palettes (e.g. ghibli, wesanderson)
  • new themes (e.g. hrbrthemes)
  • additional fonts (e.g. extrafont)
  • animated graphs (e.g. gganimate)
  • and so much more!

Thank you for your attention