Introduction to R

Eirini Zormpa

The RSA

Software installation

If you weren’t able to install the software, let one of the IT team members know.

Introductions 👋

Eirini Zormpa, Quantitative Researcher

  • Secondary data from the ONS 💸 🏠 🍃


Previously:

  • Research infrastructure @ TU Delft & The Alan Turing Institute
    • Participation data in courses and other events 🎓
  • PhD Student @ Max Planck Institute for Psycholinguistics
    • Primary data from human participants 💬 👀 🧠

What will you use R for?

Tell us in the chat!

Why R?

R is both a programming language and the software used to interpret it. It is free and open source. 💰 = 🎉

  • ♻️ reproducibility: because R is a programming language (not point-and-click), you don’t have to remember what you clicked, and in what order, to repeat an analysis. It’s all written down for you in a script!
  • 💾 working with data: R was created by statisticians for statistics and data work is where it shines
  • 🦪 working in any discipline: because R is open source, anyone can contribute code to extend its functionality (currently 10,000+ packages).
  • 💗 supportive community

R in the wild

Why this workshop series

  • Teaches you all the basics for getting started with working with data
  • Is based on open educational materials that have been refined over many iterations - credit to The Carpentries 💙
  • Uses a live-coding format, which is excellent for beginners
  • We’re here to help 🙌

Getting help

There are helpers in person and online. If your code doesn’t work, they’re here to help you 💪:

  • If you’re attending in person, raise your hand and wait until someone comes over.
  • If you’re attending online, paste the code you ran and the error message in the chat. Don’t just say that something didn’t work for you, as we won’t have enough information to help you.

Create an R Project

  1. Under the File menu, click on New project
  2. In the wizard that pops up click on New directory > New project
  3. You will now create the working directory for the rest of the workshop and save it in a convenient location.
  4. Give your new directory (folder) a good name, e.g. r-workshop. Make sure the name doesn’t have spaces or special characters!
  5. Click on Browse, navigate to a suitable location for this directory, and click on Open when you are in a location you are happy with.
  6. Click on Create project.

R Projects: File paths ♻️

Below you see two ways of reading data into R. They both work and they both access the same file.

Which one looks more reproducible?

library(readr)  # read_csv() comes from the readr package

# option 1: absolute path
census_data <- read_csv("/Users/Eirini.zormpa/Documents/rsa-r-training/data_raw/synthetic-census-data.csv")

# option 2: relative path
census_data <- read_csv("data_raw/synthetic-census-data.csv")

Option 2 is more reproducible, as it allows you to move your project around on your computer and share it with others without having to directly modify file paths in the individual scripts.
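A minimal sketch of how a relative path is resolved, using only base R. It assumes you are running it from inside your R Project; the file name is the one from the slide above and may not exist on your machine yet.

```r
# Build the relative path piece by piece; file.path() inserts the separators.
csv_path <- file.path("data_raw", "synthetic-census-data.csv")
csv_path

# A relative path is resolved against the current working directory,
# which RStudio sets to the project folder when you open the project.
getwd()

# Check that the file is where you expect before trying to read it.
file.exists(csv_path)
```

If file.exists() returns FALSE, the usual causes are a typo in the path or a working directory that is not the project folder.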

R Projects: Folder structure ♻️

It is good practice to keep all files related to a project in a single folder, called the working directory. This includes data, scripts, outputs, and documentation.

This makes sharing and documenting your projects much easier.

flowchart TB
  A[working-directory] --> B["data_raw/"]
  A[working-directory] --> C["data_processed/"]
  A[working-directory] --> D["figures/"]
  A[working-directory] --> E([LICENCE.md])
  A[working-directory] --> F([paper.qmd])
  A[working-directory] --> G([README.md])
  A[working-directory] --> H["scripts/"]
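The folders in the diagram can also be created from R. This is a sketch, not a required step: project_root is a hypothetical name, and it points at a temporary folder so the example is safe to run anywhere.

```r
# Assumption: a temporary folder stands in for the project root.
# In a real project you would use "." (the working directory) instead.
project_root <- file.path(tempdir(), "r-workshop")

# The four sub-folders shown in the diagram.
folders <- c("data_raw", "data_processed", "figures", "scripts")
folder_paths <- file.path(project_root, folders)

# Create each folder; recursive = TRUE also creates project_root if needed.
for (path in folder_paths) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
}

# List what was created.
list.files(project_root)
```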

Exercise 1.1

5 mins

  1. Create two variables, income and expenses, and assign them values.
  2. Create a third variable, profit, and give it a value based on the current values of income and expenses.
  3. Show that changing the values of either income or expenses does not affect the value of profit.

Exercise 1.1 solution

income <- 100
expenses <- 90
profit <- income - expenses
profit
[1] 10
# change the values of `income` and `expenses`
income <- 80
expenses <- 100

# the value of `profit` hasn't changed
profit
[1] 10
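R stored the result of the calculation in profit, not the calculation itself. To bring profit up to date, the assignment has to be run again, as in this short sketch that reuses the values from the solution:

```r
income <- 80
expenses <- 100

# Re-run the assignment so that profit uses the new values.
profit <- income - expenses
profit
```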

Functions and their arguments

  • Functions are like “canned” scripts that do a specific task.
  • They usually take some kind of input (called an argument) and often give back some kind of output.
  • Running or executing a function is often termed calling a function.
  • The arguments of functions can be anything: e.g. numbers, filenames, but also other objects.
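A few function calls to make this concrete. sqrt() and round() are base R functions; pi_approx is just an illustrative object name.

```r
# sqrt() takes one argument and returns its square root.
root <- sqrt(16)
root

# round() takes the number to round plus an optional `digits` argument.
rounded <- round(3.14159, digits = 2)
rounded

# Arguments can be other objects, and named arguments can come in any order.
pi_approx <- 3.14159
round(digits = 2, x = pi_approx)
```

Typing ?round in the console opens the help page that lists a function's arguments.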

Vectors and data structures

  • A vector is the simplest R data structure.
  • It is composed of a series of values of the same type, e.g. character or numeric (also called double).
  • Other vector types are: logical for TRUE and FALSE, integer for integer numbers and two others we won’t discuss (complex and raw).
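One small vector of each of the four types mentioned above, built with c(). The object names are made up for illustration.

```r
rooms <- c(1, 2, 3)              # numeric, stored as double
tenure <- c("owned", "rented")   # character
has_garden <- c(TRUE, FALSE)     # logical
n_people <- c(2L, 4L)            # integer; the L suffix marks whole numbers

# typeof() reports how R stores each vector.
typeof(rooms)
typeof(tenure)
typeof(has_garden)
typeof(n_people)
```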

Exercise 1.2

10 mins

What will happen in each of these examples?

num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

Hint: use typeof() to check the data type of your objects

Exercise 1.2 solution

typeof(num_char)
[1] "character"
typeof(num_logical)
[1] "double"
typeof(char_logical)
[1] "character"
typeof(tricky)
[1] "character"

A vector can hold only one data type. When you mix types, R converts (coerces) the contents to a “common denominator” type that doesn’t lose any information.
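Printing the vectors, rather than only their types, shows what coercion did to the individual values. This sketch reuses two of the vectors from the exercise:

```r
num_logical <- c(1, 2, 3, TRUE)
num_logical    # TRUE has become the number 1

char_logical <- c("a", "b", "c", TRUE)
char_logical   # TRUE has become the text "TRUE"
```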

Exercise 1.3

10 mins

bedrooms <- c(1, 2, 1, 1, NA, 3, 1, 3, 2, 1, 1, 8, 3, 1, NA, 1)

  1. Using the above vector, create a new vector with the NAs removed.
  2. Use the function median() to calculate the median of the bedrooms vector.

Exercise 1.3 solution

# 1. 
bedrooms_no_na <- na.omit(bedrooms)

# 2.
median(bedrooms, na.rm = TRUE)
[1] 1
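A few more ways to inspect the missing values, as a sketch in base R. bedrooms_complete is an illustrative name; logical indexing is an alternative to na.omit() that returns a plain vector without extra attributes.

```r
bedrooms <- c(1, 2, 1, 1, NA, 3, 1, 3, 2, 1, 1, 8, 3, 1, NA, 1)

# Without na.rm = TRUE, a single NA makes the result NA.
median(bedrooms)

# is.na() flags the missing values; sum() counts them.
sum(is.na(bedrooms))

# Logical indexing keeps only the non-missing values.
bedrooms_complete <- bedrooms[!is.na(bedrooms)]
length(bedrooms_complete)
```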

Summary

So far you have learned how to:

  • Navigate the RStudio Graphical User Interface (GUI).
  • Create files and R projects.
  • Perform simple arithmetic calculations in R.
  • Create objects and vectors.
  • Use functions.
  • Work with missing data.

Programming note

When we say R is a language, we mean just that: We need to learn a new way of communicating that lets us talk to the R software.

Software isn’t as smart as humans and has no tolerance for errors: if we don’t tell it what we want in exactly the way it expects, it won’t work.

Learning how to speak the software’s language takes time and practice, but we’re here to help you 💪

Thank you for your attention