The RSA
data.frames (or tibbles) and factors
data.frames are the standard data structure for tabular data in R.
They look very similar to spreadsheets (like those in Excel), but each column is, in fact, a vector.
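Here is a minimal base R sketch of this idea. It uses a small made-up data frame called `households`; the name and values are hypothetical and do not come from the workshop's census data. Pulling out a single column returns an ordinary vector, and every column has one element per row.

```r
# A small made-up data frame (NOT the workshop's census data)
households <- data.frame(
  ID = c(1, 2, 3),
  region = c("London", "Wales", "Scotland"),
  bedrooms = c(2, 3, 1)
)

# Extract one column with $: the result is a plain vector
beds <- households$bedrooms
print(beds)             # 2 3 1
print(is.vector(beds))  # TRUE

# Every column is a vector with one element per row
print(sapply(households, class))   # the type of each column vector
print(sapply(households, length))  # all equal to nrow(households)
```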
Technically, what we will be working with in these workshops are not data.frames but tibbles. tibbles are essentially data frames adapted for the tidyverse. They have some subtle differences, but nothing you need to worry about at this point.
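You can check a tibble's class to confirm that it is still a data frame. The tibble package is part of the tidyverse but may not be installed, so this sketch runs the tibble part only when the package is available. The data frame `df` is made up for illustration.

```r
# A base R data frame with made-up values
df <- data.frame(ID = 1:3, cars = c(0L, 1L, 2L))
print(class(df))  # "data.frame"

# tibble is a separate package; only run this part if it is installed
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(class(tb))                   # "tbl_df" "tbl" "data.frame"
  print(inherits(tb, "data.frame"))  # TRUE: a tibble is still a data frame
}
```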
⚠️ NOT REAL DATA ⚠️
The data have been modified from another dataset to mimic ONS Census data. Their sole purpose is to be used in training.
variable | description |
---|---|
ID | a number to identify the participant |
region | where in the UK the participant is located |
interview_date | the date the interview took place |
household_size | the number of members in the household |
age | the ages of the people in the household |
dwelling_type | the type of dwelling |
bedrooms | the number of bedrooms in the dwelling |
central_heating | whether the dwelling has central heating |
cars | the number of cars the participant owns |
community_establishment | the types of community establishment in the area |
religion | the participant’s religion |
Make sure the contents of the Files tab are the right ones (you should only see the scripts folder and the .Rproj file). Then we need to 1) download the data and 2) save it in the data_raw folder we just created.
We can do both in one go in R by typing the following command in the console:
After you have run this command, open the data_raw folder and check that there is a file called census_data.csv.
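The exercise that follows relies on selecting rows with square brackets. This sketch shows the techniques on a small made-up data frame called `toy`, a hypothetical name that is not `census_data`. Once the census data are loaded, the same `[row, column]` syntax works on them.

```r
# Made-up data frame with 10 rows (NOT the census data)
toy <- data.frame(ID = 1:10, bedrooms = c(1, 2, 3, 2, 4, 1, 3, 2, 5, 2))

# [row, column]: leaving the column position empty keeps all columns
row_3 <- toy[3, ]

# nrow() returns the number of rows, so it can stand in for the last row number
last_row <- toy[nrow(toy), ]

# A negative index drops rows: here everything except rows 4 to 10
first_three <- toy[-(4:nrow(toy)), ]

# tail() shows the last rows (6 by default); n = 1 gives only the last one
print(tail(toy, n = 1))
```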
⏰ 10 mins
1. Create a new data frame (census_200) with the data in row 200 of census_data.
2. Create a new data frame (census_last) from the last row, without typing out the row number. Compare the result with the output of tail().
3. Create a new data frame (census_middle) from the middle row of the dataset.
4. Use the - notation (negative indexing) to reproduce the behavior of head(census_data), which shows rows 1-6.

R has a special data class, called factor, to deal with categorical data. Factors store the levels (values) of a categorical variable, such as days of the week or responses to a question in a survey.

⏰ 5 mins
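Before the exercise, here is a minimal sketch of `factor()` applied to a made-up character vector. The values are hypothetical and do not come from the census data. It shows the levels R creates and the integer codes it uses to store them.

```r
# Made-up responses (NOT the census data)
dwelling <- c("flat", "house", "flat", "bungalow", "house", "flat")

# factor() turns the character vector into a categorical variable
dwelling_f <- factor(dwelling)

print(levels(dwelling_f))   # "bungalow" "flat" "house" (alphabetical by default)
print(nlevels(dwelling_f))  # 3
print(table(dwelling_f))    # how many times each level occurs

# Internally, each value is an integer code pointing at one of the levels
print(as.integer(dwelling_f))  # 2 3 2 1 3 2

# The order of the levels can be set explicitly
dwelling_f2 <- factor(dwelling, levels = c("house", "flat", "bungalow"))
print(levels(dwelling_f2))  # "house" "flat" "bungalow"
```

To convert a data frame column, assign the converted values back to the column, for example `df$col <- factor(df$col)`. Here `df` and `col` are placeholders.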
Convert the columns region and dwelling_type in the census_data data frame into factors. How many levels are there in the dwelling_type column?

⏰ 5 mins
To avoid ambiguity, use the ISO 8601 standard: YYYY-MM-DD (or the compact form YYYYMMDD). RFC 3339, a profile of ISO 8601, uses the YYYY-MM-DD form.
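This sketch reads dates with base R's `as.Date()`, using made-up date strings rather than values from `interview_date`. R reads the YYYY-MM-DD form without any extra arguments. Other layouts need an explicit format string, which also settles any ambiguity.

```r
# Made-up dates written as text (NOT the census data)
iso_dates <- c("2021-03-01", "2021-12-25")

# as.Date() reads YYYY-MM-DD by default
d1 <- as.Date(iso_dates)
print(d1)

# The compact YYYYMMDD form needs an explicit format string
d2 <- as.Date("20210301", format = "%Y%m%d")
print(d2 == d1[1])  # TRUE: the same day

# "01/03/2021" is ambiguous until you state which part is the day
uk <- as.Date("01/03/2021", format = "%d/%m/%Y")  # 1 March 2021
us <- as.Date("01/03/2021", format = "%m/%d/%Y")  # 3 January 2021
print(c(uk, us))

# format() writes Dates back out as YYYY-MM-DD text
print(format(d1, "%Y-%m-%d"))
```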
tibbles and factors.