In this session, we will carry on working with the biling_data dataset and practise data visualisation. To avoid having to redo all the data cleaning (or copy and paste the code), continue working from the script you created in the previous session.

While the base R plot() function is powerful enough to create essentially any plot you might want, we will focus on the ggplot2 package. However, we’ll start with a few quick and simple base R plots.

Quick plots in base R

 

Task 1

Use the hist() function to plot a histogram of the age variable.

Solution

hist(biling_data$age)
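How a histogram looks depends a lot on the number of bins. As a sketch, using the built-in mtcars dataset as a stand-in for biling_data, the breaks=, main= and xlab= arguments of hist() set the suggested number of bins, the title and the x-axis label. hist() also returns the bins and counts it used, which you can inspect:

```r
# mtcars is a built-in dataset, used here in place of biling_data
h <- hist(mtcars$mpg,
          breaks = 10,                 # suggested number of bins (R may adjust it)
          main = "Distribution of fuel efficiency",
          xlab = "Miles per gallon")

h$breaks  # bin boundaries actually used
h$counts  # number of observations in each bin
```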

 

Task 2

Use the plot() function to create a scatterplot of wais and BILING.

Solution

plot(x = biling_data$wais, y = biling_data$BILING)

As you can see, plot() tries to guess what kind of plot you want from the classes of the variables you pass to it.
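You can check a variable's class with class() before plotting. A minimal sketch with the built-in iris dataset (standing in for biling_data) shows how the class changes the plot you get:

```r
# iris is a built-in dataset, used here in place of biling_data
class(iris$Sepal.Length)  # "numeric"
class(iris$Species)       # "factor"

plot(iris$Sepal.Length, iris$Petal.Length)  # two numeric vectors: scatterplot
plot(iris$Species)                          # a single factor: bar plot of counts per level
```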

 

Task 3

Use plot() again, this time with the gender variable only.

Solution

plot(biling_data$gender)

This doesn’t look great, does it? The reason is that gender has not been converted into a factor, so plot() does not treat it as a categorical variable.

 

Task 4

Convert gender into a factor (with the labels "Male", "Female" and "Other") and plot it again.

Hint You can either turn gender into a factor permanently by reassigning the variable, or convert it only for the purpose of the plot, inside the plot() call.

Solution

# only converts gender to factor for the purpose of the plot
# plot(factor(biling_data$gender, labels = c("Male", "Female", "Other")))

# permanent change
biling_data$gender <- factor(biling_data$gender, labels = c("Male", "Female", "Other"))
plot(biling_data$gender)
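When you use labels=, factor() assigns the labels in the order of the levels, which by default are the sorted unique values of the variable. The sketch below uses a short hypothetical vector of numeric codes (not taken from biling_data) and spells out levels= so the code-to-label mapping is explicit. It assumes codes 1, 2 and 3 stand for "Male", "Female" and "Other", which may not match how gender is coded in your data:

```r
# Hypothetical numeric codes, for illustration only
codes <- c(1, 2, 2, 3, 1, 2)

# Spell out levels= so it is clear which code receives which label
codes_f <- factor(codes, levels = c(1, 2, 3),
                  labels = c("Male", "Female", "Other"))

levels(codes_f)  # "Male" "Female" "Other"
table(codes_f)   # counts per level: Male 2, Female 3, Other 1
plot(codes_f)    # bar plot of those counts
```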

 

Task 5

Try plotting yearsFR by gender.

Hint Again, just use plot(); it’s really rather flexible. Make sure gender is a factor though!

Solution

plot(biling_data$gender, biling_data$yearsFR)
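With a factor as the first argument and a numeric variable as the second, plot() draws one box plot per level of the factor. As a sketch using the built-in iris data in place of biling_data, the formula interface of boxplot() produces the same kind of plot and makes the grouping explicit; with plot = FALSE it returns the summary statistics instead of drawing:

```r
# iris is a built-in dataset, used here in place of biling_data
plot(iris$Species, iris$Sepal.Length)        # factor x, numeric y: box plots

boxplot(Sepal.Length ~ Species, data = iris,  # the same grouping via a formula
        xlab = "Species", ylab = "Sepal length")

# Five-number summaries (one column per species) without drawing anything
stats <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)$stats
stats
```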

 

Task 6

Finally, let’s see what plot we get when we use two factors. Plot DALF_PASS (as factor!) against gender.

Solution

biling_data$DALF_PASS <- factor(biling_data$DALF_PASS, labels = c("Fail", "Pass"))
plot(biling_data$gender, biling_data$DALF_PASS)

That’s a nice spine plot (a special case of a mosaic plot), isn’t it!
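For two factors, plot() calls spineplot() behind the scenes. As a sketch with the built-in mtcars data standing in for biling_data, you can draw the spine plot directly, print the contingency table it is based on, or draw a general mosaic plot of that table with mosaicplot():

```r
# mtcars is a built-in dataset, used here in place of biling_data
cyl <- factor(mtcars$cyl)                                    # 4, 6 or 8 cylinders
am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))  # 0 = automatic, 1 = manual

tab <- table(cyl, am)
tab  # counts for each cylinder/transmission combination

spineplot(cyl, am)                                   # what plot(cyl, am) draws
mosaicplot(tab, main = "Transmission by cylinders")  # general mosaic plot of the same table
```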

 

The plot() function has numerous arguments you can use to modify the look of a plot. Instead of going through them, we will move on to ggplot(), but before leaving base R behind, here is one last example plot. It is not very pretty, but it demonstrates some of the capabilities of base R graphics:

plot(biling_data$wais, biling_data$BILING,
     xlab = "IQ", ylab = "Bilingualism score", # axis labels
     main = "Relationship between intelligence and bilingualism", # plot title
     type = "n") # don't plot any points
points(biling_data$wais[biling_data$gender == "Male"], # plot points for gender == "Male"
       biling_data$BILING[biling_data$gender == "Male"],
       col = "#fac21888", # hex code for colour: #RRGGBBAA - red, green, blue, alpha (opacity)
       pch = 17) # "point character" governs the shape of the point
points(biling_data$wais[biling_data$gender == "Female"], # plot points for gender == "Female"
       biling_data$BILING[biling_data$gender == "Female"],
       col = "#0d5f8a88",
       pch = 18)
points(biling_data$wais[biling_data$gender == "Other"],  # plot points for gender == "Other"
       biling_data$BILING[biling_data$gender == "Other"],
       col = "#660a6088",
       pch = 19)
abline(h = mean(biling_data$BILING, na.rm = TRUE), # h = y-value of the horizontal line
       lty = 5) # "line type": 5 is long-dashed
abline(v = mean(biling_data$wais, na.rm = TRUE), lty = 5) # v = x-value of the vertical line
abline(lm(BILING ~ wais, biling_data), # abline can take a lm object to draw regression line
       col = "orangered", # there are many colour names that R understands
       lwd = 2) # "line width"
# add legend
legend(x = 125, y = 100, 
       c("Male", "Female", "Other"), # legend labels
       col = c("#fac21888", "#0d5f8a88", "#660a6088"), # colours of points in legend
       pch = 17:19, # shapes of points in legend
       bty = "n") # "box type" n for no frame around legend
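The three points() calls above differ only in the group they select, the colour and the shape. A more compact way to write the same pattern is a loop over the factor levels, with colours and shapes stored in named vectors. The sketch below uses the built-in iris data (with Species as the grouping factor) rather than biling_data, and places the legend with a position keyword instead of coordinates:

```r
# iris is a built-in dataset, used here in place of biling_data
cols   <- c(setosa = "#fac21888", versicolor = "#0d5f8a88", virginica = "#660a6088")
shapes <- c(setosa = 17, versicolor = 18, virginica = 19)

plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length", ylab = "Petal length",
     type = "n")  # set up the axes without drawing any points

# Draw one group of points per level of Species
for (lev in levels(iris$Species)) {
  sel <- iris$Species == lev
  points(iris$Sepal.Length[sel], iris$Petal.Length[sel],
         col = cols[lev], pch = shapes[lev])
}

# Overall regression line, as in the example above
abline(lm(Petal.Length ~ Sepal.Length, data = iris), col = "orangered", lwd = 2)

legend("topleft", legend = names(cols), col = cols, pch = shapes, bty = "n")
```

Adding a group then only means adding one entry to cols and shapes, rather than another points() call.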

ggplot()

Let’s (mostly) re-create the plot above with ggplot() now.

 

Task 7

First, create the plotting space by mapping the right variables onto the x and y axes.

Hint Remember to map variables onto axes using the aes() function.

Solution

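library(ggplot2) # if not already loaded in your script
library(dplyr)   # provides the %>% pipe (also loaded by the tidyverse)
biling_data %>%
  ggplot(aes(x = wais, y = BILING))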
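On its own, ggplot() only sets up the data and the aesthetic mappings, so you should see an empty plotting panel without any points. Because a ggplot is an ordinary R object, you can also store it and add layers to it later. A sketch with the built-in mtcars data standing in for biling_data:

```r
library(ggplot2)

# mtcars is a built-in dataset, used here in place of biling_data
p <- ggplot(mtcars, aes(x = wt, y = mpg))  # data and mappings only: no layers yet
p                                          # draws the empty plotting panel

p + geom_point()                           # adding a layer draws the points
```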

 

Task 8

Add a layer of points to turn the plot into a scatterplot.

Hint That’s geom_point().

Solution

biling_data %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point()
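Each further element of the base R example above is also a layer or function added with +. As a rough sketch, with mtcars standing in for biling_data (wt and mpg in place of wais and BILING), geom_hline() and geom_vline() play the role of abline(h = ...) and abline(v = ...), and labs() sets what xlab=, ylab= and main= set in base R:

```r
library(ggplot2)

# mtcars is a built-in dataset, used here in place of biling_data
p_labs <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_hline(yintercept = mean(mtcars$mpg), linetype = "dashed") +  # like abline(h = ...)
  geom_vline(xintercept = mean(mtcars$wt), linetype = "dashed") +   # like abline(v = ...)
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",             # like xlab= and ylab=
       title = "Relationship between weight and fuel efficiency")   # like main=

p_labs
```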

 

Task 8.1

Make the colour and shape of the points dependent on levels of gender.

Hint You can map variables onto aesthetics inside geom_point() (or any other layer) with aes().

Solution

biling_data %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender))
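Where you put aes() matters once a plot has more than one layer: mappings given in ggplot() are inherited by every layer, while mappings given inside geom_point() apply to that layer only. The sketch below uses the built-in mtcars data (with cylinders converted to a factor) in place of biling_data and adds a linear fit with geom_smooth(method = "lm"), roughly the ggplot2 counterpart of abline(lm(...)). With colour mapped globally you get one line per group; with colour mapped only in geom_point() you get a single overall line:

```r
library(ggplot2)

# mtcars is a built-in dataset, used here in place of biling_data
# colour mapped in ggplot(): inherited by geom_smooth(), so one fitted line per cylinder group
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour mapped in geom_point() only: geom_smooth() fits a single line to all points
p_layer <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_global
p_layer
```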

 

Task 8.2

Let’s get rid of the NA points by filtering out the rows where gender is NA before piping the data into ggplot().

Hint is.na(x) returns TRUE if x is NA. To negate an expression, you can put a ! in front of it.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender))
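The same filter(!is.na(...)) pattern works for any variable with missing values. As a sketch, the built-in airquality dataset (used here in place of biling_data) has missing values in Ozone; filtering removes those rows before ggplot() ever sees them:

```r
library(dplyr)
library(ggplot2)

# airquality is a built-in dataset with missing values in Ozone
nrow(airquality)                # 153 rows in total
sum(is.na(airquality$Ozone))    # 37 of them have no Ozone reading

aq <- airquality %>%
  filter(!is.na(Ozone))         # keep only rows where Ozone is not NA
nrow(aq)                        # 116

aq %>%
  ggplot(aes(x = Temp, y = Ozone)) +
  geom_point()
```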

 

Task 8.3

Add a little transparency to the points using the alpha= argument (1 = fully opaque; 0 = fully transparent) and make the points slightly bigger.

Hint You are not mapping any variables onto alpha= and size=, so set them outside aes().

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2)
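The difference between setting and mapping is easiest to see with colour. In the sketch below (built-in mtcars data in place of biling_data), colour = "steelblue" outside aes() makes every point steel blue, whereas inside aes() the string "steelblue" is treated as a data value: ggplot2 colours the points from its default palette and adds a legend entry labelled "steelblue":

```r
library(ggplot2)

# mtcars is a built-in dataset, used here in place of biling_data
# Setting: a fixed value outside aes() applies to every point
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue", alpha = .7, size = 2)

# Mapping: inside aes(), "steelblue" is treated as a (constant) data value,
# so ggplot2 picks a colour from its default palette and adds a legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "steelblue"), alpha = .7, size = 2)

p_set
p_mapped
```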