In this session we will carry on working with the biling_data dataset and practise data visualisation. To avoid redoing all the data cleaning (or copy-pasting the code), continue working from the script you created in the previous session.

While the base R plot() function is powerful enough to create essentially any plot you might want to create, we will focus on using the ggplot2 package. However, we’ll start with a few quick and simple base R plots.

Quick plots in base R

 

Task 1

Use the hist() function to plot a histogram of the age variable.

Solution

hist(biling_data$age)

 

Task 2

Use the plot() function to create a scatterplot of wais and BILING.

Solution

plot(x = biling_data$wais, y = biling_data$BILING)

As you can see, plot() tries to guess what kind of plot you want to create based on the classes of the variables you pass to the function.

 

Task 3

Use plot() again, this time with the gender variable only.

Solution

plot(biling_data$gender)

This doesn’t look great, does it? The reason is that gender has not been converted into a factor.

 

Task 4

Convert gender into a factor and use plot() on it again.

Hint You can either turn gender into a factor permanently by reassigning the variable or only for the purpose of the plot inside of the plot() function.

Solution

# only converts gender to factor for the purpose of the plot
# plot(factor(biling_data$gender, labels = c("Male", "Female", "Other")))

# permanent change
biling_data$gender <- factor(biling_data$gender, labels = c("Male", "Female", "Other"))
plot(biling_data$gender)
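By default, labels= is matched to the sorted unique values of the variable, so it is worth checking the mapping before overwriting a column. Here is a minimal sketch on made-up codes; the coding 1 = Male, 2 = Female, 3 = Other is an assumption for illustration and may not match how gender is coded in biling_data.

```r
# hypothetical numeric codes; the real coding in biling_data may differ
codes <- c(2, 1, 3, 2, NA, 1)

# naming the levels explicitly makes the code-to-label mapping visible
gender_f <- factor(codes, levels = c(1, 2, 3),
                   labels = c("Male", "Female", "Other"))

levels(gender_f)                        # label order follows the levels= order
table(codes, gender_f, useNA = "ifany") # cross-check codes against labels
```

The cross-tabulation should only have non-zero counts where a code meets its intended label.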

 

Task 5

Try plotting yearsFR by gender.

Hint Again, just use plot(); it’s really rather flexible. Make sure gender is a factor though!

Solution

plot(biling_data$gender, biling_data$yearsFR)

 

Task 6

Finally, let’s see what plot we get when we use two factors. Plot DALF_PASS (as factor!) against gender.

Solution

biling_data$DALF_PASS <- factor(biling_data$DALF_PASS, labels = c("Fail", "Pass"))
plot(biling_data$gender, biling_data$DALF_PASS)

That’s a nice mosaic plot, isn’t it!
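Tasks 1 to 6 can be summarised in one sketch: plot() is a generic function, so what it draws depends on the class of what it is given. The toy data frame below is made up and only mimics the variable types in biling_data.

```r
# made-up stand-in for biling_data, not the real dataset
toy <- data.frame(
  wais      = c(95, 102, 110, 88, 120, 105),
  BILING    = c(40, 55, 61, 35, 70, 58),
  gender    = factor(c("Male", "Female", "Female", "Other", "Male", "Female")),
  DALF_PASS = factor(c("Fail", "Pass", "Pass", "Fail", "Pass", "Pass"))
)

plot(toy$wais, toy$BILING)       # numeric, numeric -> scatterplot
plot(toy$gender)                 # factor           -> bar chart of counts
plot(toy$gender, toy$wais)       # factor, numeric  -> one boxplot per level
plot(toy$gender, toy$DALF_PASS)  # factor, factor   -> spine plot
```

The last plot is what R calls a spine plot, which is a special case of the mosaic plot.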

 

There are numerous options and arguments to the plot() function you can use to modify the aesthetics of the plot. Instead of dealing with those, let’s move on to ggplot() and leave base R plots with an example plot. It is not very pretty but it demonstrates some of the capabilities of base R graphics:

plot(biling_data$wais, biling_data$BILING,
     xlab = "IQ", ylab = "Bilingualism score", # axis labels
     main = "Relationship between intelligence and bilingualism", # plot title
     type = "n") # don't plot any points
points(biling_data$wais[biling_data$gender == "Male"], # plot points for gender == "Male"
       biling_data$BILING[biling_data$gender == "Male"],
       col = "#fac21888", # hex code for colour: #RRGGBBAA - red, green, blue, alpha (opacity)
       pch = 17) # "point character" governs the shape of the point
points(biling_data$wais[biling_data$gender == "Female"], # plot points for gender == "Female"
       biling_data$BILING[biling_data$gender == "Female"],
       col = "#0d5f8a88",
       pch = 18)
points(biling_data$wais[biling_data$gender == "Other"],  # plot points for gender == "Other"
       biling_data$BILING[biling_data$gender == "Other"],
       col = "#660a6088",
       pch = 19)
abline(h = mean(biling_data$BILING), # h= y intercept of horizontal line
       lty = 5) # "line type"
abline(v = mean(biling_data$wais), lty = 5) # v= x intercept of vertical line
abline(lm(BILING ~ wais, biling_data), # abline can take a lm object to draw regression line
       col = "orangered", # there are many colour names that R understands
       lwd = 2) # "line width"
# add legend
legend(x = 125, y = 100, 
       c("Male", "Female", "Other"), # legend labels
       col = c("#fac21888", "#0d5f8a88", "#660a6088"), # colours of points in legend
       pch = 17:19, # shapes of points in legend
       bty = "n") # "box type" n for no frame around legend
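The eight-digit hex codes above may look cryptic, so here is a small sketch that takes one apart and puts it back together. It only uses base R's col2rgb() and rgb(); the object name yellow_half is made up for illustration.

```r
# decode the semi-transparent yellow used above into its four channels
col2rgb("#fac21888", alpha = TRUE)
# rows are red, green, blue, alpha on a 0-255 scale: 250, 194, 24, 136

# build the same colour from 0-255 channel values
yellow_half <- rgb(250, 194, 24, 136, maxColorValue = 255)
yellow_half  # "#FAC21888"
```

An alpha of 136 out of 255 is roughly 53% opacity, which is why overlapping points remain visible.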

ggplot()

Let’s (mostly) re-create the plot above with ggplot() now.

 

Task 7

First of all, create the plotting space mapping the right variables onto the x and y axes.

Hint Remember to map variables onto axes using the aes() function.

Solution

biling_data %>%
  ggplot(aes(x = wais, y = BILING))

 

Task 8

Add the scatter layer.

Hint That’s geom_point().

Solution

biling_data %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point()

 

Task 8.1

Make the colour and shape of the points dependent on levels of gender.

Hint You can map variables onto aesthetics within geom_point (or any other layer).

Solution

biling_data %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender))

 

Task 8.2

Let’s get rid of the NA points by filtering the data so that they don’t include NAs in the gender variable before piping it into ggplot().

Hint is.na(x) returns TRUE if x is NA. To negate an expression, you can put a ! in front of it.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender))
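If the !is.na() idiom is new to you, this sketch shows what it does step by step on a few made-up rows. It uses base R subsetting as a counterpart to dplyr::filter() so that it runs without any packages.

```r
# made-up rows standing in for biling_data
toy <- data.frame(
  wais   = c(95, 102, 110, 88, 120),
  gender = factor(c("Male", NA, "Female", "Other", NA))
)

is.na(toy$gender)    # FALSE  TRUE FALSE FALSE  TRUE
!is.na(toy$gender)   #  TRUE FALSE  TRUE  TRUE FALSE

# base R counterpart of dplyr::filter(!is.na(gender))
toy_complete <- toy[!is.na(toy$gender), ]
nrow(toy_complete)   # 3
```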

 

Task 8.3

Add a little transparency to the points using the alpha= argument (1 = fully opaque; 0 = fully transparent) and make the points slightly bigger.

Hint You are not mapping any variables onto the alpha= and size= arguments so don’t use aes().

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2)
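The difference between mapping an aesthetic inside aes() and setting it outside can be checked directly. The sketch below uses a made-up toy data frame and ggplot2's layer_data() to look at the values computed for each point.

```r
library(ggplot2)

# made-up stand-in for biling_data
toy <- data.frame(
  wais   = c(95, 102, 110, 88, 120, 105),
  BILING = c(40, 55, 61, 35, 70, 58),
  gender = factor(c("Male", "Female", "Female", "Other", "Male", "Other"))
)

p <- ggplot(toy, aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender),  # mapped: varies with the data, gets a legend
             alpha = .7, size = 2)  # set: one constant value, no legend

# layer_data() returns the values ggplot2 computed for each point
pts <- layer_data(p, 1)
unique(pts$alpha)            # 0.7 for every point
unique(pts$size)             # 2 for every point
length(unique(pts$colour))   # 3, one colour per gender level
```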

 

Task 9

Add the horizontal dashed line with a y intercept at the mean of BILING using geom_hline(). Line geoms in ggplot take the same lty= argument as base R lines.

Hint Because you are mapping the mean of a variable in your dataset onto yintercept=, it needs to be done within aes().

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5)

 

Task 10

Now add the vertical dashed line with an x intercept at the mean of wais.

Hint If geom_hline() makes a horizontal line, what geom do you think makes a vertical line?

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5)
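Inside aes(), mean(BILING) is evaluated on whatever data reached ggplot(), that is, after the filter() step. If the variable happened to contain missing values, the mean would be NA and no line would be drawn. An alternative sketch, on made-up data with one NA added on purpose, computes the mean once and sets it as a constant outside aes(); the name biling_mean is hypothetical.

```r
library(ggplot2)

toy <- data.frame(
  wais   = c(95, 102, 110, 88, 120),
  BILING = c(40, 55, 61, 35, NA)   # made-up values, one missing on purpose
)

mean(toy$BILING)                # NA: a single missing value propagates
mean(toy$BILING, na.rm = TRUE)  # 47.75

# compute the intercept once, then set it as a constant outside aes()
biling_mean <- mean(toy$BILING, na.rm = TRUE)

p <- ggplot(toy, aes(x = wais, y = BILING)) +
  geom_point(na.rm = TRUE) +
  geom_hline(yintercept = biling_mean, lty = 5)

layer_data(p, 2)$yintercept  # 47.75
```

Both approaches give the same line when there are no missing values; the constant version simply makes the calculation explicit.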

 

Task 11

One more element to add. Use geom_smooth() to add a trend line. You don’t need to specify any arguments.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth()

 

Task 11.1

OK, this is a LOESS line with a confidence interval ribbon. Check the function documentation (or the ggplot2 reference website) to find out how to change it to a linear regression line with no ribbon around it.

While you’re at it, you may as well change the colour of the line.

Hint Check out the method= and se= arguments.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered")
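The line drawn by geom_smooth(method = "lm") should be the same one that abline(lm(...)) drew in the base R plot. The sketch below checks this on made-up data; formula = y ~ x is what ggplot2 uses for "lm" anyway and is spelled out only to avoid the message it would otherwise print.

```r
library(ggplot2)

toy <- data.frame(
  wais   = c(95, 102, 110, 88, 120, 105),
  BILING = c(40, 55, 61, 35, 70, 58)
)

p <- ggplot(toy, aes(x = wais, y = BILING)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# fitted line as drawn by ggplot2: a grid of x values and predicted y values
line_gg <- layer_data(p, 1)

# the same model fitted directly, predicted at the same x values
fit <- lm(BILING ~ wais, data = toy)
line_lm <- predict(fit, newdata = data.frame(wais = line_gg$x))

max(abs(line_gg$y - line_lm))  # effectively zero
```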

 

Task 12

Use the labs() layer to change the axis labels and add a title.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism")

 

Task 13

That’s the plot basically finished. All we need to do now is to change the appearance.

 

Task 13.1

The scale_colour_manual() layer can be used to customise anything to do with the colour aesthetic, including the legend. Use it to get rid of the legend name and change the colours for the gender categories to "#fac218", "#0d5f8a", and "#660a60" for males, females and “other”, respectively.

Hint You only need the name= and values= arguments.

To get rid of the legend name, set it to an empty string, "".

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism") +
  scale_colour_manual(name = "", values = c("#fac218", "#0d5f8a", "#660a60"))

The plot now has two legends. The reason for this is that, by changing the name= argument, the legend for colour is now different from the one for shape.

 

Task 13.2

Set scale_shape_manual() in the same fashion as above. Use values 17, 18, and 19.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism") +
  scale_colour_manual(name = "", values = c("#fac218", "#0d5f8a", "#660a60")) +
  scale_shape_manual(name = "", values = 17:19)

As you can see, the legends have now merged into one.
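The solutions above match values to factor levels by position, which silently gives the wrong colours if the level order ever changes. A slightly more defensive sketch, on made-up data, uses named vectors instead; the object names gender_cols and gender_shapes are hypothetical.

```r
library(ggplot2)

toy <- data.frame(
  wais   = c(90, 100, 110, 120, 130, 140),
  BILING = c(40, 55, 61, 35, 70, 58),
  gender = factor(c("Male", "Female", "Other", "Male", "Female", "Other"),
                  levels = c("Male", "Female", "Other"))
)

# names tie each value to a level, so the order of the vector no longer matters
gender_cols   <- c(Other = "#660a60", Male = "#fac218", Female = "#0d5f8a")
gender_shapes <- c(Other = 19, Male = 17, Female = 18)

p <- ggplot(toy, aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender)) +
  scale_colour_manual(name = "", values = gender_cols) +
  scale_shape_manual(name = "", values = gender_shapes)

# inspect which colour and shape each point received
pts <- layer_data(p, 1)
pts[, c("x", "colour", "shape")]
```

Because both scales still share the same name and the same labels, the legends stay merged.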

 

Task 14

Let’s give the plot a little more of a classic look by adding a theme layer.

Hint Just add a theme_classic() layer.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism") +
  scale_colour_manual(name = "", values = c("#fac218", "#0d5f8a", "#660a60")) +
  scale_shape_manual(name = "", values = 17:19) +
  theme_classic()

Looking at the plot, maybe it would be a little better if the dashed lines were behind the points rather than on top of them.

 

Task 14.1

Hide the dashed lines behind the scatter.

Hint Simply rearrange the order of your geoms.

Solution

biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism") +
  scale_colour_manual(name = "", values = c("#fac218", "#0d5f8a", "#660a60")) +
  scale_shape_manual(name = "", values = 17:19) +
  theme_classic()
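To see why rearranging the geoms works, it can help to peek at how a plot object stores its layers. This is only a sketch on made-up data: the helper layer_order() is hypothetical, the intercept of 50 is arbitrary, and reading p$layers relies on the internal structure of ggplot objects rather than on a documented interface.

```r
library(ggplot2)

toy <- data.frame(wais = c(95, 102, 110), BILING = c(40, 55, 61))

lines_on_top <- ggplot(toy, aes(wais, BILING)) +
  geom_point() +
  geom_hline(yintercept = 50, lty = 5)  # arbitrary intercept for illustration

lines_behind <- ggplot(toy, aes(wais, BILING)) +
  geom_hline(yintercept = 50, lty = 5) +
  geom_point()

# hypothetical helper: name of the geom in each layer, in storage order
layer_order <- function(p) {
  unname(vapply(p$layers, function(l) class(l$geom)[1], character(1)))
}

layer_order(lines_on_top)   # "GeomPoint" "GeomHline"
layer_order(lines_behind)   # "GeomHline" "GeomPoint"
```

Layers are drawn in this stored order, so whatever is added last ends up on top.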

 

Task 15

A task for the brave! Can you figure out how to change the position of the legend to make the plot look like this?

Hint The Internet is your friend.

Solution

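# c(.9, .9) = x and y position as proportions of the plot area; ggplot2 3.5.0 and later prefer legend.position = "inside" together with legend.position.inside = c(.9, .9)
... + theme(legend.position = c(.9, .9))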

 

Task 16

Save your lovely plot using ggsave().

Solution

# by default ggsave() saves the last plot
ggsave(filename = "my_rad_scatterplot.png")

# you can assign a plot to an object and save it using the object's name
p <- biling_data %>%
  dplyr::filter(!is.na(gender)) %>%
  ggplot(aes(x = wais, y = BILING)) +
  geom_hline(aes(yintercept = mean(BILING)), lty = 5) +
  geom_vline(aes(xintercept = mean(wais)), lty = 5) +
  geom_point(aes(colour = gender, shape = gender),
             alpha = .7, size = 2) +
  geom_smooth(method = "lm", se = FALSE, colour = "orangered") +
  labs(x = "IQ", y = "Bilingualism score",
       title = "Relationship between intelligence and bilingualism") +
  scale_colour_manual(name = "", values = c("#fac218", "#0d5f8a", "#660a60")) +
  scale_shape_manual(name = "", values = 17:19) +
  theme_classic() +
  theme(legend.position = c(.9, .9))

ggsave(filename = "my_rad_scatterplot.png", plot = p)
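If you do not give ggsave() a size, it uses the dimensions of the current plotting window, so the same code can produce differently shaped files on different days. A minimal sketch with explicit dimensions follows; the toy plot, the temporary file, and the 16 x 10 cm size are all made up for illustration. A PDF is written here only so that the sketch runs on machines without a PNG-capable device; a ".png" filename works the same way.

```r
library(ggplot2)

toy <- data.frame(wais = c(95, 102, 110), BILING = c(40, 55, 61))
p <- ggplot(toy, aes(wais, BILING)) + geom_point()

# temporary file so the sketch does not clutter the working directory
out_file <- tempfile(fileext = ".pdf")

# explicit size makes the saved plot independent of the plot window's shape
ggsave(filename = out_file, plot = p, width = 16, height = 10, units = "cm")

file.exists(out_file)
```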

Reflect

That was quite a lot of plotting! Let’s pause to think about all the things we’ve covered.

In this session, you:

  • practised creating quick exploratory plots using base R
  • saw how plot() picks different visualisations based on the number and classes of the variables you provide it
  • built a rather advanced grouped scatterplot using ggplot() and
    • geom_point() for adding, well, points
    • geom_smooth() for adding trend lines
    • geom_hline() and geom_vline() for horizontal and vertical lines, respectively
  • learnt how to customise the appearance of the plot to your (well, mine, really) taste with
    • labs()
    • scale_..._manual()
    • theme()
  • solidified your understanding of principles of ggplot():
    • when to use aes() and when not to
    • plots are composed of layers and the order of layers matters