Exercise sheet 10

In the tenth small group practice session we will focus on writing for loops and combining them with conditional code evaluation. Our aim is to produce a correlation table similar to this one:

	A1	A2	A3	A4	A5	O1	O2	O3	O4	O5
A1		<.001	<.001	<.001	<.001	.219	.004	<.001	<.001	<.001
A2	−.375		<.001	<.001	<.001	<.001	.862	<.001	<.001	<.001
A3	−.301	.510		<.001	<.001	<.001	.625	<.001	.020	<.001
A4	−.166	.348	.378		<.001	.024	.043	.048	.014	.083
A5	−.230	.406	.530	.323		<.001	.406	<.001	.361	.003
O1	−.026	.127	.165	.048	.164		<.001	<.001	<.001	<.001
O2	.061	.004	−.010	.043	−.018	−.264		<.001	<.001	<.001
O3	−.096	.156	.231	.042	.230	.412	−.324		<.001	<.001
O4	−.100	.076	.049	−.052	.019	.240	−.123	.227		<.001
O5	.126	−.103	−.075	.037	−.063	−.304	.355	−.370	−.242

Now, to be straight with you, creating the table above requires neither loops nor conditional commands, but it’s a good exercise.

Warm-up

Task 1

Let’s warm up by writing a simple loop.

Task 1.1

Write a skeleton of a loop that iterates 10 times.

Hint

Set an iterator variable (by convention i, j, etc.) to run over a vector of integers from 1 to 10.

Solution

for (i in 1:10) {
  
}

Because there’s no code to iteratively evaluate, this loop doesn’t really do anything other than waste a few CPU cycles.

Task 1.2

Have the loop output the iterator variable by typing its name inside the {}s.

Solution

for (i in 1:10) {
  i
}

As you can see, this doesn’t seem to do anything either. That’s because code evaluated inside a loop is not printed automatically, the way a command typed at the console is. To get it to show up, you need the print() function.
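A minimal sketch of what is going on: base R's withVisible() reports both the value of an expression and whether R would print it. Applied to the loop from Task 1.2, it shows that a for loop as a whole evaluates to NULL and returns it invisibly (the name res is just an illustrative choice).

```r
# Evaluate the loop from Task 1.2 and capture its value and visibility
res <- withVisible(for (i in 1:10) {
  i # evaluated on every iteration, but never printed
})
res$value   # NULL: the loop as a whole evaluates to NULL
res$visible # FALSE: and that NULL is returned invisibly
```

Since nothing the loop produces is visible, anything you want to see has to be printed explicitly from inside the loop body.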

Task 1.3

Make the code actually print out the iterator for every iteration of the loop.

Hint

Simply wrap the variable inside the {}s in a print() function.

Solution

for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Task 1.4

Above your loop, write code that creates a vector vec containing 20 NAs.

Hint

The rep() function is handy here.

Solution

vec <- rep(NA, 20)
# printout just to show it worked
vec
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Task 1.5

Change the code inside the loop to assign the value of i to the i-th element of vec (assuming i is the name of your iterator).

Hint

You will need to index vec using i.

Solution

vec <- rep(NA, 20)
for (i in 1:10) {
  vec[i] <- i
}
vec # see result
 [1]  1  2  3  4  5  6  7  8  9 10 NA NA NA NA NA NA NA NA NA NA

Task 1.6

Let’s say we changed our mind and want the values of i in the last 10 rather than the first 10 places of vec. Can you figure out how to do that?

Hint

Use maths!

Solution

vec <- rep(NA, 20)
for (i in 1:10) {
  vec[i+10] <- i
}
vec
 [1] NA NA NA NA NA NA NA NA NA NA  1  2  3  4  5  6  7  8  9 10

Task 1.7

Can you edit the code so that the loop populates the vec vector like this?

 [1] NA NA NA NA NA NA NA NA NA NA 10  9  8  7  6  5  4  3  2  1

Hint

There are two solutions to this problem.

Solution

vec <- rep(NA, 20)
for (i in 1:10) {
  vec[21-i] <- i
}
vec

# alternatively
vec <- rep(NA, 20)
for (i in 1:10) {
  vec[i + 10] <- 11 - i
}
vec

Task 1.8

Last one before we move on. Can you use the loop to recreate this vector?

 [1] NA 19 NA 17 NA 15 NA 13 NA 11 NA  9 NA  7 NA  5 NA  3 NA  1

Hint

There are two solutions to this problem.

Solution

vec <- rep(NA, 20)
for (i in 1:10) {
  vec[2 * i] <- 21 - 2 * i
}
vec
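The hint mentions a second route, but the solution above shows only one. A sketch of one possible alternative, mirroring the second solution to Task 1.7: let the loop produce the odd numbers 1, 3, ..., 19 in ascending order and place them counting back from the end of vec.

```r
vec <- rep(NA, 20)
for (i in 1:10) {
  # i = 1 puts 1 in vec[20]; i = 10 puts 19 in vec[2]
  vec[22 - 2 * i] <- 2 * i - 1
}
vec
```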

Main course

OK, let’s make the correlation table with p-values now.

Again, there are several ways of creating such a table; some of them use loops, some of them don’t. But for the sake of practice, let’s zoom in on one of the loopy solutions.

Task 2

Let’s first get our data. To save time, here’s the code that selects only complete cases and only the “agreeableness” and “openness” items from the psych::bfi data set.

bfi <- psych::bfi
bfi <- bfi[complete.cases(bfi),
           grep("^A|^O", names(bfi))] # columns beginning with A or O

Task 3

Create a matrix cor_mat of the required dimensions containing only NAs. We are looking at 10 variables, so we need a 10 × 10 matrix.

Hint

Matrices are created using the matrix() function.

Solution

cor_mat <- matrix(NA, nrow = 10, ncol = 10)
cor_mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
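Optionally (the exercise itself does not do this), the matrix can carry variable names from the start through the dimnames= argument of matrix(), so that printouts show A1 to O5 instead of [1,] to [10,]. A sketch, assuming the ten columns are A1 to A5 followed by O1 to O5; with the real data, names(bfi) would supply them. The name cor_mat_named is hypothetical and chosen so that it doesn’t overwrite cor_mat.

```r
# Assumed column names of the filtered bfi data (names(bfi) with the real data)
vars <- c(paste0("A", 1:5), paste0("O", 1:5))

# Same 10 x 10 matrix of NAs, but with row and column names attached
cor_mat_named <- matrix(NA, nrow = 10, ncol = 10,
                        dimnames = list(vars, vars))
cor_mat_named[1:3, 1:3]
```

Numeric indexing such as cor_mat_named[2, 1] keeps working exactly as before; the names only change how the matrix prints.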

Task 4

Let’s run a correlation test now.

Task 4.1

First, write two separate base R commands, each of which subsets one of the first two columns of bfi. Use indexing with column numbers rather than names to get the columns out of the data frame.

Hint

You need to use the [[x]] or the [ , x] notation for subsetting.

Solution

# both work equally well
bfi[[1]] # list notation: first list element (in data.frames, that's a column)
   [1] 6 4 4 4 1 2 4 1 2 2 4 2 1 5 1 1 1 1 1 5 2 1 5 2 1 1 1 1 3 4 1 1 1 3 1 3 2 2 4 2 1 4 2 1 4 4 1 3 2 1 2 1 1 1 1 4 1 1 1 1 1 2 1 2 2 3 1
  [68] 4 2 3 2 1 1 1 2 3 1 1 1 1 4 1 5 2 1 4 2 6 2 2 2 2 2 2 2 3 2 1 1 1 1 2 2 2 1 5 2 1 2 2 1 1 1 2 4 2 3 1 2 2 2 1 2 1 1 2 2 1 3 4 1 2 1 1
 [135] 6 4 3 4 3 4 3 1 1 2 2 1 2 1 2 5 1 2 1 3 2 1 2 2 2 1 2 5 5 1 1 1 1 3 1 2 5 5 3 3 2 1 3 3 4 1 4 3 3 3 3 4 6 3 5 3 5 1 1 4 5 3 6 2 4 3 2
 [202] 1 1 4 3 1 2 3 1 1 4 2 3 2 2 2 3 1 2 1 1 1 2 3 1 4 3 1 1 2 3 4 5 1 3 1 2 1 3 2 3 2 2 2 2 5 4 3 1 1 2 1 4 1 3 1 2 1 2 1 2 1 2 5 2 2 4 2
 [269] 3 2 1 4 3 1 1 1 4 1 4 1 6 1 3 1 2 1 1 1 4 2 2 4 2 5 1 2 4 2 2 1 1 2 1 2 1 3 3 4 4 1 1 3 3 1 3 1 1 3 4 1 3 3 1 1 1 2 1 2 2 2 2 2 2 2 2
 [336] 2 2 4 2 5 1 4 1 2 2 6 1 3 5 1 2 1 2 2 5 2 2 1 3 1 1 2 1 3 2 1 2 2 4 6 2 2 1 1 1 1 2 1 2 1 2 1 1 4 2 4 4 2 4 1 3 1 5 2 1 1 2 1 3 6 2 3
 [403] 2 2 1 1 1 3 4 1 1 6 1 1 2 1 3 1 2 1 1 5 3 3 4 1 1 2 2 2 1 4 1 4 4 1 3 1 2 1 1 2 5 3 1 1 1 1 1 5 2 1 1 3 6 1 3 1 1 2 4 1 3 1 2 2 4 2 6
 [470] 1 1 4 1 1 5 2 3 1 2 3 4 2 3 1 1 1 2 1 2 3 2 1 2 2 4 2 3 3 4 2 4 2 3 1 1 1 2 3 2 1 4 1 1 1 2 2 1 2 4 1 5 1 1 1 3 6 5 1 2 5 4 2 3 2 1 2
 [537] 1 5 1 4 2 2 1 1 3 1 3 2 2 5 1 3 1 1 3 1 1 1 1 2 2 2 2 2 2 3 2 2 2 1 2 2 1 2 1 1 1 2 3 1 1 1 5 3 1 3 2 3 2 3 2 1 2 1 3 2 1 1 2 1 2 4 3
 [604] 1 3 1 1 1 3 5 4 2 3 1 1 2 1 3 4 4 5 2 2 2 2 1 1 4 3 2 4 2 1 5 3 2 1 1 1 2 3 2 1 4 3 3 3 6 6 3 1 2 2 1 2 1 2 1 5 4 1 1 1 2 1 6 3 4 1 3
 [671] 1 1 2 2 5 1 6 4 4 2 2 1 4 3 1 1 6 5 2 1 1 3 2 2 2 4 1 1 3 5 1 2 2 1 2 1 2 1 2 2 2 1 2 1 2 4 2 2 1 1 2 1 2 2 2 4 1 2 3 1 5 3 1 1 2 2 2
 [738] 4 2 4 1 4 3 2 1 2 2 5 1 1 2 1 5 1 6 2 1 2 1 3 4 1 1 2 4 3 3 1 4 1 4 2 2 2 1 6 4 1 2 2 1 3 1 2 3 1 3 1 2 2 4 4 5 1 1 1 1 2 3 5 2 2 3 1
 [805] 1 1 1 1 1 5 5 1 4 2 5 1 3 4 1 2 2 5 3 2 1 1 5 4 4 2 3 3 5 3 6 1 4 1 1 3 1 3 1 2 1 4 2 4 2 1 1 2 2 1 1 2 1 2 2 2 5 1 2 3 2 3 2 4 2 2 3
 [872] 2 2 2 3 2 2 1 2 1 4 3 1 1 1 1 6 1 2 3 2 1 3 1 1 2 4 1 2 4 1 5 1 2 1 1 2 3 4 3 3 4 2 1 1 2 2 3 6 3 1 3 3 3 2 2 2 2 2 3 2 3 3 1 1 2 5 1
 [939] 2 1 1 2 2 4 2 6 2 2 4 2 2 2 1 1 2 2 4 2 2 2 2 1 5 1 2 2 2 5 1 2 2 2 4 2 2 5 2 1 3 5 4 5 1 1 3 3 1 2 1 2 2 1 5 2 1 2 4 2 2 3
 [ reached getOption("max.print") -- omitted 1236 entries ]
bfi[ , 2] # matrix notation: all rows of the second column
   [1] 6 3 4 5 5 6 5 6 4 5 5 5 5 3 6 4 5 5 5 6 6 6 5 6 5 6 6 5 6 3 6 4 4 4 6 3 3 4 5 6 5 3 3 6 5 4 5 5 5 2 5 5 6 6 6 4 5 6 6 5 5 4 6 6 3 1 4
  [68] 3 6 4 5 4 6 6 6 6 6 6 6 6 2 5 2 5 5 6 5 5 5 4 5 5 5 5 4 5 4 6 6 6 6 3 4 5 6 4 5 6 6 6 5 5 6 5 4 6 5 6 5 4 4 6 6 5 6 4 4 5 4 4 5 6 2 6
 [135] 5 3 4 3 5 2 4 6 5 2 4 6 3 6 3 4 5 5 5 3 5 5 3 5 4 6 4 5 3 5 5 5 6 5 5 6 4 4 5 6 5 6 4 5 5 6 3 5 3 5 5 5 1 5 5 4 6 6 6 4 5 4 6 6 1 5 4
 [202] 5 6 5 5 5 4 5 5 4 6 6 4 6 2 5 4 5 5 6 6 5 5 6 5 5 4 5 5 6 5 5 2 6 4 6 5 6 5 4 5 5 6 2 6 2 4 3 6 5 2 6 4 4 5 6 5 5 4 6 4 5 4 5 5 5 6 5
 [269] 5 3 5 3 4 6 5 6 5 6 6 6 6 4 5 6 5 5 6 5 4 3 6 4 6 6 5 5 2 5 6 6 6 5 5 6 5 4 5 5 3 4 5 4 4 6 6 6 2 5 6 5 5 6 6 6 6 6 6 6 5 5 5 5 5 5 5
 [336] 5 5 5 6 4 4 4 6 6 5 1 6 6 6 6 5 6 5 2 4 5 5 5 6 5 5 5 5 5 6 6 2 4 2 5 6 5 6 6 6 3 6 5 5 4 6 5 6 5 5 3 4 6 5 5 6 6 6 4 6 6 4 5 5 1 5 2
 [403] 4 4 6 5 6 5 6 4 6 6 6 6 4 5 4 4 5 5 5 4 5 4 5 6 6 4 4 5 6 3 6 5 5 6 6 5 5 4 6 5 5 6 5 1 6 6 5 4 5 5 6 5 6 6 6 6 5 6 5 4 6 5 5 5 5 6 2
 [470] 5 6 6 5 6 6 6 6 6 5 4 6 6 4 6 5 6 5 6 6 4 5 6 5 6 6 5 5 5 4 6 3 6 6 3 6 5 5 5 5 5 6 6 4 6 4 5 5 4 4 5 6 6 5 6 6 1 5 6 5 5 6 5 4 6 5 6
 [537] 6 5 5 5 6 5 6 6 6 6 4 4 6 2 6 5 6 5 4 6 6 5 6 3 4 5 5 6 4 1 4 5 6 6 5 6 6 5 6 5 3 6 6 1 6 5 4 5 6 4 4 4 5 5 5 5 6 5 6 5 6 6 5 5 6 5 4
 [604] 5 3 6 5 5 4 6 3 5 3 5 5 5 6 5 6 3 2 5 5 6 5 6 6 4 5 6 5 5 6 5 5 5 5 6 5 4 5 6 6 6 5 5 4 3 2 5 6 5 5 5 5 6 5 6 5 6 4 4 4 6 6 6 5 6 5 5
 [671] 2 6 4 4 4 5 6 6 3 5 5 6 3 4 6 5 6 4 6 6 4 5 6 3 6 5 6 3 5 6 6 3 5 5 4 4 5 4 4 5 5 6 3 5 5 4 5 4 4 6 2 5 5 5 2 4 5 6 4 6 5 4 5 4 6 4 6
 [738] 4 5 5 5 4 5 5 6 6 6 5 6 6 5 5 5 5 6 4 5 5 5 5 6 6 5 3 4 6 5 4 2 6 6 4 5 5 5 5 4 6 4 5 6 5 6 5 6 5 5 5 4 6 6 6 5 6 5 5 6 3 5 5 5 5 5 4
 [805] 5 6 5 6 5 6 3 5 5 6 5 6 4 5 6 5 5 2 5 4 6 6 6 4 4 4 5 4 4 4 1 6 5 5 6 5 6 3 6 5 6 4 5 4 3 6 5 5 6 5 5 3 6 3 4 6 5 6 5 5 4 4 4 2 5 4 5
 [872] 2 4 4 6 5 5 6 5 6 5 4 4 6 6 6 5 5 6 6 5 5 4 6 6 5 5 3 6 3 4 4 1 5 6 4 5 4 5 4 5 5 6 5 5 5 6 6 4 4 5 2 2 3 5 5 5 6 6 3 4 4 4 5 5 5 4 4
 [939] 3 6 6 5 4 2 5 5 5 5 2 5 5 5 6 6 6 5 5 6 5 6 5 6 5 6 6 5 6 5 6 6 4 4 5 5 4 4 4 6 3 5 4 5 5 6 5 6 6 5 5 5 4 5 4 4 5 6 6 5 6 5
 [ reached getOption("max.print") -- omitted 1236 entries ]
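It is worth contrasting these two notations with single brackets and no comma, which look similar but return something different. A sketch on a small hypothetical data frame (df and its values are made up, not taken from bfi):

```r
# Small stand-in data frame with made-up values
df <- data.frame(A1 = c(2, 4, 4), A2 = c(6, 3, 4))

df[[1]]   # numeric vector: the first column
df[ , 1]  # numeric vector: all rows of the first column
df[1]     # a one-column data.frame, NOT a vector

identical(df[[1]], df[ , 1]) # TRUE
is.data.frame(df[1])         # TRUE
```

cor.test() needs plain vectors, which is why [[x]] or [ , x] is used here. Note that with tibbles, [ , 1] returns a one-column tibble rather than a vector, so [[x]] is the more portable choice.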

Task 4.2

Run a Spearman correlation test, correlating these two columns, and store its output in an object called test.

Hint

The cor.test() function takes a method= argument.

Solution

test <- cor.test(bfi[[1]], bfi[[2]], method = "spearman", exact = F)

If you run cor.test() without the exact= argument, R issues a warning: because there are tied ranks in the data, it cannot compute an exact p-value, so the p-value it reports is computed via the asymptotic t approximation instead. This is not a cause for concern; it’s simply a warning. To avoid it, we can explicitly ask for an approximate p-value using the exact = FALSE argument (written as exact = F in the code above).
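A small sketch of that behaviour, using hypothetical vectors x and y that contain tied values. tryCatch() turns the warning into a returned message so it can be inspected, and the second call shows that exact = FALSE runs quietly.

```r
# Made-up data with ties (2 appears twice in x, 5 twice in y)
x <- c(1, 2, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5, 5)

# Default call: R attempts an exact p-value, runs into the ties and warns
warn_msg <- tryCatch(
  {
    cor.test(x, y, method = "spearman")
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
warn_msg # a message about ties

# Asking for the approximate p-value up front avoids the warning
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate # the Spearman correlation, named "rho"
res$p.value
```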

Task 4.3

Now, write two lines of code: one that gets the value of rho out of test and one that gets the p-value. Save the rho value in the second row of the first column of cor_mat and the p-value in the first row of its second column. While you’re at it, make the code round the values to 3 dp.

Hint

Again, this is just matrix subsetting and assignment.

Solution

cor_mat[2, 1] <- round(test$estimate, 3)
cor_mat[1, 2] <- round(test$p.value, 3)
cor_mat
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]     NA    0   NA   NA   NA   NA   NA   NA   NA    NA
 [2,] -0.375   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [3,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [4,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [5,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [6,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [7,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [8,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [9,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
[10,]     NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

Notice that in all our subsetting, we only used the numbers 1 and 2. These correspond both to the columns of the data set we are correlating and to the positions in the correlation matrix where we store the extracted values. If we wanted to correlate the first and third columns of bfi, we would store the rho in cor_mat[3, 1] and the p-value in cor_mat[1, 3].

Unfortunately, the cor.test() function can only test a single correlation at a time, so we need to populate our matrix one test at a time.
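For comparison, base R’s cor() can return all pairwise Spearman correlations in one call; it just doesn’t give p-values, which is why we still need cor.test() here. A sketch on a small hypothetical data frame (columns v1 to v3 are made up), checking that a single cor.test() reproduces the matching cell of the cor() matrix:

```r
# Made-up stand-in for bfi with three columns
df <- data.frame(
  v1 = c(1, 3, 2, 5, 4, 6),
  v2 = c(2, 1, 4, 3, 6, 5),
  v3 = c(6, 5, 4, 3, 2, 1)
)

rho_all <- cor(df, method = "spearman") # 3 x 3 matrix of rhos, no p-values
one_pair <- cor.test(df$v1, df$v3, method = "spearman", exact = FALSE)

# The single test reproduces the corresponding cell of the matrix
all.equal(unname(one_pair$estimate), rho_all["v1", "v3"])
```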

Task 5

OK, let’s populate cor_mat!

Task 5.1

Using the code above, write a for loop that iterates over each column of bfi and correlates it with the 1st column of bfi, each time saving the test object into test and populating the appropriate cells of the cor_mat matrix. Use i as the iterator variable.

The result of the loop should be this partially filled matrix:

        [,1] [,2] [,3] [,4] [,5]  [,6]  [,7] [,8] [,9] [,10]
 [1,]  0.000    0    0    0    0 0.219 0.004    0    0     0
 [2,] -0.375   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [3,] -0.301   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [4,] -0.166   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [5,] -0.230   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [6,] -0.026   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [7,]  0.061   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [8,] -0.096   NA   NA   NA   NA    NA    NA   NA   NA    NA
 [9,] -0.100   NA   NA   NA   NA    NA    NA   NA   NA    NA
[10,]  0.126   NA   NA   NA   NA    NA    NA   NA   NA    NA

Hint

Apart from being the iterator variable, i can also be used to subset bfi and the rows/columns of cor_mat.

Solution

# let's start with a fresh cor_mat
cor_mat <- matrix(NA, nrow = 10, ncol = 10)

# i might equally well iterate over:
# 1:10
# 1:ncol(bfi)
for (i in seq_along(bfi)) {
  test <- cor.test(bfi[[i]], bfi[[1]], method = "spearman", exact = F)
  cor_mat[i, 1] <- round(test$estimate, 3)
  cor_mat[1, i] <- round(test$p.value, 3)
}
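The comments above list ranges that are interchangeable here. They do differ on one edge case: with zero columns, 1:ncol() counts down to c(1, 0), whereas seq_along() and seq_len() return an empty sequence, so the loop body is simply skipped. A sketch using an empty data frame (k is used as the iterator so that i keeps its value from the loop above):

```r
empty <- data.frame() # a data frame with no columns

1:ncol(empty)         # c(1, 0): two bogus iterations
seq_along(empty)      # integer(0): nothing to iterate over
seq_len(ncol(empty))  # integer(0) as well

n_runs <- 0
for (k in seq_along(empty)) {
  n_runs <- n_runs + 1 # never reached
}
n_runs # 0
```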

If you look in your Global Environment pane, you’ll see that the i variable was created and that, now that the loop has finished, it still holds the last value it took: 10 (shown as 10L, meaning 10 stored as an integer). Let’s make use of this variable.

Task 5.2

Edit the loop you just created so that it now uses j as the iterator variable and correlates the i-th and j-th columns of bfi, again storing the results in the appropriate cells of cor_mat. Starting with an empty matrix, this is what we are after:

       [,1]   [,2]   [,3]  [,4]   [,5]   [,6]  [,7]  [,8]   [,9] [,10]
 [1,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [2,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [3,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [4,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.083
 [5,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.003
 [6,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [7,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [8,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
 [9,]    NA     NA     NA    NA     NA     NA    NA    NA     NA 0.000
[10,] 0.126 -0.103 -0.075 0.037 -0.063 -0.304 0.355 -0.37 -0.242 0.000

Hint

cor_mat[i, j] should contain the rho, while cor_mat[j, i] should contain the p-value.

Solution

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

# remember, i is still 10 thanks to the loop we ran earlier
for (j in seq_along(bfi)) {
  test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
  cor_mat[i, j] <- round(test$estimate, 3)
  cor_mat[j, i] <- round(test$p.value, 3)
}

Task 5.3

Now, combine the two loops using nesting. The outer loop should use i and the inner one j as iterator variables. This way, for each value of i, there will be 10 cycles of the j loop. The code inside the inner loop can stay the same while the outer loop does not need any code of its own.

Hint

Nesting loops, just like nesting if statements, is perfectly kosher:

for (i in some_vector) {
  for (j in same_or_another_vector) {
    some.code
  }
}

Solution

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

for (i in seq_along(bfi)) {
  for (j in seq_along(bfi)) {
    test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
    cor_mat[i, j] <- round(test$estimate, 3)
    cor_mat[j, i] <- round(test$p.value, 3)
  }
}

cor_mat
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9] [,10]
 [1,]  0.000  0.000  0.000  0.000  0.000  0.219  0.004  0.000  0.000 0.000
 [2,] -0.375  0.000  0.000  0.000  0.000  0.000  0.862  0.000  0.000 0.000
 [3,] -0.301  0.510  0.000  0.000  0.000  0.000  0.625  0.000  0.020 0.000
 [4,] -0.166  0.348  0.378  0.000  0.000  0.024  0.043  0.048  0.014 0.083
 [5,] -0.230  0.406  0.530  0.323  0.000  0.000  0.406  0.000  0.361 0.003
 [6,] -0.026  0.127  0.165  0.048  0.164  0.000  0.000  0.000  0.000 0.000
 [7,]  0.061  0.004 -0.010  0.043 -0.018 -0.264  0.000  0.000  0.000 0.000
 [8,] -0.096  0.156  0.231  0.042  0.230  0.412 -0.324  0.000  0.000 0.000
 [9,] -0.100  0.076  0.049 -0.052  0.019  0.240 -0.123  0.227  0.000 0.000
[10,]  0.126 -0.103 -0.075  0.037 -0.063 -0.304  0.355 -0.370 -0.242 0.000

As you can see, the diagonal of the matrix contains 0s. This is because, when i and j are equal, the code first populates the corresponding cell of cor_mat with the rho (which is 1, since a variable correlates perfectly with itself) and then overwrites it with the p-value, which rounds to 0.

Task 5.4

Include a conditional statement inside your loop that tells R to only evaluate the code if i and j are not equal. That way, the diagonal cells will not get populated at all.

Solution

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

for (i in seq_along(bfi)) {
  for (j in seq_along(bfi)) {
    if (i != j) {
      test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
      cor_mat[i, j] <- round(test$estimate, 3)
      cor_mat[j, i] <- round(test$p.value, 3)
    }
  }
}

cor_mat
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9] [,10]
 [1,]     NA  0.000  0.000  0.000  0.000  0.219  0.004  0.000  0.000 0.000
 [2,] -0.375     NA  0.000  0.000  0.000  0.000  0.862  0.000  0.000 0.000
 [3,] -0.301  0.510     NA  0.000  0.000  0.000  0.625  0.000  0.020 0.000
 [4,] -0.166  0.348  0.378     NA  0.000  0.024  0.043  0.048  0.014 0.083
 [5,] -0.230  0.406  0.530  0.323     NA  0.000  0.406  0.000  0.361 0.003
 [6,] -0.026  0.127  0.165  0.048  0.164     NA  0.000  0.000  0.000 0.000
 [7,]  0.061  0.004 -0.010  0.043 -0.018 -0.264     NA  0.000  0.000 0.000
 [8,] -0.096  0.156  0.231  0.042  0.230  0.412 -0.324     NA  0.000 0.000
 [9,] -0.100  0.076  0.049 -0.052  0.019  0.240 -0.123  0.227     NA 0.000
[10,]  0.126 -0.103 -0.075  0.037 -0.063 -0.304  0.355 -0.370 -0.242    NA

Task 5.5

Finally, add a condition that populates the p-value cell with the character string "<.001" if the p-value, rounded to 3 dp, is zero. For this, you may need an intermediate step: first save round(test$p.value, 3) into some object, e.g., pval, and then populate the given cell of cor_mat with either its value or "<.001", depending on whether or not it’s zero.

Solution

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

for (i in seq_along(bfi)) {
  for (j in seq_along(bfi)) {
    if (i != j) {
      test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
      pval <- round(test$p.value, 3)
      cor_mat[i, j] <- round(test$estimate, 3)
      cor_mat[j, i] <- ifelse(pval, pval, "<.001") # a pval of 0 counts as FALSE
    }
  }
}

cor_mat
      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]    [,9]     [,10]  
 [1,] NA       "<.001"  "<.001"  "<.001"  "<.001"  "0.219"  "0.004"  "<.001" "<.001"  "<.001"
 [2,] "-0.375" NA       "<.001"  "<.001"  "<.001"  "<.001"  "0.862"  "<.001" "<.001"  "<.001"
 [3,] "-0.301" "0.51"   NA       "<.001"  "<.001"  "<.001"  "0.625"  "<.001" "0.02"   "<.001"
 [4,] "-0.166" "0.348"  "0.378"  NA       "<.001"  "0.024"  "0.043"  "0.048" "0.014"  "0.083"
 [5,] "-0.23"  "0.406"  "0.53"   "0.323"  NA       "<.001"  "0.406"  "<.001" "0.361"  "0.003"
 [6,] "-0.026" "0.127"  "0.165"  "0.048"  "0.164"  NA       "<.001"  "<.001" "<.001"  "<.001"
 [7,] "0.061"  "0.004"  "-0.01"  "0.043"  "-0.018" "-0.264" NA       "<.001" "<.001"  "<.001"
 [8,] "-0.096" "0.156"  "0.231"  "0.042"  "0.23"   "0.412"  "-0.324" NA      "<.001"  "<.001"
 [9,] "-0.1"   "0.076"  "0.049"  "-0.052" "0.019"  "0.24"   "-0.123" "0.227" NA       "<.001"
[10,] "0.126"  "-0.103" "-0.075" "0.037"  "-0.063" "-0.304" "0.355"  "-0.37" "-0.242" NA

And there’s our correlation table. You have probably noticed that this is now a character matrix (hence the quotes). This conversion happened because we entered "<.001" in some of the cells. Since, as you will recall, all elements of a matrix must be of the same type, combining character strings with numeric elements results in the entire matrix being coerced to character. It looks a little odd printed out like this but, since this is the kind of matrix we might export to a paper rather than use for any further calculations, it does not matter.
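Two things are doing quiet work in that loop. ifelse(pval, pval, "<.001") relies on R treating a numeric test value of 0 as FALSE and any other number as TRUE, and the matrix coerces itself to character as soon as one string goes in. A small sketch of both; fmt_p() is a hypothetical helper showing a more explicit alternative for a single value, and m is a made-up 2 × 2 matrix.

```r
# ifelse() with a numeric test: 0 counts as FALSE, anything else as TRUE
pval <- 0
ifelse(pval, pval, "<.001") # "<.001"
pval <- 0.219
ifelse(pval, pval, "<.001") # 0.219, still numeric

# Hypothetical helper: an explicit comparison for one value
fmt_p <- function(p) if (p == 0) "<.001" else p

# One string is enough to turn the whole matrix into character
m <- matrix(NA, nrow = 2, ncol = 2)
m[2, 1] <- -0.375
typeof(m)  # "double"
m[1, 2] <- "<.001"
typeof(m)  # "character"
m[2, 1]    # "-0.375"
```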

You might also have spotted that every off-diagonal cell of our matrix gets populated exactly twice (for instance, when i = 2 and j = 5, and again when it’s the other way around). This means that we are running roughly twice as many iterations as we actually need. Here is what the process looks like:

While this is not a huge deal with a small matrix and fairly simple code inside the loop, with added complexity this kind of inefficiency can make a difference.
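To see the duplication without running any correlations, here is a sketch in which a hypothetical counter matrix, writes, stands in for cor_mat and every assignment simply adds 1 to the cell it would have written.

```r
n <- 10
writes <- matrix(0, nrow = n, ncol = n) # how often each cell gets written

for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) {
      writes[i, j] <- writes[i, j] + 1 # stands in for the rho assignment
      writes[j, i] <- writes[j, i] + 1 # stands in for the p-value assignment
    }
  }
}
writes[1:4, 1:4] # 0 on the diagonal, 2 everywhere else
```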

If you stop to think about it, we don’t really need both i and j to iterate over 1:10. It is sufficient for i to iterate over 1:9 and for j to run from i + 1 up to 10. This approach populates the matrix more efficiently:
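Counting the iterations of this scheme shows where the saving comes from: for each i from 1 to 9, the inner loop runs over j = i + 1, ..., 10, that is, 10 − i times.

```latex
\sum_{i=1}^{9} (10 - i) = 9 + 8 + \dots + 1 = \frac{10 \times 9}{2} = 45 = \binom{10}{2}
```

More generally, with k variables the full double loop runs k² iterations, whereas the triangular one runs k(k − 1)/2, i.e. just under half.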

Because j is always larger than i, this approach also removes the need to worry about the diagonal elements, which makes the code simpler and cuts processing time a little further. First, let’s time the original solution:

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

iter <- 0 # cycle counter
start <- Sys.time() # stopwatch starts
for (i in seq_along(bfi)) {
  for (j in seq_along(bfi)) {
    if (i != j) {
      test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
      pval <- round(test$p.value, 3)
      cor_mat[i, j] <- round(test$estimate, 3)
      cor_mat[j, i] <- ifelse(pval, pval, "<.001")
    }
    iter <- iter + 1 # add cycle
  }
}
end <- Sys.time() # stopwatch ends

Number of iterations run: 100

Time taken to compute: 0.075s

And now the streamlined solution:

cor_mat <- matrix(NA, nrow = 10, ncol = 10)

iter <- 0
start <- Sys.time()
for (i in 1:(ncol(bfi) - 1)) {
  for (j in (i + 1):ncol(bfi)) {
    test <- cor.test(bfi[[i]], bfi[[j]], method = "spearman", exact = F)
    pval <- round(test$p.value, 3)
    cor_mat[j, i] <- round(test$estimate, 3) # i < j here, so [j, i] is below the diagonal
    cor_mat[i, j] <- ifelse(pval, pval, "<.001")
    iter <- iter + 1
  }
}
end <- Sys.time()
cor_mat

      [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]    [,9]     [,10]  
 [1,] NA       "<.001"  "<.001"  "<.001"  "<.001"  "0.219"  "0.004"  "<.001" "<.001"  "<.001"
 [2,] "-0.375" NA       "<.001"  "<.001"  "<.001"  "<.001"  "0.862"  "<.001" "<.001"  "<.001"
 [3,] "-0.301" "0.51"   NA       "<.001"  "<.001"  "<.001"  "0.625"  "<.001" "0.02"   "<.001"
 [4,] "-0.166" "0.348"  "0.378"  NA       "<.001"  "0.024"  "0.043"  "0.048" "0.014"  "0.083"
 [5,] "-0.23"  "0.406"  "0.53"   "0.323"  NA       "<.001"  "0.406"  "<.001" "0.361"  "0.003"
 [6,] "-0.026" "0.127"  "0.165"  "0.048"  "0.164"  NA       "<.001"  "<.001" "<.001"  "<.001"
 [7,] "0.061"  "0.004"  "-0.01"  "0.043"  "-0.018" "-0.264" NA       "<.001" "<.001"  "<.001"
 [8,] "-0.096" "0.156"  "0.231"  "0.042"  "0.23"   "0.412"  "-0.324" NA      "<.001"  "<.001"
 [9,] "-0.1"   "0.076"  "0.049"  "-0.052" "0.019"  "0.24"   "-0.123" "0.227" NA       "<.001"
[10,] "0.126"  "-0.103" "-0.075" "0.037"  "-0.063" "-0.304" "0.355"  "-0.37" "-0.242" NA

Number of iterations run: 45

Time taken to compute: 0.045s
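The triangular loop visits exactly the pairs (i, j) with i < j. If you prefer a single loop, base R’s combn() can list those pairs up front; this is an alternative, not what the solution above does. A sketch checking that it visits the pairs in the same order as the nested loop (visited and nested are illustrative names):

```r
pairs <- combn(10, 2) # 2 x 45 matrix; each column is one (i, j) pair with i < j

visited <- character(0)
for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  # in the real task: run cor.test(bfi[[i]], bfi[[j]], ...) and fill cor_mat here
  visited <- c(visited, paste(i, j))
}

# Same pairs, generated by the nested triangular loop
nested <- character(0)
for (i in 1:9) {
  for (j in (i + 1):10) {
    nested <- c(nested, paste(i, j))
  }
}

ncol(pairs)               # 45
identical(visited, nested) # TRUE
```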

Efficiency is just nice! Speaking of which, as mentioned above, this task doesn’t really require a for loop. The Hmisc::rcorr() function can be used to get a matrix of correlations and a matrix of p-values. In fact, that is how the table at the beginning of this sheet was made:

temp <- Hmisc::rcorr(as.matrix(bfi), type = "spearman") # output is a list of several things
cor_mat <- round(temp$r, 3) # $r is the correlation matrix
pvals <- round(temp$P, 3) # $P is the matrix of p-values

# replace the upper triangle of cor_mat with the upper triangle of pvals
cor_mat[upper.tri(cor_mat)] <- pvals[upper.tri(pvals)]
## the rest is just cosmetics
# force 3 dp so that 0.1 gets formatted as 0.100
cor_mat <- format(cor_mat, nsmall = 3)
# replace diagonal with empty strings
diag(cor_mat) <- ""
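# replace 0 with "<.001" (suppressWarnings() silences the NA warning from the blank diagonal;
# a correlation rounding to 0.000 would be replaced too, though none does here)
cor_mat[which(suppressWarnings(as.numeric(cor_mat)) == 0)] <- "<.001"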
# format - to show as actual minus signs in output file
# at the same time delete leading zero from negative numbers, e.g. -0.020 -> $-$.020
cor_mat <- gsub("^-0", "$-$", cor_mat)
# get rid of leading zero in all other cells
cor_mat <- gsub("^\\s*0", " ", cor_mat)
# make row names appear in bold in output file
row.names(cor_mat) <- paste0("**", row.names(cor_mat), "**")
# format as table for output file (nested calls, so no pipe operator has to be loaded)
kableExtra::kable_styling(knitr::kable(cor_mat, align = "r"))

	A1	A2	A3	A4	A5	O1	O2	O3	O4	O5
A1		<.001	<.001	<.001	<.001	.219	.004	<.001	<.001	<.001
A2	−.375		<.001	<.001	<.001	<.001	.862	<.001	<.001	<.001
A3	−.301	.510		<.001	<.001	<.001	.625	<.001	.020	<.001
A4	−.166	.348	.378		<.001	.024	.043	.048	.014	.083
A5	−.230	.406	.530	.323		<.001	.406	<.001	.361	.003
O1	−.026	.127	.165	.048	.164		<.001	<.001	<.001	<.001
O2	.061	.004	−.010	.043	−.018	−.264		<.001	<.001	<.001
O3	−.096	.156	.231	.042	.230	.412	−.324		<.001	<.001
O4	−.100	.076	.049	−.052	.019	.240	−.123	.227		<.001
O5	.126	−.103	−.075	.037	−.063	−.304	.355	−.370	−.242

However, the exercise was still valuable.

Reflect

In this session you have:

practised writing for loops
deepened your understanding of how iteration in R works
realised that iterator variables can be used to subset data structures
exercised your algorithmic thinking by breaking down a relatively complex problem into a series of repeatable steps
hopefully learned to appreciate efficiency in coding.

Well done!

Introduction to R workshop
