In this session we will practice conditional code evaluation and fixing malfunctioning code.

if(){} else{}

Let’s start by building an if-else statement.

Create an object x that includes a randomly drawn integer between 1 and 100.

Hint sample() is the easiest function to use for this.

Solution

set.seed(978) # optional
x <- sample(100, 1)

Write code that prints out "You drew an even number" but only if x is even.

Hint Here is where we need to think a little. What makes a number even? A number is even if it is divisible by 2 without a remainder. So, if the remainder of a division by 2 is zero, we have an even number. The modulo operator (%%) finds the remainder of a division:

15 %% 5
[1] 0
17 %% 5
[1] 2

Solution

if (x %% 2 == 0) {
  print("You drew an even number.")
}
[1] "You drew an even number."
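Because %% is vectorised, the same test can also be run on several numbers at once. A minimal base R sketch; the vector nums is an illustrative assumption, not part of the exercise:

```r
# %% gives the remainder after division
nums <- c(4, 7, 10, 13)
nums %% 2        # remainders: 0 1 0 1
# a remainder of zero after dividing by 2 means the number is even
nums %% 2 == 0   # TRUE FALSE TRUE FALSE
```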

Add a clause that prints out "You drew an odd number" if the condition we tested for is not met.

Hint You can simply add an else clause.

Solution

if (x %% 2 == 0) {
  print("You drew an even number.")
} else {
  print("You drew an odd number.")
}
[1] "You drew an even number."

It’s important to always check your code. Try changing x to a few other numbers and re-running the code.

Solution

x <- 19

if (x %% 2 == 0) {
  print("You drew an even number.")
} else {
  print("You drew an odd number.")
}
[1] "You drew an odd number."

x <- 56

if (x %% 2 == 0) {
  print("You drew an even number.")
} else {
  print("You drew an odd number.")
}
[1] "You drew an even number."

ifelse()

Rewrite this conditional statement using the ifelse() function.

Hint Unlike in our if(){} else{} solution, you do not need the print() function here: the value returned by ifelse() is printed to the console automatically.

Solution

# simplest solution
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an even number."

Check the solution to the previous task. Can you explain why the condition works without == 0? Why is the order of the sentences reversed compared to the if(){} else{} solution above?

Solution The first argument to ifelse(), just like the condition in if(), takes a logical value. If the value passed to it is not logical, R will try to coerce it into one. Numeric values can be coerced to logical values, and the conversion follows a very simple rule: zero is FALSE, any other number is TRUE.

The second argument is the value returned where the first expression evaluates to TRUE and the third argument is the value returned where it evaluates to FALSE. If x is even, the expression x %% 2 returns zero (the remainder), which is interpreted as FALSE; odd numbers leave a remainder of 1, which is interpreted as TRUE. That is why, if x is even, it’s the value given to the third argument that gets returned.
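Both points can be checked directly in the console. A small base R sketch with an illustrative vector y (chosen so it does not overwrite x):

```r
# numbers coerce to logical: zero is FALSE, any other number is TRUE
as.logical(c(0, 1, 2, -3, 0.5))   # FALSE TRUE TRUE TRUE TRUE

y <- c(1, 2, 3, 4)
y %% 2                            # 1 0 1 0
# odd numbers leave remainder 1 (TRUE), even numbers remainder 0 (FALSE)
ifelse(y %% 2, "odd", "even")     # "odd" "even" "odd" "even"
```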

You could, of course, also write an if(){} else{} statement using the same condition and the same order of commands.
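For instance, a sketch of that if(){} else{} version, reusing the test value 19 from earlier:

```r
x <- 19
# x %% 2 is 1 here, which if() coerces to TRUE, so the first block runs
if (x %% 2) {
  print("You drew an odd number.")
} else {
  print("You drew an even number.")
}
```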

One advantage of ifelse() is that it’s vectorised, meaning it can work with conditions that evaluate to logical vectors with more than just one element. Sample several numbers into x and run the ifelse() line again.

Solution

x <- sample(100, 5)
x
[1] 37 67 70 40 93
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an odd number."  "You drew an odd number."  "You drew an even number." "You drew an even number." "You drew an odd number." 

Crash-testing code

When writing code, it’s crucially important to consider edge cases that might yield undesired outcomes. Obviously, assigning a character string, list, or a data frame/tibble to x will break the code:

x <- "some string"
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
Error in x%%2: non-numeric argument to binary operator

This is a little bit of a nuisance but at least the code does not produce incorrect results.

There are three cases in which our ifelse() line produces incorrect results. Can you find them?

Hint There are numbers that are neither odd nor even.

Solution

x <- 0
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an even number."

x <- 3.14
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an odd number."

x <- TRUE
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an odd number."
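Looking at the remainders on their own shows why each of these inputs slips through the test. A quick base R check:

```r
# 0 leaves no remainder, so it is reported as even
0 %% 2      # 0
# a non-integer leaves a non-zero remainder, which counts as TRUE ("odd")
3.14 %% 2   # 1.14
# TRUE is coerced to 1 before the division
TRUE %% 2   # 1
```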

Write code that prints "Input is not numeric." if x contains non-numeric elements but runs our ifelse() command if x is numeric.

Hint You can either nest our ifelse() in another ifelse() or inside if(){} else{}.

Solution

x <- c(T, F, F)
ifelse(is.numeric(x),
       ifelse(x %% 2, "You drew an odd number.", "You drew an even number."),
       "Input is not numeric.")
[1] "Input is not numeric."

# alternatively
if (is.numeric(x)) {
  ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
} else {
  print("Input is not numeric.")
}
[1] "Input is not numeric."
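Note that is.numeric() is stricter than the arithmetic operators: %% happily accepts logical values, but is.numeric() does not count them as numbers. A few quick checks on illustrative values:

```r
is.numeric(c(1, 2, 3))      # TRUE
is.numeric(5L)              # TRUE: integers count as numeric
is.numeric(c(TRUE, FALSE))  # FALSE: logicals do not, even though %% accepts them
is.numeric("5")             # FALSE
is.numeric(factor(1:3))     # FALSE: factors are stored as integers but are not numeric
```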

OK, let’s deal with the case when the number we are considering is zero. Edit the code to print "You drew a zero." if that’s the case.

Hint We need another level of nesting of our clauses. Once the is.numeric() test passes, there should be another test to check if x is zero or not.

Solution

x <- c(1, 3, 0, 90)
# This doesn't work quite right
ifelse(
  is.numeric(x),
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
  ),
  "Input is not numeric.")
[1] "You drew an odd number."

# This does!
if (is.numeric(x)) {
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
  )
} else {
  print("Input is not numeric.")
}
[1] "You drew an odd number."  "You drew an odd number."  "You drew a zero."         "You drew an even number."

See the solution to the task above. The first solution will only give us a result for the first element, despite ifelse() being vectorised. Can you figure out what is going on?

Hint The problem is the is.numeric() condition.

Solution The ifelse() function returns a vector of the same length as the vector returned by the test in condition:

# only returns 1, not 1:10
ifelse(TRUE, 1:10, "test returned FALSE")
[1] 1
# now there are 5 TRUEs in the test, so the function returns the first 5 elements of 1:10
ifelse(rep(TRUE, 5), 1:10, "test returned FALSE")
[1] 1 2 3 4 5
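Put the other way round, the yes and no arguments are recycled (or cut short) to the length of the test, and each position picks its value from one or the other. A one-line sketch with made-up values:

```r
# yes = c("a", "b") is recycled to a, b, a, b; positions where the test is FALSE take "no"
ifelse(c(TRUE, FALSE, TRUE, FALSE), c("a", "b"), "no")   # "a" "no" "a" "no"
```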

If we look at the condition in our solution, we’ll see that it only returns a single logical value:

is.numeric(x)
[1] TRUE

This is because is.numeric() tests the vector as a whole rather than each of its elements: a vector can only contain elements of the same class, so testing each element separately would be redundant. As a result, the outer ifelse() function only returns one element, even though the middle one returns four:

ifelse(
  x == 0,
  "You drew a zero.",
  ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
)
[1] "You drew an odd number."  "You drew an odd number."  "You drew a zero."         "You drew an even number."

Now, if(){} else{} does not have the same restriction as ifelse(). It simply evaluates all the code inside the corresponding block delimited by {}. That’s why the second solution works better.
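The flip side is that if() expects its condition to be a single TRUE or FALSE. A base R sketch; the version note in the comments reflects the change introduced in R 4.2.0:

```r
x <- c(1, 3, 0, 90)

# is.numeric(x) is a single TRUE/FALSE, so it is a valid if() condition,
# and the chosen block returns its full result
res <- if (is.numeric(x)) x %% 2 else "Input is not numeric."
res   # 1 1 0 0

# a condition with more than one element is not allowed in if():
# from R 4.2.0 on this is an error (earlier versions only warned
# and used the first element)
msg <- tryCatch(
  if (x == 0) "zero" else "not zero",
  error = function(e) conditionMessage(e)
)
msg
```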

The above, of course, does not mean that the task cannot be completed using only ifelse(). All we need is to provide as many TRUEs or FALSEs to the condition as there are elements in x. There are two reliable ways of doing this:

The first is to test each element of x with the is.numeric() function. This approach involves the apply() family of functions, which we have not covered yet, so we only mention it for the sake of completeness. In short, the sapply() function takes any function and applies it to each element of a vector:

sapply(x, is.numeric)
[1] TRUE TRUE TRUE TRUE

The second way does not rely on anything we have not covered yet. Can you figure it out?

Hint Remember, if the vector as a whole is of a given class, then all its elements are of the same class. Maybe you can just create a vector of the desired length that repeats the same result?

Solution

rep(is.numeric(x), length(x))
[1] TRUE TRUE TRUE TRUE

Use the solution to the task above in our nested ifelse() code to see if it makes it work correctly.

Solution

ifelse(
  rep(is.numeric(x), length(x)),
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
  ),
  "Input is not numeric.")
[1] "You drew an odd number."  "You drew an odd number."  "You drew a zero."         "You drew an even number."

One more edge case left to deal with. Edit the code so that it prints "You drew a non-integer number." when that is the case.

Hint An integer is divisible by one without a remainder.
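Applied to a vector, the remainder after dividing by 1 is zero for whole numbers and non-zero otherwise. A small sketch using an illustrative vector y, so that x is left untouched:

```r
y <- c(1, 3, 0, 90, 42.12)
y %% 1        # 0.00 0.00 0.00 0.00 0.12 (the last value is only approximately 0.12)
# any non-zero remainder after dividing by 1 flags a non-integer
y %% 1 != 0   # FALSE FALSE FALSE FALSE TRUE
```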

Solution

x <- c(x, 42.12) # add a non-integer
ifelse(
  rep(is.numeric(x), length(x)),
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(
      x %% 1,
      "You drew a non-integer number.",
      ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
    )
  ),
  "Input is not numeric.")
[1] "You drew an odd number."        "You drew an odd number."        "You drew a zero."               "You drew an even number."
[5] "You drew a non-integer number."

# Alternative approach
if (is.numeric(x)) {
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(
      x %% 1,
      "You drew a non-integer number.",
      ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
    )
  )
} else {
  print("Input is not numeric.")
}
[1] "You drew an odd number."        "You drew an odd number."        "You drew a zero."               "You drew an even number."
[5] "You drew a non-integer number."
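If you find yourself running these checks repeatedly, the alternative approach can be wrapped in a function. This is a sketch only: describe_draw() is a hypothetical name, and the explicit != 0 and == 0 comparisons are used for readability instead of relying on coercion:

```r
# describe_draw() is a hypothetical helper, not part of any package
describe_draw <- function(x) {
  # one TRUE/FALSE check on the whole vector, so if() is the right tool here
  if (!is.numeric(x)) {
    return("Input is not numeric.")
  }
  # vectorised checks, applied in order: zero, non-integer, even, odd
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(
      x %% 1 != 0,
      "You drew a non-integer number.",
      ifelse(x %% 2 == 0, "You drew an even number.", "You drew an odd number.")
    )
  )
}

describe_draw(c(1, 3, 0, 90, 42.12))
describe_draw(c(TRUE, FALSE))
```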

if_else()

Let’s conclude by using conditional code evaluation to compute new variables in a dataset.

First of all, let’s create a data set. Copy the code below, run it, and have a quick look at the data.

library(dplyr) # provides tibble(), mutate(), if_else() and the %>% pipe

n <- 50
data <- tibble(
  cand_no = sample(100000:999999, n),
  test = sample(c("A", "B"), n, replace = TRUE),
  marker = rep(c("Lincoln", "Milan"), n/2),
  score = round(runif(n, 20, 90))
)

OK, let’s imagine these are the data from 50 students, each of whom took either test A or test B, marked by either Lincoln or Milan. The score variable contains each student’s score on the test (0-100). Now, as it turns out, test A was slightly more difficult than test B, so it was decided to lower its pass mark from 50/100 to 45/100.

Create a pass_mark column in data that contains the appropriate pass mark for each row.

Hint You can use dplyr::if_else() within dplyr::mutate().

Solution

data %>%
  dplyr::mutate(pass_mark = dplyr::if_else(test == "A", 45, 50)) ->
  data

# A tibble: 6 x 5
cand_no test  marker  score pass_mark
<int> <chr> <chr>   <dbl>     <dbl>
1  666181 A     Lincoln    73        45
2  748259 A     Milan      43        45
3  749279 B     Lincoln    60        50
4  347797 A     Milan      56        45
5  229369 B     Lincoln    25        50
6  963133 A     Milan      79        45

Add a result column to data that contains either "pass" or "fail" in each row based on whether or not the candidate achieved the pass mark.

Solution

data %>%
  dplyr::mutate(result = dplyr::if_else(score < pass_mark, "fail", "pass")) ->
  data

# A tibble: 10 x 6
cand_no test  marker  score pass_mark result
<int> <chr> <chr>   <dbl>     <dbl> <chr>
1  666181 A     Lincoln    73        45 pass
2  748259 A     Milan      43        45 fail
3  749279 B     Lincoln    60        50 pass
4  347797 A     Milan      56        45 pass
5  229369 B     Lincoln    25        50 fail
6  963133 A     Milan      79        45 pass
7  266656 B     Lincoln    24        50 fail
8  260610 A     Milan      33        45 fail
9  655708 A     Lincoln    85        45 pass
10  765483 A     Milan      46        45 pass  
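For comparison, the same two steps can be sketched in base R with ifelse() and $ assignment. The small data frame scores is made up for illustration; one difference worth knowing is that dplyr::if_else() is stricter than base ifelse() about the types of its true and false arguments:

```r
# a small hypothetical data frame standing in for `data`
scores <- data.frame(
  test = c("A", "B", "A", "B"),
  score = c(44, 49, 45, 50)
)
scores$pass_mark <- ifelse(scores$test == "A", 45, 50)
# a score equal to the pass mark passes because the test is score < pass_mark
scores$result <- ifelse(scores$score < scores$pass_mark, "fail", "pass")
scores
```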

Try doing both of the previous steps in one go.

Hint dplyr::if_else() can be nested.

Solution

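data %>%
  dplyr::mutate(result = dplyr::if_else(
    test == "A",
    dplyr::if_else(score < 45, "fail", "pass"),
    dplyr::if_else(score < 50, "fail", "pass")
  )) %>%
  # drop pass_mark and show the first 10 rows, as printed below
  dplyr::select(-pass_mark) %>%
  head(10)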
# A tibble: 10 x 5
cand_no test  marker  score result
<int> <chr> <chr>   <dbl> <chr>
1  666181 A     Lincoln    73 pass
2  748259 A     Milan      43 fail
3  749279 B     Lincoln    60 pass
4  347797 A     Milan      56 pass
5  229369 B     Lincoln    25 fail
6  963133 A     Milan      79 pass
7  266656 B     Lincoln    24 fail
8  260610 A     Milan      33 fail
9  655708 A     Lincoln    85 pass
10  765483 A     Milan      46 pass  

Now let’s say it turns out that Milan is a bit of a harsh marker, so the pass mark for tests marked by him needs to be lowered by 3 marks.

Make sure the result variable reflects this change.

Solution

The code below accomplishes the task with the use of a single mutate() command, albeit at the cost of quite a convoluted if_else() code block:

data %>%
  dplyr::mutate(result = dplyr::if_else(
    test == "A",
    dplyr::if_else(
      marker == "Milan",
      dplyr::if_else(score < 42, "fail", "pass"),
      dplyr::if_else(score < 45, "fail", "pass")
    ),
    dplyr::if_else(
      marker == "Milan",
      dplyr::if_else(score < 47, "fail", "pass"),
      dplyr::if_else(score < 50, "fail", "pass")
    )
  )) %>%
  # drop pass_mark and show the first 10 rows, as printed below
  dplyr::select(-pass_mark) %>%
  head(10)
# A tibble: 10 x 5
cand_no test  marker  score result
<int> <chr> <chr>   <dbl> <chr>
1  666181 A     Lincoln    73 pass
2  748259 A     Milan      43 pass
3  749279 B     Lincoln    60 pass
4  347797 A     Milan      56 pass
5  229369 B     Lincoln    25 fail
6  963133 A     Milan      79 pass
7  266656 B     Lincoln    24 fail
8  260610 A     Milan      33 fail
9  655708 A     Lincoln    85 pass
10  765483 A     Milan      46 pass  

However, it’s probably better to realise that lowering the pass mark by 3 is the same as raising the score by 3. By first computing an intermediate adj_score column inside mutate(), we can get to the same result with neater, more legible code:

data <- data %>%
  dplyr::mutate(
    # lowering Milan's pass mark by 3 is the same as adding 3 to his candidates' scores;
    # dplyr::if_else() requires both branches to have compatible types, and 3L keeps
    # the result an integer should score ever be stored as one (here it is a double)
    adj_score = dplyr::if_else(
      marker == "Milan", score + 3L, score
    ),
    result = dplyr::if_else(
      test == "A",
      dplyr::if_else(adj_score < 45, "fail", "pass"),
      dplyr::if_else(adj_score < 50, "fail", "pass")
    ))

# A tibble: 10 x 6
cand_no test  marker  score adj_score result
<int> <chr> <chr>   <dbl>     <dbl> <chr>
1  666181 A     Lincoln    73        73 pass
2  748259 A     Milan      43        46 pass
3  749279 B     Lincoln    60        60 pass
4  347797 A     Milan      56        59 pass
5  229369 B     Lincoln    25        25 fail
6  963133 A     Milan      79        82 pass
7  266656 B     Lincoln    24        24 fail
8  260610 A     Milan      33        36 fail
9  655708 A     Lincoln    85        85 pass
10  765483 A     Milan      46        49 pass  
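The equivalence behind this shortcut can be checked over every whole-number score in the simulated range (20 to 90) for both pass marks. A base R sketch; exact comparisons like this hold for whole-number scores, while non-integer scores could in principle be affected by floating-point rounding at the boundary:

```r
score <- 20:90
same_outcome <- sapply(c(45, 50), function(pass_mark) {
  # failing a lowered pass mark vs failing the original one after adding 3
  identical(score < pass_mark - 3, score + 3 < pass_mark)
})
same_outcome  # TRUE TRUE
```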

Let’s perform a sanity check to make sure our code got everything right.

Use subsetting to make sure all scores of 42 or above on test A marked by Milan are passes and all scores below 42 are fails.

Hint This is not an if-else exercise. Also, you will need more than one command.

Solution

# passes
milan_A_pass <- data %>%
  dplyr::filter(test == "A" & marker == "Milan" & score >= 42) %>%
  dplyr::pull(result)
all(milan_A_pass == "pass")
[1] TRUE
# fails
# sometimes, base R can be more flexible
all(data$result[data$test == "A" & data$marker == "Milan" & data$score < 42] == "fail")
[1] TRUE

Check that the code performed correctly for all other combinations (Milan_B, Lincoln_A, Lincoln_B).

Hint It might be quicker to create a mock dataset with only edge cases, applying the same data processing and checking the outcome.

Solution

Using mock data:

mock_data <- tibble(
  test = rep(c("A", "B"), each = 4),
  marker = rep(rep(c("Milan", "Lincoln"), each = 2), 2),
  score = c(41, 42, 44, 45, 46, 47, 49, 50),
  exp_result = rep(c("fail", "pass"), 4) # what result we're expecting based on scores we created
)

# run the code we wrote using mock_data
mock_data <- mock_data %>%
  dplyr::mutate(
    adj_score = dplyr::if_else(
      marker == "Milan", score + 3L, score
    ),
    result = dplyr::if_else(
      test == "A",
      dplyr::if_else(adj_score < 45, "fail", "pass"),
      dplyr::if_else(adj_score < 50, "fail", "pass")
    )
  )

# compare expected and observed results
all(mock_data$result == mock_data$exp_result)
[1] TRUE
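A variation on this check: stopifnot() halts with an error instead of quietly returning FALSE, which can be handy at the end of a longer script. A base R sketch limited to test B edge cases; mock is an illustrative name:

```r
mock <- data.frame(
  marker = c("Milan", "Milan", "Lincoln", "Lincoln"),
  score = c(46, 47, 49, 50),
  exp_result = c("fail", "pass", "fail", "pass")
)
# same logic as above: add 3 for Milan, then apply test B's pass mark of 50
mock$adj_score <- ifelse(mock$marker == "Milan", mock$score + 3, mock$score)
mock$result <- ifelse(mock$adj_score < 50, "fail", "pass")
# stops with an error if any observed result differs from the expected one
stopifnot(identical(mock$result, mock$exp_result))
```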

However, there’s nothing wrong with testing each outcome. It just takes more typing (but less thinking):

# use with() to avoid having to type data$ again and again
# Milan (test A was checked above)
milan_B_pass <- all(with(data, result[test == "B" & marker == "Milan" & score >= 47]) == "pass")
milan_B_fail <- all(with(data, result[test == "B" & marker == "Milan" & score < 47]) == "fail")

# Lincoln
lincoln_A_pass <- all(with(data, result[test == "A" & marker == "Lincoln" & score >= 45]) == "pass")
lincoln_A_fail <- all(with(data, result[test == "A" & marker == "Lincoln" & score < 45]) == "fail")
lincoln_B_pass <- all(with(data, result[test == "B" & marker == "Lincoln" & score >= 50]) == "pass")
lincoln_B_fail <- all(with(data, result[test == "B" & marker == "Lincoln" & score < 50]) == "fail")

# see if all tests passed
all(milan_B_pass,
    milan_B_fail,
    lincoln_A_pass,
    lincoln_A_fail,
    lincoln_B_pass,
    lincoln_B_fail)
[1] TRUE

Reflect

In completing the tasks in this worksheet, you have:

• practised conditional code evaluation
• gained a better understanding of the respective particularities of if(){} else{} and ifelse()
• learnt how nesting clauses works
• honed your algorithmic thinking
• become aware of the importance of testing code for edge cases that might yield unexpected results
• used conditional evaluation to compute variables in datasets
• further practised sanity-checking your data processing

Well done!