In this session we will practice conditional code evaluation and fixing malfunctioning code.
if(){} else{}
Let’s start by building an if-else statement.
Create an object x
that includes a randomly drawn integer between 1 and 100.
sample()
is the easiest function to use for this.
Write code that prints out "You drew an even number"
but only if x
is even.
Hint
Here is where we need to think a little. What makes a number even? Well, a number is even if it is divisible by 2 without a remainder. So, if the remainder of a division by 2 is zero, then we have an even number. The modulo operator finds the remainder of division:
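As a quick illustration of the modulo operator (the numbers 7 and 8 are arbitrary examples, not part of the task), the sketch below shows how the remainder can be turned into a test for evenness:

```r
# %% is R's modulo operator: it returns the remainder after division
7 %% 2        # 1, because 7 = 3 * 2 + 1
8 %% 2        # 0, because 8 divides by 2 exactly

# a number is even when its remainder after division by 2 equals zero
8 %% 2 == 0   # TRUE
7 %% 2 == 0   # FALSE
```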
Add a clause that prints out "You drew an odd number"
if the condition we tested for is not met.
Use an else clause.
Solution
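One possible solution is sketched below. It assumes that x is drawn with sample(1:100, 1) and uses the message wording from the task; other ways of writing the condition would work just as well.

```r
# draw one random integer between 1 and 100
x <- sample(1:100, 1)

# x %% 2 == 0 is TRUE only when x divides by 2 without a remainder
if (x %% 2 == 0) {
  print("You drew an even number")
} else {
  print("You drew an odd number")
}
```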
It’s important to always check your code. Try changing x
to a few other numbers and re-running the code.
ifelse()
Rewrite this conditional statement using the ifelse()
function.
Unlike if(){} else{}, ifelse() does not need the print() function to return the result to the console.
Solution
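A possible solution, matching the ifelse() line used in the rest of this worksheet (the object name msg is only an illustrative choice):

```r
# draw one random integer between 1 and 100
x <- sample(1:100, 1)

# first argument: condition; second: value if TRUE; third: value if FALSE
msg <- ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
msg
```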
Check the solution to the previous task. Can you explain why the condition works without == 0
? Why is the order of the sentences reversed compared to the if(){} else{}
solution above?
Solution
The first argument to ifelse
, just like the condition in if()
takes a logical value. If the value passed to it is not logical, R
will try to coerce it into such. Numeric values can be successfully coerced into logical values and the conversion follows very simple rules: zero is FALSE
, anything else is TRUE
.
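The coercion rule can be checked directly with as.logical(). This is only a small sketch and the values are arbitrary examples:

```r
# zero is the only number that becomes FALSE
as.logical(0)      # FALSE
as.logical(1)      # TRUE
as.logical(-2.5)   # TRUE

# the same coercion happens inside a condition
if (5) print("5 is treated as TRUE")
```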
The second argument is the command to be executed if the first expression evaluates to TRUE
and the third argument is the command that gets run if the first expression is FALSE
. That means that if x
is even, the expression x %% 2
will return zero (the remainder), which gets interpreted as FALSE
. Odd numbers return TRUE
, while even numbers return FALSE
which is why, if x
is even, it’s the command given to the third argument that gets executed.
Write an if(){} else{} statement using the same condition and the same order of commands.
One advantage of ifelse()
is that it’s vectorised, meaning it can work with conditions that evaluate to logical vectors with more than just one element. Sample several numbers into x
and run the ifelse()
line again.
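A minimal sketch of the vectorised behaviour, assuming we draw five numbers (the count of five and the name res are arbitrary choices):

```r
# draw five random integers between 1 and 100 without replacement
x <- sample(1:100, 5)
x

# ifelse() tests every element and returns one message per element
res <- ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
res
```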
When writing code, it’s crucially important to consider edge cases that might yield undesired outcomes. Obviously, assigning a character string, list, or a data frame/tibble to x
will break the code:
x <- "some string"
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
Error in x%%2: non-numeric argument to binary operator
This is a little bit of a nuisance but at least the code does not produce incorrect results.
There are three cases in which our ifelse()
line produces incorrect results. Can you find them?
Solution
x <- 0
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an even number."
x <- 3.14
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an odd number."
x <- TRUE
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
[1] "You drew an odd number."
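To see why these inputs slip through, it may help to look at what the condition x %% 2 evaluates to in each case. The lines below are a small sketch for inspection only:

```r
# zero leaves no remainder, so it is reported as even
0 %% 2

# a non-integer leaves a non-zero remainder (about 1.14), which is coerced to TRUE
3.14 %% 2

# TRUE is converted to 1 before the arithmetic, so the remainder is 1
TRUE %% 2
```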
Write code that prints "Input is not numeric."
if x
contains non-numeric elements but runs our ifelse()
command if x
is numeric.
You can nest ifelse() in another ifelse() or inside if(){} else{}.
Solution
x <- c(T, F, F)
ifelse(is.numeric(x),
ifelse(x %% 2, "You drew an odd number.", "You drew an even number."),
"Input is not numeric.")
[1] "Input is not numeric."
# alternatively
if (is.numeric(x)) {
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
} else {
print("Input is not numeric.")
}
[1] "Input is not numeric."
OK, let’s deal with the case when the number we are considering is zero. Edit the code to print "You drew a zero."
if that’s the case.
If the is.numeric() test passes, there should be another test to check if x is zero or not.
Solution
x <- c(1, 3, 0, 90)
# This doesn't work quite right
ifelse(
is.numeric(x),
ifelse(
x == 0,
"You drew a zero.",
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
),
"Input is not numeric.")
[1] "You drew an odd number."
# This does!
if (is.numeric(x)) {
ifelse(
x == 0,
"You drew a zero.",
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
)
} else {
print("Input is not numeric.")
}
[1] "You drew an odd number." "You drew an odd number." "You drew a zero." "You drew an even number."
See the solution to the task above. The first solution will only give us a result for the first element, despite ifelse()
being vectorised. Can you figure out what is going on?
Look at the is.numeric() condition.
Solution
The ifelse()
function returns a vector of the same length as the vector returned by the test in condition:
# only returns 1, not 1:10
ifelse(TRUE, 1:10, "test returned FALSE")
[1] 1
# now there are 5 TRUEs in the condition, so the function returns the first 5 elements of 1:10
ifelse(rep(TRUE, 5), 1:10, "test returned FALSE")
[1] 1 2 3 4 5
If we look at the condition in our solution, we’ll see that it only returns a single logical value:
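The check can be run on its own, here assuming x holds the four numbers used in the solution above:

```r
x <- c(1, 3, 0, 90)

# is.numeric() describes the vector as a whole, not each element
is.numeric(x)           # TRUE
length(is.numeric(x))   # 1
```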
This is because a vector can only contain elements of the same class and so testing for each element is redundant. As a result, the outer ifelse()
function only returns one element, even though the middle one returns four:
ifelse(
x == 0,
"You drew a zero.",
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
)
[1] "You drew an odd number." "You drew an odd number." "You drew a zero." "You drew an even number."
if(){} else{}
, does not have the same restriction as ifelse()
. It simply evaluates all the code inside the corresponding code block delineated by {}
. That’s why the second solution works better.
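For comparison with the ifelse(TRUE, 1:10, ...) example earlier, here is a minimal sketch of the same test written with if(){} else{} (the object names res_if and res_ifelse are only illustrative):

```r
# if() evaluates the whole block, so all ten elements come back
res_if <- if (TRUE) {
  1:10
} else {
  "test returned FALSE"
}
res_if

# ifelse() returns only as many elements as the condition has
res_ifelse <- ifelse(TRUE, 1:10, "test returned FALSE")
res_ifelse
```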
The above, of course, does not mean that the task cannot be completed using only ifelse()
. All we need is to provide as many TRUE
s or FALSE
s to the condition as there are elements in x
. There are two reliable ways of doing this:
The first one consists in testing each element of x
with the is.numeric()
function. This approach involves using the apply()
family of functions. We have not covered this approach so we only mention it for the sake of completeness. In short, you can take any function and apply it to each element of a vector using the sapply()
function:
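A minimal sketch, assuming x is the numeric vector from the earlier solution:

```r
x <- c(1, 3, 0, 90)

# apply is.numeric() to each element in turn; one logical value per element
sapply(x, is.numeric)
```

The result has four elements, so it could serve as a condition of the right length for ifelse().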
The second way does not rely on anything we have not covered yet. Can you figure it out?
Use the solution to the task above in our nested ifelse()
code to see if it makes it work correctly.
Solution
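One way to do this, consistent with the code used in the next task, is to repeat the single result of is.numeric() once per element of x with rep(). The sketch below assumes x is the vector from the previous solution; the names cond and res are only illustrative.

```r
x <- c(1, 3, 0, 90)

# repeat the single TRUE/FALSE as many times as there are elements in x
cond <- rep(is.numeric(x), length(x))
cond

# the outer ifelse() now gets a condition of length 4 and returns 4 elements
res <- ifelse(
  cond,
  ifelse(
    x == 0,
    "You drew a zero.",
    ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
  ),
  "Input is not numeric.")
res
```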
One more edge case left to deal with. Edit the code so that it prints "You drew a non-integer number."
when that is the case.
Solution
x <- c(x, 42.12) # add a non-integer
ifelse(
rep(is.numeric(x), length(x)),
ifelse(
x == 0,
"You drew a zero.",
ifelse(
x %% 1,
"You drew a non-integer number.",
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
)
),
"Input is not numeric.")
[1] "You drew an odd number." "You drew an odd number." "You drew a zero." "You drew an even number."
[5] "You drew a non-integer number."
# Alternative approach
if (is.numeric(x)) {
ifelse(
x == 0,
"You drew a zero.",
ifelse(
x %% 1,
"You drew a non-integer number.",
ifelse(x %% 2, "You drew an odd number.", "You drew an even number.")
)
)
} else {
print("Input is not numeric.")
}
[1] "You drew an odd number." "You drew an odd number." "You drew a zero." "You drew an even number."
[5] "You drew a non-integer number."
if_else()
Let’s conclude by using conditional code evaluation to compute new variables in a dataset.
First of all, let’s create a data set. Copy the code below, run it, and have a quick look at the data.
OK, let’s imagine these are the data from 50 students, each of whom took either test A or test B, marked by either Lincoln or Milan. The score
variable contains the score on the tests (0-100). Now, as it turns out, test A was slightly more difficult than test B and so it was decided to lower the pass mark for test A from 50/100 to 45/100.
Create a pass_mark
column in data
that contains the appropriate pass mark for each row.
Use dplyr::if_else() within dplyr::mutate().
Solution
data %>%
dplyr::mutate(pass_mark = dplyr::if_else(test == "A", 45, 50)) ->
data
head(data)
# A tibble: 6 x 5
cand_no test marker score pass_mark
<int> <chr> <chr> <dbl> <dbl>
1 666181 A Lincoln 73 45
2 748259 A Milan 43 45
3 749279 B Lincoln 60 50
4 347797 A Milan 56 45
5 229369 B Lincoln 25 50
6 963133 A Milan 79 45
Add a result
column to data
that contains either "pass"
or "fail"
in each row based on whether or not the candidate achieved the pass mark.
Solution
data %>%
dplyr::mutate(result = dplyr::if_else(score < pass_mark, "fail", "pass")) ->
data
head(data, 10)
# A tibble: 10 x 6
cand_no test marker score pass_mark result
<int> <chr> <chr> <dbl> <dbl> <chr>
1 666181 A Lincoln 73 45 pass
2 748259 A Milan 43 45 fail
3 749279 B Lincoln 60 50 pass
4 347797 A Milan 56 45 pass
5 229369 B Lincoln 25 50 fail
6 963133 A Milan 79 45 pass
7 266656 B Lincoln 24 50 fail
8 260610 A Milan 33 45 fail
9 655708 A Lincoln 85 45 pass
10 765483 A Milan 46 45 pass
Try doing both of the previous steps in one go.
dplyr::if_else()
can be nested.
Solution
data %>%
dplyr::mutate(result = dplyr::if_else(
test == "A",
dplyr::if_else(score < 45, "fail", "pass"),
dplyr::if_else(score < 50, "fail", "pass")
)) %>%
head(10)
# A tibble: 10 x 5
cand_no test marker score result
<int> <chr> <chr> <dbl> <chr>
1 666181 A Lincoln 73 pass
2 748259 A Milan 43 fail
3 749279 B Lincoln 60 pass
4 347797 A Milan 56 pass
5 229369 B Lincoln 25 fail
6 963133 A Milan 79 pass
7 266656 B Lincoln 24 fail
8 260610 A Milan 33 fail
9 655708 A Lincoln 85 pass
10 765483 A Milan 46 pass
Now let’s say it turns out that Milan is a bit of a harsh marker so the pass mark for tests marked by him needs to be lowered by 3 marks.
Make sure the result
variable reflects this change.
Solution
The code below accomplishes the task with the use of a single mutate()
command, albeit at the cost of quite a convoluted if_else()
code block:
data %>%
dplyr::mutate(result = dplyr::if_else(
test == "A",
dplyr::if_else(
marker == "Milan",
dplyr::if_else(score < 42, "fail", "pass"),
dplyr::if_else(score < 45, "fail", "pass")
),
dplyr::if_else(
marker == "Milan",
dplyr::if_else(score < 47, "fail", "pass"),
dplyr::if_else(score < 50, "fail", "pass")
)
)) %>%
head(10)
# A tibble: 10 x 5
cand_no test marker score result
<int> <chr> <chr> <dbl> <chr>
1 666181 A Lincoln 73 pass
2 748259 A Milan 43 pass
3 749279 B Lincoln 60 pass
4 347797 A Milan 56 pass
5 229369 B Lincoln 25 fail
6 963133 A Milan 79 pass
7 266656 B Lincoln 24 fail
8 260610 A Milan 33 fail
9 655708 A Lincoln 85 pass
10 765483 A Milan 46 pass
However, it’s probably better to realise that lowering the pass mark is the same as raising the score. By computing an intermediate adj_score column inside mutate(), we can get the same result with neater, more legible code:
data <- data %>%
dplyr::mutate(
adj_score = dplyr::if_else(
# since score is integer tidyverse requires us to specify that the number we're adding is also an integer (hence the 3L)
marker == "Milan", score + 3L, score
),
result = dplyr::if_else(
test == "A",
dplyr::if_else(adj_score < 45, "fail", "pass"),
dplyr::if_else(adj_score < 50, "fail", "pass")
))
head(data, 10)
# A tibble: 10 x 6
cand_no test marker score adj_score result
<int> <chr> <chr> <dbl> <dbl> <chr>
1 666181 A Lincoln 73 73 pass
2 748259 A Milan 43 46 pass
3 749279 B Lincoln 60 60 pass
4 347797 A Milan 56 59 pass
5 229369 B Lincoln 25 25 fail
6 963133 A Milan 79 82 pass
7 266656 B Lincoln 24 24 fail
8 260610 A Milan 33 36 fail
9 655708 A Lincoln 85 85 pass
10 765483 A Milan 46 49 pass
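The claim that lowering the pass mark by 3 is equivalent to raising the score by 3 can be checked with a few values around the boundary. This is a base R sketch with hypothetical scores, not taken from the data set:

```r
# hypothetical scores around the test A pass mark
score <- c(41, 42, 44, 45)

# failing against a pass mark lowered by 3 ...
fail_lowered_mark <- score < 45 - 3
# ... is the same as failing with a score raised by 3
fail_raised_score <- score + 3 < 45

identical(fail_lowered_mark, fail_raised_score)
```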
Let’s perform a sanity check to make sure our code got everything right.
Use subsetting to make sure all scores on test A marked by Milan of 42 or above are passes and all scores below 42 are fails.
Solution
# passes
milan_A_pass <- data %>%
dplyr::filter(test == "A" & marker == "Milan" & score >= 42) %>%
dplyr::pull(result)
all(milan_A_pass == "pass")
[1] TRUE
# fails
# sometimes, base R can be more flexible
all(data$result[data$test == "A" & data$marker == "Milan" & data$score < 42] == "fail")
[1] TRUE
Check the code performed correctly for all other combinations (Milan_B, Lincoln_A, Lincoln_B).
Solution
Using mock data:
mock_data <- tibble(
test = rep(c("A", "B"), each = 4),
marker = rep(rep(c("Milan", "Lincoln"), each = 2), 2),
score = c(41, 42, 44, 45, 46, 47, 49, 50),
exp_result = rep(c("fail", "pass"), 4) # what result we're expecting based on scores we created
)
# run the code we wrote using mock_data
mock_data <- mock_data %>%
dplyr::mutate(
adj_score = dplyr::if_else(
marker == "Milan", score + 3L, score
),
result = dplyr::if_else(
test == "A",
dplyr::if_else(adj_score < 45, "fail", "pass"),
dplyr::if_else(adj_score < 50, "fail", "pass")
)
)
# compare expected and observed results
all(mock_data$result == mock_data$exp_result)
[1] TRUE
However, there’s nothing wrong with testing each outcome. It just takes more typing (but less thinking):
# use with() to avoid having to type data$ again and again
milan_B_pass <- all(with(data, result[test == "B" & marker == "Milan" & score >= 47]) == "pass")
milan_B_fail <- all(with(data, result[test == "B" & marker == "Milan" & score < 47]) == "fail")
# Lincoln
lincoln_A_pass <- all(with(data, result[test == "A" & marker == "Lincoln" & score >= 45]) == "pass")
lincoln_A_fail <- all(with(data, result[test == "A" & marker == "Lincoln" & score < 45]) == "fail")
lincoln_B_pass <- all(with(data, result[test == "B" & marker == "Lincoln" & score >= 50]) == "pass")
lincoln_B_fail <- all(with(data, result[test == "B" & marker == "Lincoln" & score < 50]) == "fail")
# see if all tests passed
all(milan_B_pass,
milan_B_fail,
lincoln_A_pass,
lincoln_A_fail,
lincoln_B_pass,
lincoln_B_fail)
[1] TRUE
In completing the tasks in this worksheet, you have:
if(){} else{}
and ifelse()
Well done!