To practice what we just covered, here are a few tasks. We’ll start easy and work our way towards more complex problems. For most tasks, a hint and the solution are available. However, try not to reach for the solution until you are well and truly stuck.

There are probably more tasks here than a smart novice can be expected to complete in 30 minutes, so if you don’t manage to get them all done before the next session starts, don’t feel discouraged. Just save the rest for homework :).

## Basic operations

Open a new R Script file.

Typing code in the console, calculate $$\sqrt{\frac{17\times 3}{5.3}} + 100$$

Hint Some of the basic arithmetic operations in R are

• multiplication: 2*3
• division: 10/5 (use the forward slash /, never the backslash \)
• square root: sqrt(2)
• exponentiation: 2^10 (that’s $$2^{10}$$)

Solution

# brackets around 17 * 3 are optional
sqrt((17 * 3) / 5.3) + 100
[1] 103.102

Now type the same command in the script and run it from there.

Hint Remember that we run commands in the console using ↵ Enter but when running commands from the script you need to press Ctrl + ↵ Enter (Windows) or ⌘ Command + ↵ Enter (macOS).

Write and execute a command that calculates the square root of the numbers 9, 100, and 1024, all in one go.

Hint You need to combine the numbers into a single vector and take the square root of that vector. Recall that elements can be combined into a vector using the c() function.

Solution

sqrt(c(9, 100, 1024))
[1]  3 10 32

In one command add the numbers 10, 20, and 30 to the numbers 1, 2, and 3, respectively, to get 11, 22, and 33.

Hint Here, you are adding two vectors together.

Solution

1:3 + c(10, 20, 30)
[1] 11 22 33
# alternatively
# c(1, 2, 3) + c(10, 20, 30)
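
To see that the same element-by-element logic applies to other arithmetic operators too, here is a minimal sketch. The objects `prices` and `quantities` are made up for illustration and are not part of the tasks.

```r
# hypothetical example vectors, not part of the tasks
prices <- c(2, 4, 6)
quantities <- c(10, 20, 30)

# operators pair up elements by position:
# first with first, second with second, and so on
totals <- prices * quantities   # 20 80 180
ratios <- quantities / prices   # 5 5 5

totals
ratios
```

Both vectors have three elements here, so every element has exactly one partner.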

## Assignment

Store the results of each of the three commands you just wrote into objects calc_1, calc_2, and calc_3. If done right, the objects should appear in your Global Environment pane.

Hint You need to assign each command, one at a time, to the corresponding object using the assignment operator <-.

Solution

calc_1 <- sqrt((17 * 3) / 5.3) + 100
calc_2 <- sqrt(c(9, 100, 1024))
calc_3 <- 1:3 + c(10, 20, 30)

Ask R to print the content of each of these objects.

Hint To print the contents of an object, type its name into the console and press ↵ Enter.

Write code that takes the square of each element of calc_2 but in a way that DOES NOT overwrite calc_2. Make sure it worked by running the command.

Hint R never modifies objects unless you reassign them.

Solution

calc_2^2
[1]    9  100 1024

Now modify the line of code so that it DOES overwrite calc_2, storing in it the squares of the original values. Once again, double-check that it worked by printing out the contents of the object in the console.

Hint This is where you need to reassign the output of the command to calc_2.

Solution

calc_2 <- calc_2^2
calc_2
[1]    9  100 1024
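
The difference between computing a result and storing it can be seen side by side in the sketch below. It uses a throwaway object `x` (our own name, not one of the task objects) so that your calc_2 stays untouched.

```r
# hypothetical object used only for this illustration
x <- c(1, 2, 3)

# this computes and prints the result but does not change x
x * 10

# keep a copy so we can confirm x was still the original at this point
x_before <- x

# only reassignment with <- changes what is stored under the name x
x <- x * 10

x_before   # 1 2 3
x          # 10 20 30
```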

Let’s say we want to calculate the Body Mass Index (BMI) of these five people:

• Amrita, 1.92 m, 87 kgs
• Bilal, 1.82 m, 91 kgs
• Jia, 1.68 m, 52 kgs
• Josiah, 1.74 m, 64 kgs
• Marios, 1.78 m, 83 kgs

BMI is calculated as $$\frac{\text{weight in kgs}}{\text{(height in m)}^2}$$. Now, we could calculate each individual BMI but that’s cumbersome and gets progressively more so with increasing numbers. Instead, we can use vectorised operations.

Create an object height_m that stores the heights of our five people.

Hint You need c() to combine elements in a vector and <- to assign the output to an object. Individual elements must be separated by commas.

Solution

height_m <- c(1.92, 1.82, 1.68, 1.74, 1.78)

Next, create an object weight_kg that stores the weights. Make sure you enter the weights in the same order you entered the heights. No hints this time!

Solution

weight_kg <- c(87, 91, 52, 64, 83)

Finally, apply the BMI formula to our two objects and store the results in an object called bmi. Then have R print it out to see the results.

Solution

bmi <- weight_kg / height_m^2
bmi
[1] 23.60026 27.47253 18.42404 21.13886 26.19619

This way, you can just keep adding heights and weights to the respective vectors and then re-run the calculation.

Add a couple of heights and weights of your choice to height_m and weight_kg respectively and recalculate bmi.

Hint Adding an element to an object is the same as combining the object and the value into a single vector and reassigning it back to the object.

Solution

height_m <- c(height_m, 1.79, 1.52)
weight_kg <- c(weight_kg, 79, 50)
bmi <- weight_kg / height_m^2
bmi
[1] 23.60026 27.47253 18.42404 21.13886 26.19619 24.65591 21.64127

## Basic tests

Finally, let’s practice some ways of asking things about our data. This is a crucial skill for sanity checking your data and data processing and will come in especially handy in the early stage when you’re still not very confident in what you’re doing.

While your script should only include commands that impact data processing/visualisation/analysis, we recommend you complete the following tasks, especially those that ask you to create new objects, in your script file.

Without printing calc_1 ask R how many elements there are inside of it.

Hint In other words, what is the length() of calc_1?

Solution

length(calc_1)
[1] 1

Let’s say we want to run some checks on our BMI data. To be able to calculate meaningful BMIs, the two objects, height_m and weight_kg, must meet several conditions:

• They must contain the same number of elements.
• They must only contain numbers.
• The values must be within reasonable ranges.

None of these are difficult to check by simply eyeballing the data when there are only a handful of values in each object, but as datasets get bigger, the ability to offload these kinds of checks onto the computer becomes invaluable.

Ask R whether or not the respective lengths of height_m and weight_kg are equal. Save the output of the command in a new object called length_test.

Hint To test for equality of x and y, use the == operator (NOT =!), as in x == y

Solution

length_test <- length(height_m) == length(weight_kg)
length_test
[1] TRUE
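
Since our own data pass this check, it may help to see what a failure looks like. The sketch below uses two made-up vectors of different lengths (`short_heights` and `long_weights` are hypothetical names) so the test comes back FALSE.

```r
# hypothetical vectors where one weight has no matching height
short_heights <- c(1.70, 1.80)
long_weights <- c(60, 70, 80)

# == compares the two lengths and returns a single TRUE or FALSE
mismatch_test <- length(short_heights) == length(long_weights)
mismatch_test   # FALSE, because 2 is not equal to 3
```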

OK, let’s now check that the two objects only contain numbers.

Use the is.numeric() function to test whether or not an object is of class numeric. Let’s test both height_m and weight_kg.

Solution

is.numeric(height_m)
[1] TRUE
is.numeric(weight_kg)
[1] TRUE

Use the logical operator & to link the two expressions to test both with just one command. This time, store the output in numeric_test.

Solution

numeric_test <- is.numeric(height_m) & is.numeric(weight_kg)
numeric_test
[1] TRUE

# you can also just bind the two objects together and test the resulting vector
is.numeric(c(height_m, weight_kg))
[1] TRUE
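
A common way for this check to fail is a value accidentally typed in quotes. The following sketch, using a made-up vector `typo_heights`, shows that a single quoted value turns the whole vector into text, because all elements of a vector must be of the same type.

```r
# hypothetical data-entry slip: the second value is in quotes
typo_heights <- c(1.92, "1.82", 1.68)

# c() converts every element to text so that all elements share one type
class(typo_heights)        # "character"

# the numeric check therefore fails for the vector as a whole
is.numeric(typo_heights)   # FALSE
```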

All good! Now, let’s see if the values are reasonable. Here, it’s up to you as the analyst to define what you deem reasonable. The computer can only tell you if your data meet your criteria, not what the criteria should be.

Let’s say one criterion is that the values of height_m must be smaller than their corresponding values of weight_kg. Admittedly, it’s not a very good criterion in this context but it might be in different contexts and it also makes for a good exercise so bear with us. Can you figure out how to ask R if this is true?

Hint To test for inequality, we have:

• x < y, is x less than y?
• x <= y, is x less than or equal to y?
• x > y, is x greater than y?
• x >= y, is x greater than or equal to y?
• x != y, are x and y NOT equal?

Solution

height_m < weight_kg
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
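
With our data every comparison happens to be TRUE, so here is a small sketch with mixed results. The vectors `a` and `b` are made up for illustration; each operator compares the elements pair by pair.

```r
# hypothetical vectors chosen so that the answers differ by position
a <- c(1, 5, 10)
b <- c(2, 5, 3)

less <- a < b         # TRUE FALSE FALSE
less_equal <- a <= b  # TRUE TRUE FALSE
greater_equal <- a >= b  # FALSE TRUE TRUE
not_equal <- a != b   # TRUE FALSE TRUE

less
less_equal
greater_equal
not_equal
```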

The result of the previous task is a separate test for every element pair. The all() function takes a logical vector and outputs TRUE if all its elements are TRUE and FALSE otherwise. Use it to see if the condition we’re investigating is met for all value pairs and save the output in comparison_test.

Hint Simply put the command above inside the all() function.

Solution

comparison_test <- all(height_m < weight_kg)
comparison_test
[1] TRUE
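
To illustrate how all() reacts to a single failed test, here is a minimal sketch with a made-up logical vector `checks`. It also shows any(), a related base R function not covered above, which asks whether at least one element is TRUE.

```r
# hypothetical test results: the third test failed
checks <- c(TRUE, TRUE, FALSE)

all_passed <- all(checks)   # FALSE, one FALSE is enough to fail
some_passed <- any(checks)  # TRUE, at least one element is TRUE

all_passed
some_passed

# with no FALSE in the input, all() returns TRUE
all(c(TRUE, TRUE))
```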

All looks kosher thus far.

Next let’s explore if the values have reasonable ranges. There are several ways of doing this, each with its pros and cons so let’s have a look at a few.

First of all, we can simply look at the minimum and maximum values of an object. The range() function returns this information. Let’s have a look at both height_m and weight_kg.

Solution

range(height_m)
[1] 1.52 1.92
range(weight_kg)
[1] 50 91

This is very useful information but it’s not the best way of sanity-checking our data as it still requires some eyeballing.

Let’s say we think that all values of height_m should be between 1.2 and 2.3. Can you come up with a one-liner that tests for this criterion? If so, save the output of the command in height_range_test.

Hint Values of height_m should be larger than 1.2 AND values of height_m should be smaller than 2.3. This should be true for all elements.

Solution

height_range_test <- all(height_m > 1.2 & height_m < 2.3)
height_range_test
[1] TRUE
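
To see this kind of test catch a problem, consider the sketch below. It assumes a made-up vector `heights_with_typo` in which one height was entered in centimetres instead of metres.

```r
# hypothetical data with one height entered in cm rather than m
heights_with_typo <- c(1.92, 1.82, 178)

# & combines the two comparisons element by element
in_range <- heights_with_typo > 1.2 & heights_with_typo < 2.3
in_range        # TRUE TRUE FALSE

# all() collapses the element-wise results into a single answer
all(in_range)   # FALSE
```

Printing the element-wise result before wrapping it in all() shows which position caused the failure.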

Alternatively, we can ask if the minimum (min()) of weight_kg is greater than 40 and at the same time its maximum (can you guess the function?) is less than 250. Try this without hints and save the output of the command in weight_range_test.

Solution

weight_range_test <- min(weight_kg) > 40 & max(weight_kg) < 250
weight_range_test
[1] TRUE

Finally, let’s see if all of our tests returned true.

Hint Remember that the all() function returns either TRUE or FALSE.

Solution

all(length_test, numeric_test, comparison_test, height_range_test, weight_range_test)
[1] TRUE

Great! The values passed our five checks so we can have some confidence that our BMI calculation is meaningful.

## Reflect

First of all, well done! You managed to do quite a lot here and got to practise basic operations, assigning names to objects, and performing basic data-checking tests. But on top of all this, you also found out important things about R.

• Some operations are vectorised, meaning that they can be performed with vectors rather than just with a single value. This means that you can transform variables or calculate new ones and test your data efficiently.
• If you want to treat several values/elements as a whole you must combine them in a vector if they are not already in one.
• A function in R never modifies its inputs. If you want to modify an object you need to reassign using <-.
• There is no “undo” button but you can always start from the beginning and re-run your code.
• Storing objects under names/variables in your environment allows you to conveniently access them.
• There are powerful tools you can use to sanity check your datasets and data-processing. R cannot tell you what tests to design but, once you know what you want to test for, it will happily do it for you.