To practice what we just covered, here are a few tasks. We’ll start easy and work our way towards more complex problems. In most cases, a hint and a solution are available for each task. However, try not to reach for the solution until you are well and truly stuck.
There are probably more tasks here than even a smart novice could be expected to complete in 30 minutes, so if you don’t manage to get them all done before the next session starts, don’t feel discouraged. Just save the rest for homework :).
Open a new R Script file.
Typing code in the console, calculate \(\sqrt{\frac{17\times 3}{5.3}} + 100\)
Hint
Some of the basic arithmetic operations in R are:
2*3
10/5 (division uses /, never \!)
sqrt(2)
2^10 (that’s \(2^{10}\))
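A quick illustration of these operators, using arbitrary numbers rather than the exercise values. The comments show the values R prints.

```r
# Basic arithmetic operators in R (arbitrary example values)
2 * 3        # multiplication: 6
10 / 5       # division with a forward slash: 2
sqrt(16)     # square root: 4
2^10         # exponentiation: 1024

# Parentheses control the order of operations
1 + 2 * 3    # 7, because * is evaluated before +
(1 + 2) * 3  # 9
```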
Now type the same command in the script and run it from there.
Write and execute a command that calculates the square roots of the numbers 9, 100, and 1024, all in one go.
Hint
Use the c() function.
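Many R functions, sqrt() included, are vectorised: given a vector, they work on every element at once. A sketch with different numbers from the task:

```r
# c() combines the values into one vector; sqrt() is then applied to each element
sqrt(c(4, 25, 49))   # 2 5 7
```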
Store the results of each of the three commands you just wrote into objects calc_1, calc_2, and calc_3. If done right, the objects should appear in your Global Environment pane.
Hint
Use the assignment operator <-.
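The assignment operator <- stores a result under a name, and typing that name afterwards prints the stored value. The object name my_result is purely illustrative.

```r
# Store the result of a calculation in an object (name chosen for illustration)
my_result <- 5 * 2

# Typing the object's name prints its contents
my_result   # 10
```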
Ask R to print the contents of each of these objects.
Write code that takes the square of each element of calc_2 but in a way that DOES NOT overwrite calc_2. Make sure it worked by running the command.
R never modifies objects unless you reassign them.
Now modify the line of code so that it DOES overwrite calc_2, storing in it the squares of the original values. Once again, double-check that it worked by printing out the contents of the object in the console.
Hint
Assign the result back to calc_2.
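The difference between the last two tasks can be sketched with a throwaway vector x (a hypothetical name, not one of the exercise objects): arithmetic on x prints a new result but leaves x unchanged until you assign the result back to it.

```r
# A throwaway example vector
x <- c(1, 2, 3)

# Computes and prints new values, but x itself is untouched
x * 10   # 10 20 30
x        # still 1 2 3

# Assigning back to the same name overwrites the old contents
x <- x * 10
x        # now 10 20 30
```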
Let’s say we want to calculate the Body Mass Index (BMI) of these five people:
BMI is calculated as \(\frac{\text{weight in kg}}{(\text{height in m})^2}\). Now, we could calculate each individual BMI, but that’s cumbersome and gets progressively more so as the number of people grows. Instead, we can use vectorised operations.
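Vectorised arithmetic pairs up elements by position: the first element of one vector with the first of the other, and so on. A sketch with made-up distances and travel times (deliberately not the BMI data):

```r
# Made-up example data: distances in km and times in hours for three trips
distance_km <- c(10, 42, 5)
time_h <- c(1, 4, 0.5)

# Division is applied element by element, giving one speed per trip
speed_kmh <- distance_km / time_h
speed_kmh   # 10.0 10.5 10.0
```

Note that ^ is evaluated before /, so an expression of the form a / b^2 divides by the squared values without needing extra parentheses.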
Create an object height_m that stores the heights of our five people.
Hint
Use c() to combine elements in a vector and <- to assign the output to an object. Individual elements must be separated by commas.
Next, create an object weight_kg that stores the weights. Make sure you enter the weights in the same order as the heights. No hints this time!
Finally, apply the BMI formula to our two objects and store the results in an object called bmi. Then have R print it out to see the results.
This way, you can just keep adding heights and weights to the respective vectors and then re-run the calculation.
Add a couple of heights and weights of your choice to height_m and weight_kg respectively, and recalculate bmi.
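c() can also extend an existing vector: pass it the old vector followed by the new values, then assign the result back to the same name. A sketch with a hypothetical scores vector:

```r
# Hypothetical vector of values
scores <- c(12, 15)

# Combine the existing vector with new values and overwrite it
scores <- c(scores, 18, 20)
scores   # 12 15 18 20
```

Any calculation that uses the vector has to be re-run afterwards to pick up the new elements.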
Finally, let’s practice some ways of asking things about our data. This is a crucial skill for sanity-checking your data and data processing, and it will come in especially handy in the early stages when you’re still not very confident in what you’re doing.
Without printing calc_1, ask R how many elements there are inside of it.
Hint
What is the length() of calc_1?
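length() counts the elements of a vector without printing them. A one-line sketch on an arbitrary vector:

```r
# Number of elements in an arbitrary example vector
length(c(5, 10, 15, 20))   # 4
```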
Let’s say we want to run some checks on our BMI data. To be able to calculate meaningful BMIs, the two objects, height_m and weight_kg, must meet several conditions:
None of these are difficult to check by simply eyeballing the data when each object holds only a handful of values, but as datasets get bigger, the ability to offload these kinds of checks onto the computer becomes invaluable.
Ask R whether or not the respective lengths of height_m and weight_kg are equal. Save the output of the command in a new object called length_test.
Hint
To test whether two objects x and y are equal, use the == operator (NOT =!): x == y
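== compares element by element and returns logical values. Comparing two single numbers, such as two lengths, gives a single TRUE or FALSE. A sketch on arbitrary vectors:

```r
# Element-wise comparison: one TRUE/FALSE per position
c(1, 2, 3) == c(1, 5, 3)   # TRUE FALSE TRUE

# Comparing two lengths yields a single logical value
length(c(1, 2)) == length(c(7, 8, 9))   # FALSE
```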
OK, let’s now check that the two objects only contain numbers.
Use the is.numeric() function to test whether or not an object is numeric. Let’s test both height_m and weight_kg.
Use the logical operator & to link the two expressions and test both with just one command. This time, store the output in numeric_test.
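A sketch of is.numeric() on a numeric and a character vector (arbitrary examples), and of & joining two tests: the combined result is TRUE only when both sides are TRUE.

```r
is.numeric(c(1.5, 2, 3))   # TRUE
is.numeric(c("a", "b"))    # FALSE, this is a character vector

# & is TRUE only if both expressions are TRUE
is.numeric(c(1.5, 2, 3)) & is.numeric(c("a", "b"))   # FALSE
```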
All good! Now, let’s see if the values are reasonable. Here, it’s up to you as the analyst to define what you deem reasonable. The computer can only tell you if your data meet your criteria, not what the criteria should be.
Let’s say one criterion is that each value of height_m must be smaller than the corresponding value of weight_kg. Admittedly, it’s not a very good criterion in this context, but it might be in other contexts, and it also makes for a good exercise, so bear with us. Can you figure out how to ask R whether this is true?
Hint
To test for inequality, we have:
x < y, is x less than y?
x <= y, is x less than or equal to y?
x > y, is x greater than y?
x >= y, is x greater than or equal to y?
x != y, are x and y NOT equal?
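Like ==, these operators are vectorised, returning one TRUE or FALSE per element pair. A sketch on arbitrary vectors:

```r
# Is each element on the left less than its partner on the right?
c(1, 5, 3) < c(2, 4, 6)    # TRUE FALSE TRUE

# Are the paired elements different?
c(1, 5, 3) != c(1, 4, 3)   # FALSE TRUE FALSE
```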
The result of the previous task is a separate test for every element pair. The all() function takes a logical vector and outputs TRUE if all its elements are TRUE, and FALSE otherwise. Use it to see whether the condition we’re investigating is met for all value pairs, and save the output in comparison_test.
Hint
Wrap your previous command inside the all() function.
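A sketch of all() on its own and wrapped around a vectorised comparison, using arbitrary values:

```r
all(c(TRUE, TRUE, TRUE))    # TRUE
all(c(TRUE, FALSE, TRUE))   # FALSE, one element is FALSE

# Reduces an element-wise comparison to a single answer
all(c(1, 2, 3) < c(2, 4, 6))   # TRUE
```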
All looks kosher thus far.
Next, let’s explore whether the values fall within reasonable ranges. There are several ways of doing this, each with its pros and cons, so let’s have a look at a few.
First of all, we can simply look at the minimum and maximum values of an object. The range() function returns this information. Let’s have a look at both height_m and weight_kg.
This is very useful information but it’s not the best way of sanity-checking our data as it still requires some eyeballing.
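For reference, range() returns a two-element vector holding the minimum and the maximum. A sketch on an arbitrary vector:

```r
# Minimum and maximum of an example vector
range(c(3, 9, 1, 7))   # 1 9
```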
Let’s say we think that all values of height_m should be between 1.2 and 2.3. Can you come up with a one-liner that tests for this criterion? If so, save the output of the command in height_range_test.
Hint
Values of height_m should be larger than 1.2 AND values of height_m should be smaller than 2.3. This should be true for all elements.
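The general pattern, sketched with a hypothetical vector shares whose values should lie strictly between 0 and 1: & combines the two element-wise conditions, and all() reduces them to a single answer.

```r
# Hypothetical values that should all lie strictly between 0 and 1
shares <- c(0.2, 0.5, 0.9)

# Both conditions must hold for every element
all(shares > 0 & shares < 1)   # TRUE
```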
Alternatively, we can ask whether the minimum (min()) of weight_kg is greater than 40 and, at the same time, its maximum (can you guess the function?) is less than 250. Try this without hints and save the output of the command in weight_range_test.
Finally, let’s see if all of our tests returned TRUE.
Hint
The all() function returns either TRUE or FALSE.
Great! The values passed our five checks, so we can have some confidence that our BMI calculation is meaningful.
First of all, well done! You managed to do quite a lot here and got to practise basic operations, assigning names to objects, and performing basic data-checking tests. But on top of all this, you also found out important things about R:
R never modifies its inputs. If you want to modify an object, you need to reassign it using <-.
R cannot tell you what tests to design but, once you know what you want to test for, it will happily do it for you.