Data Analysis for the Life Sciences with R: Exercise Solutions

Question 1

What is the average height of the sons (don’t round off)?

Answer

y <- father.son$sheight # son
x <- father.son$fheight # father
mean(y)

## [1] 68.68407

Answer

groups <- split(y, round(x))
mean(groups['71'] %>% unlist())

## [1] 70.54082

Answer

The answer is C: \(Y = a + b^t + \epsilon\). This is because the variable \(t\) is an exponent unlike all the other answer choices.

Answer

The answer is D: Between individual variability: people of the same height vary in their weight.

Let’s look at each answer choice. Let’s first think \(Y\) as son’s height and \(x\) as father’s height. So in this linear model, our goal is to predict son’s height based on father’s height. Choice A (Measurement error: scales are not perfect) seems not wrong. However, if the father’s height is 71 inches, can we gurantee that the son’s height is a certain number with a small measurement variability? This is not true because the son’s range of height can still be wide even if the father’s height is 71 inches. In fact, siblings can have very different heights even if their biological parents are identical. Since choice A only describes the \(\epsilon\) as measurement variability, the description seems inadequate. This explanation also applies to Choice B (Within individual random fluctuations: you don’t weigh the same in the morning as in the afternoon) and C (Round off error introduced by the computer). Therefore, choice D seems to be most appropriate.

Answer

X[25,3]

## [1] 225

Answer

first_column <- 1:10
y <- cbind(x1=first_column, x2=first_column*2, x3=first_column*3, x4=first_column*4, x5=first_column*5)
sum(y[7,])

## [1] 105

Answer

matrix(1:60,20,3) # choice A

##       [,1] [,2] [,3]
##  [1,]    1   21   41
##  [2,]    2   22   42
##  [3,]    3   23   43
##  [4,]    4   24   44
##  [5,]    5   25   45
##  [6,]    6   26   46
##  [7,]    7   27   47
##  [8,]    8   28   48
##  [9,]    9   29   49
## [10,]   10   30   50
## [11,]   11   31   51
## [12,]   12   32   52
## [13,]   13   33   53
## [14,]   14   34   54
## [15,]   15   35   55
## [16,]   16   36   56
## [17,]   17   37   57
## [18,]   18   38   58
## [19,]   19   39   59
## [20,]   20   40   60

matrix(1:60,20,3, byrow=T) # choice B

##       [,1] [,2] [,3]
##  [1,]    1    2    3
##  [2,]    4    5    6
##  [3,]    7    8    9
##  [4,]   10   11   12
##  [5,]   13   14   15
##  [6,]   16   17   18
##  [7,]   19   20   21
##  [8,]   22   23   24
##  [9,]   25   26   27
## [10,]   28   29   30
## [11,]   31   32   33
## [12,]   34   35   36
## [13,]   37   38   39
## [14,]   40   41   42
## [15,]   43   44   45
## [16,]   46   47   48
## [17,]   49   50   51
## [18,]   52   53   54
## [19,]   55   56   57
## [20,]   58   59   60

x = 11:20; rbind(x,2*x,3*x) # choice C

##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x   11   12   13   14   15   16   17   18   19    20
##     22   24   26   28   30   32   34   36   38    40
##     33   36   39   42   45   48   51   54   57    60

x = 1:40; matrix(3*x,20,2) # choice D

##       [,1] [,2]
##  [1,]    3   63
##  [2,]    6   66
##  [3,]    9   69
##  [4,]   12   72
##  [5,]   15   75
##  [6,]   18   78
##  [7,]   21   81
##  [8,]   24   84
##  [9,]   27   87
## [10,]   30   90
## [11,]   33   93
## [12,]   36   96
## [13,]   39   99
## [14,]   42  102
## [15,]   45  105
## [16,]   48  108
## [17,]   51  111
## [18,]   54  114
## [19,]   57  117
## [20,]   60  120

The answer is B.

Answer

head(t(t(X))) # A

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1  101  201  301  401  501  601  701  801   901
## [2,]    2  102  202  302  402  502  602  702  802   902
## [3,]    3  103  203  303  403  503  603  703  803   903
## [4,]    4  104  204  304  404  504  604  704  804   904
## [5,]    5  105  205  305  405  505  605  705  805   905
## [6,]    6  106  206  306  406  506  606  706  806   906

head(X %*% matrix(1,ncol(X))) # B

##      [,1]
## [1,] 4510
## [2,] 4520
## [3,] 4530
## [4,] 4540
## [5,] 4550
## [6,] 4560

head(X*1) # C

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1  101  201  301  401  501  601  701  801   901
## [2,]    2  102  202  302  402  502  602  702  802   902
## [3,]    3  103  203  303  403  503  603  703  803   903
## [4,]    4  104  204  304  404  504  604  704  804   904
## [5,]    5  105  205  305  405  505  605  705  805   905
## [6,]    6  106  206  306  406  506  606  706  806   906

head(X %*% diag(ncol(X))) # D

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1  101  201  301  401  501  601  701  801   901
## [2,]    2  102  202  302  402  502  602  702  802   902
## [3,]    3  103  203  303  403  503  603  703  803   903
## [4,]    4  104  204  304  404  504  604  704  804   904
## [5,]    5  105  205  305  405  505  605  705  805   905
## [6,]    6  106  206  306  406  506  606  706  806   906

The answer is B.

In choice A, the trasnposed matrix \(X\) gets transposed again, thereby returning to its original matrix \(X\). In choice B, the matrix \(X\) gets multiplied by 1, which is a scalar. So it will not be changed. In choice D, \(X\) gets multiplied by an identity matrix, so \(X\) does not change even after the matrix multiplication. Therefore, the answer is B.

Answer

X <- matrix(c(3,4,-5,1,2,2,2,-1,1,-1,5,-5,5,0,0,1),4,4,byrow=T)
ans <- solve(X) %*% matrix(c(10,5,7,4),4,1)
ans[3]

## [1] -0.8849558

Answer

c <- a %*% b
c[3,2]

## [1] 113

Answer

sum(a[3,] * b[,2])

## [1] 113

Answer

X <- matrix(c(1,1,1,1,0,0,1,1),nrow=4)
rownames(X) <- c('a','a','b','b')
#beta <- c(5,2)
beta <- matrix(c(5,2),nrow =2, ncol=1) # matrix form of the vector beta
X[1:2,] %*% beta

##   [,1]
## a    5
## a    5

When I perform matrix multiplication in R %*%, I usually make sure that all my vectors are converted into matrix. In this case, I rewrote the beta variable in the function of matrix. However, this is not necessary. The vector form of beta beta <- c(5,2) works well too.

Answer

X[3:4,] %*% beta

##   [,1]
## b    7
## b    7

Answer

beta <- matrix(c(10,3,-3),nrow = 3)
X[3:4,] %*% beta

##   [,1]
## b   13
## b   13

Answer

X[5:6,] %*% beta

##   [,1]
## c    7
## c    7

Question 2

One of the defining features of regression is that we stratify one variable based on others. In statistics, we use the verb ‘condition.’ For exmaple,, the linear model for son and father heights answers the question: how tall do I expect a son to be if I condition on his father being \(x\) inches? The regression line answers this question for any \(x\).

Using the father.son dataset described above, we want to know the expected height of sons, if we condition on the father being 71 inches. Create a list of son heights for sons that have fathers with heights of 71 inches, rounding to the nearest inch.

What is the mean of the son heights for fathers that have a height of 71 inches (don’t round off your answer)? Hint: use the function round on the fathers’ heights.

Question 3

Which of the following cannot be written as a linear model?

Question 4

Suppose you model the relationship between weight and height across individuals with a linear model. You assume that the height of individuals for a fixed weight \(x\) follows a linear model \(Y = a + bx + \epsilon\). Which of the following do you feel best describes what \(\epsilon\) represents?

Question 5

In R we have vectors and matrices. You can create your own vectors with the function c.

c(1,5,3,4)

## [1] 1 5 3 4

They are also the output of many functions such as:

rnorm(10)

##  [1] -0.8043316 -1.0565257 -1.0353958 -1.1855604 -0.5004395
##  [6] -0.5249887 -0.3024330  0.4719681 -0.2483839  1.2593180

You can turn vectors into matrcies using functions such as rbind, cbind, or matrix. Create the matrix from the vector 1:1000 like this:

X = matrix(1:1000,100,10)

What is the entry in row 25, column 3?

Question 6

Using the function cbind, create a 10 x 5 matrix with first column x=1:10. Then add 2*x, 3*x, 4*x and 5*x to columns 2 through 5. What is the sum of the elements of the 7th row?

Question 7

Which of the following creates a matrix with multiples of 3 in the third column?

Question 8

Suppose \(X\) is a matrix in R. Which of the following is not equivalent to \(X\)?

Question 9

Solve the following system of equations using R: \[ \begin{align*} 3a + 4b - 5c + d=10\\ 2a + 2b + 2c - d = 5\\ a - b + 5c - 5d = 7 \\ 5a + 5d = 4 \end{align*} \] What is the solution for \(c\)?

Question 10

Load the following two matrices into R:

a <- matrix(1:12, nrow=4)
b <- matrix(1:15, nrow=3)

What is the value in the 3rd row and the 2nd column of the matrix product of a and b?

Question 11

Multiply the 3rd row of a with the 2nd column of b, using the element-wise vector multiplication with *. What is the sum of the elements in the resulting vector?

Question 12

Suppose we are analyzing a set of 4 samples. The first two samples are from a treatment group A and the second two samples are from a treatment group B. This design can be represented with a model matrix like so:

X <- matrix(c(1,1,1,1,0,0,1,1),nrow=4)
rownames(X) <- c('a','a','b','b')
X

##   [,1] [,2]
## a    1    0
## a    1    0
## b    1    1
## b    1    1

Suppose that the fitted parameters for a linear model give us:

beta <- c(5,2)

Use the matrix multiplication operator, %*%, in R to answer the following questions: What is the fitted value for the A samples? (The fitted Y values.)

Question 13

What is the fitted value for the B samples? (The fitted Y values.)

Question 14

Suppose now we are comparing two treatments B and C to a control group A, each with two samples. This design can be represented with a model matrix like so:

X <- matrix(c(1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,0,1,1),nrow=6)
rownames(X) <- c("a","a","b","b","c","c")
X

##   [,1] [,2] [,3]
## a    1    0    0
## a    1    0    0
## b    1    1    0
## b    1    1    0
## c    1    0    1
## c    1    0    1

Suppose that the fitted values for the linear model are given by:

beta <- c(10,3,-3)

What is the fitted values for the B sample?

Question 15

What is the fitted values for the C sample?

Data Analysis for the Life Sciences with R: Exercise Solutions

Chapter 4 Matrix Algebra

4.2 Exercises

Question 1

Question 2

Question 3

Question 4

4.6 Exercises

Question 1

Question 2

Question 3

4.8 Exercises

Question 1

Question 2

Question 3

Question 4

4.10 Exercises

Question 1

Question 2

Question 3

Question 4