Last updated: 2020-02-01

Knit directory: Thesis_single_RNA/

This section continues the exploration of the variability of bootstrap MLE estimates. This time we look at how the bootstrap variability of the MLE changes as a function of sample size.

For each of the four \((k_{on}, k_{off}, k_r)\) scenarios below, we generate datasets of 100, 200, 500 and 1000 cells. We bootstrap each dataset 500 times and estimate the parameters by MLE on every bootstrap sample. We then look at boxplots of the MLE estimates and at their standard deviations by sample size.
All boxplots below show log-transformed estimates. The standard deviations are computed on the original scale.
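The resampling scheme can be illustrated with a minimal, self-contained sketch. It assumes that counts follow the beta-Poisson (telegraph model) distribution, with \(p \sim \mathrm{Beta}(k_{on}, k_{off})\) and \(x \mid p \sim \mathrm{Poisson}(k_r p)\). To keep it fast and independent of the MLE code from the previous vignette, the sample mean of the counts, which estimates \(k_r k_{on}/(k_{on}+k_{off})\), stands in for the MLE. The function names are hypothetical.

```r
# Hypothetical sketch: simulate one dataset per sample size, bootstrap it, and
# record the bootstrap standard deviation of an estimator.
# ASSUMPTION: counts follow p ~ Beta(k_on, k_off), x | p ~ Poisson(k_r * p).
# ASSUMPTION: the sample mean stands in for the MLE of the previous vignette.
simulate_counts <- function(n_cells, k_on, k_off, k_r) {
  p <- rbeta(n_cells, k_on, k_off)
  rpois(n_cells, k_r * p)
}

bootstrap_sd <- function(x, boot_number, estimator = mean) {
  # Resample cells with replacement and re-estimate on every bootstrap sample
  est <- replicate(boot_number, estimator(sample(x, length(x), replace = TRUE)))
  sd(est)
}

set.seed(1)
n_cells <- c(100, 200, 500, 1000)
boot_sds <- sapply(n_cells, function(n) {
  x <- simulate_counts(n, k_on = 0.1, k_off = 0.1, k_r = 100)
  bootstrap_sd(x, boot_number = 500)
})
names(boot_sds) <- n_cells
print(round(boot_sds, 3))
```

If `mean` is replaced by a routine that returns all three estimates, `replicate()` returns a matrix with one row per parameter. The standard deviation would then have to be taken row by row, for example with `apply(est, 1, sd)`.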

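  # Generate one dataset of n_cells cells, bootstrap it boot_number times and store the MLE
  # of (k_on, k_off, k_r) for the i-th resample in row i of est. gen_data() and MLE_fun() stand
  # for the simulator and MLE routine of the previous vignette (hypothetical names).
  bootstrap_MLE <- function(k_on, k_off, kr, n_cells, boot_number) {
    x <- as.matrix(gen_data(k_on, k_off, kr, n_cells))
    est <- matrix(NA, nrow = boot_number, ncol = 3)
    for (i in 1:boot_number) {
      boot_x <- x[sample(nrow(x), replace = TRUE), , drop = FALSE]
      est[i, ] <- MLE_fun(boot_x)
    }
    est
  }
  # Run bootstrap_MLE for each sample size in the vector n_cells; returns a list of est matrices
  bootstrap_by_size <- function(k_on, k_off, kr, n_cells, boot_number)
    lapply(n_cells, function(n) bootstrap_MLE(k_on, k_off, kr, n, boot_number))
  MLE <- bootstrap_by_size(0.1, 0.1, 100, c(100, 200, 500, 1000), 500)  # case I below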
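The `paste()` calls repeated for each case below can be condensed into a single table. This is a sketch that assumes `MLE` is a list of boot_number-by-3 matrices with columns in the order \(k_{on}\), \(k_{off}\), \(k_r\). The helper name `sd_table` is hypothetical, and the demo matrices are random placeholders, not results.

```r
# Hypothetical helper: one row per sample size, one column per parameter.
# ASSUMPTION: each element of MLE is a boot_number x 3 matrix (k_on, k_off, k_r).
sd_table <- function(MLE, n_cells, digits = 3) {
  tab <- t(sapply(MLE, function(est) apply(est, 2, sd, na.rm = TRUE)))
  dimnames(tab) <- list(paste0("n=", n_cells), c("k_on", "k_off", "k_r"))
  round(tab, digits)
}

# Placeholder input: random matrices standing in for real bootstrap output
set.seed(2)
n_cells <- c(100, 200, 500, 1000)
fake_MLE <- lapply(n_cells, function(n) matrix(rnorm(500 * 3, sd = 1 / sqrt(n)), ncol = 3))
sds <- sd_table(fake_MLE, n_cells)
print(sds)
```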

I) Small \(k_{on}\) and \(k_{off}\) case:

In the case of small \(k_{on}\) and \(k_{off}\), the parameters are identifiable, as we saw before. As a result, the standard deviation is already small for the 100-cell dataset and decreases as the sample size grows. This holds for \(k_{on}\), \(k_{off}\) and \(k_{r}\) alike.

# True values: k_on = k_off = 0.1, k_r = 100
x1 <- MLE[[1]]
x2 <- MLE[[2]]
x3 <- MLE[[3]]
x4 <- MLE[[4]]
boxplot(log(x1[,1]),log(x2[,1]),log(x3[,1]),log(x4[,1]),xlab="Number of cells",names=c("100", "200","500", "1000"),main="k_on MLE bootstrap, true k_on=0.1")
abline(h = log(0.1),col="red")

paste("k_on standard deviations:", "n=100:", round(sd(x1[,1], na.rm=TRUE), 3), "; n=200:", round(sd(x2[,1], na.rm=TRUE), 3), "; n=500:", round(sd(x3[,1], na.rm=TRUE), 3), "; n=1000:", round(sd(x4[,1], na.rm=TRUE), 3))
[1] "k_on standard deviations: n=100: 0.027 ; n=200: 0.016 ; n=500: 0.009 ; n=1000: 0.006"
boxplot(log(x1[,2]),log(x2[,2]),log(x3[,2]),log(x4[,2]),xlab="Number of cells",names=c("100", "200","500", "1000"),main="k_off MLE bootstrap, true k_off=0.1")
abline(h = log(0.1),col="red")

paste("k_off standard deviations:", "n=100:", round(sd(x1[,2], na.rm=TRUE), 3), "; n=200:", round(sd(x2[,2], na.rm=TRUE), 3), "; n=500:", round(sd(x3[,2], na.rm=TRUE), 3), "; n=1000:", round(sd(x4[,2], na.rm=TRUE), 3))
[1] "k_off standard deviations: n=100: 0.031 ; n=200: 0.02 ; n=500: 0.01 ; n=1000: 0.008"
boxplot(log(x1[,3]),log(x2[,3]),log(x3[,3]),log(x4[,3]),xlab="Number of cells",names=c("100", "200","500", "1000"),main="k_r MLE bootstrap, true k_r=100")
abline(h = log(100),col="red")

paste("k_r standard deviations:", "n=100:", round(sd(x1[,3], na.rm=TRUE), 3), "; n=200:", round(sd(x2[,3], na.rm=TRUE), 3), "; n=500:", round(sd(x3[,3], na.rm=TRUE), 3), "; n=1000:", round(sd(x4[,3], na.rm=TRUE), 3))
[1] "k_r standard deviations: n=100: 2.102 ; n=200: 1.402 ; n=500: 0.77 ; n=1000: 0.576"
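If the bootstrap standard deviation scales as \(1/\sqrt{n}\), as expected asymptotically for a regular MLE, then going from 100 to 1000 cells should shrink it by a factor of about \(\sqrt{10} \approx 3.16\). The check below uses the rounded case I values printed above, so the \(k_{on}\) and \(k_{off}\) ratios are only approximate.

```r
# Standard deviations printed above for case I (true k_on = k_off = 0.1, k_r = 100)
sd_100  <- c(k_on = 0.027, k_off = 0.031, k_r = 2.102)
sd_1000 <- c(k_on = 0.006, k_off = 0.008, k_r = 0.576)

observed_ratio <- sd_100 / sd_1000
expected_ratio <- sqrt(1000 / 100)  # reduction factor under 1/sqrt(n) scaling
print(round(observed_ratio, 2))
print(round(expected_ratio, 2))
```

All three ratios fall between about 3.6 and 4.5. This is the same order as \(\sqrt{10}\) and slightly above it.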

II) Large \(k_{on}\) and \(k_{off}\) case:

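In the large \(k_{on}\) and \(k_{off}\) case, the standard deviations are large and do not decrease consistently with sample size. Instead, they fluctuate from one sample size to the next, and for \(k_{off}\) the value at 1000 cells is even larger than at 100 cells.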

[1] "k_on standard deviations: n=100: 24.888 ; n=200: 20.778 ; n=500: 26.62 ; n=1000: 23.83"

[1] "k_off standard deviations: n=100: 109.668 ; n=200: 145.762 ; n=500: 89.291 ; n=1000: 126.325"

[1] "k_r standard deviations: n=100: 83.034 ; n=200: 112.558 ; n=500: 50.253 ; n=1000: 74.187"
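You can check directly on the printed case II values whether the standard deviation actually shrinks with sample size. A decreasing sequence has a strictly negative `diff()` at every step.

```r
# Bootstrap standard deviations printed above for case II (large k_on and k_off)
n_cells <- c(100, 200, 500, 1000)
case2 <- rbind(
  k_on  = c(24.888, 20.778, 26.62, 23.83),
  k_off = c(109.668, 145.762, 89.291, 126.325),
  k_r   = c(83.034, 112.558, 50.253, 74.187)
)
colnames(case2) <- paste0("n=", n_cells)

# TRUE only if the standard deviation shrinks at every step in sample size
monotone_decreasing <- apply(case2, 1, function(s) all(diff(s) < 0))
print(monotone_decreasing)

# Ratio of the n=1000 standard deviation to the n=100 one
ratio_1000_100 <- case2[, "n=1000"] / case2[, "n=100"]
print(round(ratio_1000_100, 2))
```

None of the three sequences is monotone. The ratios of roughly 0.89 to 1.15 show no systematic gain from ten times more cells.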

III) Large \(k_{off}\) and small \(k_{on}\) case:

For small \(k_{on}\) and large \(k_{off}\), only \(k_{on}\) has a small standard deviation. As we may remember from the previous vignette, it is the only identifiable parameter in this scenario. Both \(k_{off}\) and \(k_{r}\) were unidentifiable, and their standard deviations stay high even for the largest dataset. The dependence of the standard deviation on sample size can also be unstable: we sometimes see the standard deviation increase even as the sample size grows, although in the run shown here it decreases for all three parameters.

[1] "k_on standard deviations: n=100: 0.205 ; n=200: 0.132 ; n=500: 0.093 ; n=1000: 0.06"

[1] "k_off standard deviations: n=100: 160.242 ; n=200: 139.272 ; n=500: 57.805 ; n=1000: 12.559"

[1] "k_r standard deviations: n=100: 1328.824 ; n=200: 1055.969 ; n=500: 539.731 ; n=1000: 103.627"
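A sketch can show why \(k_{off}\) and \(k_r\) are hard to separate here. It assumes, as an assumption of the sketch only, that the data follow the beta-Poisson model \(p \sim \mathrm{Beta}(k_{on}, k_{off})\), \(x \mid p \sim \mathrm{Poisson}(k_r p)\). Under that model the mean is \(\mu = k_r k_{on}/(k_{on}+k_{off})\), and the variance is \(\sigma^2 = \mu + k_r^2 \, k_{on} k_{off} / \big((k_{on}+k_{off})^2 (k_{on}+k_{off}+1)\big)\). For large \(k_{off}\), both moments depend on \(k_{off}\) and \(k_r\) almost only through \(k_r/k_{off}\), so doubling both leaves them nearly unchanged. The parameter values below are illustrative, not the ones used in this section.

```r
# Exact mean and variance of the beta-Poisson distribution
# ASSUMPTION: p ~ Beta(k_on, k_off) and x | p ~ Poisson(k_r * p)
beta_poisson_moments <- function(k_on, k_off, k_r) {
  s <- k_on + k_off
  mu <- k_r * k_on / s
  var_p <- k_on * k_off / (s^2 * (s + 1))
  c(mean = mu, var = mu + k_r^2 * var_p)
}

# Illustrative values: small k_on, large k_off, then k_off and k_r both doubled
m1 <- beta_poisson_moments(k_on = 0.1, k_off = 100, k_r = 1000)
m2 <- beta_poisson_moments(k_on = 0.1, k_off = 200, k_r = 2000)
# For contrast: only k_r doubled
m3 <- beta_poisson_moments(k_on = 0.1, k_off = 100, k_r = 2000)

print(round(rbind(m1, m2, m3), 3))
print(round(abs(m2 / m1 - 1), 4))  # relative change when k_off and k_r are doubled together
```

Doubling \(k_{off}\) and \(k_r\) together changes both moments by less than 1%. Doubling \(k_r\) alone doubles the mean.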

IV) Small \(k_{off}\) and large \(k_{on}\) case:

The small \(k_{off}\), large \(k_{on}\) scenario has identifiable parameters. As a result, as in the small \(k_{on}\), \(k_{off}\) case, the standard deviations decrease with sample size and become reasonably small for the larger datasets.

[1] "k_on standard deviations: n=100: 20.615 ; n=200: 9.718 ; n=500: 7.171 ; n=1000: 1.664"

[1] "k_off standard deviations: n=100: 72.209 ; n=200: 38.885 ; n=500: 7.987 ; n=1000: 0.257"

[1] "k_r standard deviations: n=100: 80.019 ; n=200: 32.034 ; n=500: 9.794 ; n=1000: 1.38"
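One way to summarise how fast the case IV standard deviations fall is to regress \(\log(\mathrm{sd})\) on \(\log(n)\). A slope of \(-1/2\) corresponds to the usual \(1/\sqrt{n}\) rate. The sketch below applies this to the printed values.

```r
# Bootstrap standard deviations printed above for case IV (small k_off, large k_on)
n_cells <- c(100, 200, 500, 1000)
case4 <- list(
  k_on  = c(20.615, 9.718, 7.171, 1.664),
  k_off = c(72.209, 38.885, 7.987, 0.257),
  k_r   = c(80.019, 32.034, 9.794, 1.38)
)

# Slope of the least-squares line log(sd) ~ log(n) for one parameter
loglog_slope <- function(sd_values, n) unname(coef(lm(log(sd_values) ~ log(n)))[2])
slopes <- sapply(case4, loglog_slope, n = n_cells)
print(round(slopes, 2))
```

All three slopes come out well below \(-1/2\), at about \(-1.0\), \(-2.35\) and \(-1.7\). Over this range of sample sizes the spread therefore shrinks much faster than \(1/\sqrt{n}\).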

