Sample Size

How many samples?

Most common question

  • depends on sampling
  • depends on design
  • depends on objective

Population Mean

Sample size (n) for estimating the population mean (Thompson Ch. 4, Eqn 3)

\[ n = \frac{1}{d^2/(z^{2}\sigma^2)+(1/N)} \]

  • \(d\) = absolute difference b/w population mean and estimate
  • \(d\) = \(\lvert{\theta- \hat{\theta}}\lvert\)
  • \(N\) = total possible sample units
  • \(\sigma^2\) = population variance
  • \(z\) = upper \(\alpha/2\) of the std. normal dist.
  • \(z_{1-\alpha/2}\) = \(z_{1-0.01/2} = z_{0.99}\) = 2.5758293

Population Mean

Objective: to know the average count of bass with fish lice

  • Common in eutrophic lakes throughout North America
  • Assuming no error in detecting lice

Population Mean

  • N = number of areas in a lake with bass = 200
  • Abs. difference (d) is 10
  • 99% confidence, \(\alpha = 0.01\)
  • \(\sigma = 20\); variation in count across sample units
  N = 200
  d = 10
  alpha  = 0.01
  z = qnorm(1-alpha/2)
  sigma2 = 20^2

  n = 1/(d^2/(z^2*sigma2)+(1/N))
  n
[1] 23.43042

Population Mean

Thompson Ch.4 : “A bothersome aspect of sample size formulas such as these is that they depend on the population variance”.



Me - “A bothersome aspect of sample size formulas such as these is that they depend on the difference (\(d\)); we know more about \(n\) than \(d\)”.

Population Mean

  • You can afford to sample 10 units
  • What is the worst level of difference (\(d\)) for
    • 95% confidence level
    • \(\sigma\) could range from 10 to 100

How would you solve this?

This is a HW question

NHT Paradigm

The null hypothesis testing paradigm is often focused on Type I error (\(\alpha\)), rejecting the null hypothesis when it is actually true.

The Null hypothesis is commonly, \(H_0(\mu_1 = \mu_2)\)

Statistical Power

  • Definition: the probability that a test will reject a false null hypothesis.

    • \(H_0(\mu_1 = \mu_2)\) vs \(H_1(\mu_1 \ne \mu_2)\)
  • The higher the statistical power for a given experiment, the lower the probability of making a Type II error (false negative)

Power Analysis


Type I Error = P(reject \(H_0\) | \(H_0\) is true) = \(\alpha\)

Power = P(reject \(H_0\) | \(H_1\) is true)

Power = 1 - Type II Error

Power = 1 - Pr(False Negative)

Power = 1 - \(\beta\)

Type II Error / Power

Power Analysis

Contributions to statistical power

  • Which statistical test and its assumptions

  • Effective Sample size (simplest case this is \(n\) per group)

  • Reality

    • t-test: difference of means
    • How different these are: \(\mu_{1}\) and \(\mu_{2}\)
    • How variable these are: \(\sigma_{1}\) and \(\sigma_{2}\)

Case Study

Objective: To evaluate the relative use of two types of hummingbird feeders.

Case Study


Null Hypothesis
Mean daily use of each feeder is equal (\(\mu_{1} = \mu_{2}\)).


Alt. Hypothesis
Mean daily use of each feeder is not equal (\(\mu_{1} \neq \mu_{2}\)).


Statistical Test: two-tailed t-test

Case Study

First, we need to define TRUTH

  Group1.Mean <- 100
  Group1.SD <- 20
  
  Group2.Mean <- 120
  Group2.SD <- 20

Case Study (R Code)

library(pwr)
# Wish to test a difference b/w groups 1 and 2
# Want to know if there is a difference in means
 
#Difference in Means
  effect.size <- Group1.Mean-Group2.Mean

#Group st. dev
  group.sd <- sqrt(mean(c(Group1.SD^2,Group2.SD^2)))

#Mean difference divided by group stdev
#How does the numerator and denominator influence this number?  
  d <- effect.size/group.sd

Case Study

power = 0.8

out = pwr.t.test(d=d,power=power,type="two.sample",
                 alternative="two.sided")

#Sample Size Needed for each Group
out$n
[1] 16.71472

Assuming Independence b/w feeders
How do we design for this?

Tradeoff (Power vs N)

Let’s consider multiple levels of power

power = matrix(seq(0.8,0.99,by=0.01))

my.func = function(x){                    
                      pwr.t.test(d=d,power=x,
                                 type="two.sample",
                                 alternative="two.sided"
                                )$n
}

out= apply(power,1, FUN=my.func)

Truth?

These results assume we are correct about…

  Group1.Mean <- 100
  Group1.SD <- 20
  
  Group2.Mean <- 120
  Group2.SD <- 20


What if we are wrong?


How do we evaluate this?

Tradeoffs

Power, Effect Size, N

# Allow group 1 to vary
  Group1.Mean <- seq(10,110,by=5)

#THIS IS THE SAME
  Group1.SD <- 20
  Group2.Mean <- 120
  Group2.SD <- 20
  group.sd <- sqrt(mean(Group1.SD^2,Group2.SD^2))

# Variable effect.size  
  effect.size <- Group1.Mean-Group2.Mean
  d <- effect.size/group.sd

#setup combinations of d and power
  power = seq(0.8,0.99,by=0.01)
  power.d = expand.grid(power,d)
  power.d$Var1 = as.numeric(power.d$Var1)


#make new function and use mapply
my.func = function(x,x2){                    
                      pwr.t.test(d=x2,power=x,
                                 type="two.sample",
                                 alternative="two.sided"
                                )$n
}

# mapply function
  out= mapply(power.d$Var1,power.d$Var2, FUN=my.func)

# unstandardized the effect size back to difference of means
  power.d$Var2=power.d$Var2*group.sd
  
  out2=cbind(power.d,out)
  colnames(out2)=c("power","d","n")

Tradeoffs

Power, Effect Size, N