Main Take-Away: the objective cannot be met at the sample sizes investigated when assuming low or high deer density.
I evaluate sampling design trade-offs in estimating total white-tailed deer abundance in the state of Rhode Island, USA. I consider two extremes of deer density: 1 deer/mi\(^2\) and 20 deer/mi\(^2\). For each deer density, I use a random sampling process to choose blocks of 1 mi\(^2\) in which to conduct a forward-looking infrared (FLIR) count of deer. The count is presumed to be done by helicopter with a FLIR camera attachment. Flights are assumed to occur in the winter and at night, when the heat signature of deer stands out most clearly. As such, detection probability within a block is assumed to be one (this assumption should, however, be evaluated). I evaluate three sample sizes of 10, 20, and 40 blocks. The objective is to find the sample size that minimizes costs while being highly certain (0.90 probability) that the total population estimate is within 10% of the true value. This study design evaluation is done using design-based sampling and inference.
I considered the sampling frame to include all one square mile blocks of contiguous lands in Rhode Island that have less than 80% ‘development’.
# find blocks with high development (more than 80% developed)
index=which(RI$Devel_area>0.8)
# drop those blocks and make a new spatial object
# (guard: RI[-index,] would drop every row if index were empty)
if(length(index)>0) RI=RI[-index,]
# Update the id column
RI$Id=1:nrow(RI)
# plot updated map
plot(RI["Devel_area"])
There are no current studies to suggest the true mean deer density or the spatial structure of the Rhode Island deer population. As such, I will simulate two scenarios that consider a low (1 deer/mi\(^2\)) and high mean deer density (20 deer/mi\(^2\)).
# Mean deer per square mile (1 cell)
deer.dens=c(1,20)
# Total expected deer populations
deer.dens*nrow(RI)
## [1] 944 18880
# Simulate deer densities
set.seed(434343)
deer1=rpois(nrow(RI),deer.dens[1])
deer2=rpois(nrow(RI),deer.dens[2])
par(mfrow=c(1,2))
hist(deer1,freq=FALSE, main="True distribution - mean 1 deer/mi^2")
hist(deer2,freq=FALSE, main="True distribution - mean 20 deer/mi^2")
The assumed true total population sizes for the two scenarios are 905 and 18800, respectively. These populations include random spatial variation. Considering additional spatial structure, such as variation in deer density by the percentage of development (Devel_area), could be a useful exercise.
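One way to explore that idea is sketched below. It is a minimal simulation that assumes a log-linear relationship between expected deer density and the proportion of developed land. The coefficient `beta` and the synthetic `devel` values are hypothetical stand-ins for illustration, not estimates from Rhode Island data; in the report itself, `RI$Devel_area` would take the place of `devel`.

```r
# Hypothetical sketch: expected deer density declines with development
set.seed(1)
n.cells = 944                        # number of blocks in the sampling frame
devel = runif(n.cells, 0, 0.8)       # synthetic stand-in for RI$Devel_area
beta = -2                            # assumed effect of development (log scale)
target.mean = 20                     # desired frame-wide mean deer per block

# Expected count per block, rescaled so the frame-wide mean equals target.mean
lambda = exp(beta * devel)
lambda = lambda * target.mean / mean(lambda)

# Simulate counts with development-driven spatial structure
deer.struct = rpois(n.cells, lambda)

# Counts should be negatively related to development
cor(devel, deer.struct)
```

A population built this way could be sampled exactly like the two unstructured populations, which would show whether spatial structure changes the sample size needed.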
I will consider sampling both populations with three different sample sizes (10, 20, and 40). For each sample size, I will simulate random samples 4000 times. This will not evaluate all possible combinations of samples for each size, but it should be enough to approximate the sampling distribution. The adequacy of the approximation can be judged by looking at the symmetry of the sampling distribution: highly skewed, non-symmetric sampling distributions will require a higher number of simulations.
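To see why enumerating every possible sample is not practical, the number of distinct samples can be computed directly. This is a small illustrative calculation, assuming a frame of 944 blocks (the value of `nrow(RI)` implied by the expected totals printed earlier) and counting unordered samples drawn without replacement.

```r
# Number of distinct samples of each size from a frame of 944 blocks
n.frame = 944
sample.sizes = c(10, 20, 40)
n.combinations = choose(n.frame, sample.sizes)
names(n.combinations) = sample.sizes
print(n.combinations)
```

Even at the smallest sample size, the count of possible samples exceeds the number of simulation iterations by many orders of magnitude, so simulation can only approximate the sampling distribution.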
sample.sizes = c(10, 20, 40)
n.sim = 4000
The code below is shown to make clear exactly how the sampling and estimation are done. There are two for loops: one (index z) that loops through the sample sizes, and another (index i) that loops through the simulation iterations within each sample size.
The important parts of the code are the function grts, which selects a spatially balanced sample from the areal sampling frame, and the function mean, which estimates the mean deer count per cell. There is no model used to estimate and predict the total deer abundance. Rather, I am using the mean as the estimator.
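Written out, the estimator described above takes the following form. The notation is introduced here for illustration and does not appear in the original code: \(N\) is the number of blocks in the sampling frame (`nrow(RI)`), \(n\) is the sample size, and \(y_j\) is the deer count in sampled block \(j\).

\[
\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j, \qquad \hat{T} = N\,\bar{y}
\]

With equal inclusion probabilities \(n/N\) for every block, \(\hat{T}\) is the usual expansion estimator of a population total.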
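# Packages for spatially balanced sampling (grts) and timing (tic/toc)
library(spsurvey)
library(tictoc)
# Attach the simulated counts to the sampling frame so sampled sites carry them
# (assumed step; skip if RI already has these columns)
RI$Deer1=deer1
RI$Deer2=deer2
# Storage: one row per sample size, one column per simulation iteration
deer.total.abundance1=matrix(NA,nrow=length(sample.sizes),ncol=n.sim)
deer.total.abundance2=matrix(NA,nrow=length(sample.sizes),ncol=n.sim)
# Start code timer and loop over sample size choices
tic("simulation")
for(z in seq_along(sample.sizes)){
  # For each sample size, repeat the
  # sampling/estimation procedure n.sim times
  for(i in 1:n.sim){
    set.seed(434343+i+z) # set the random seed for this iteration
    # Draw an equal-probability, spatially balanced sample
    eqprob <- grts(RI, n_base = sample.sizes[z])
    y1=eqprob$sites_base$Deer1
    y2=eqprob$sites_base$Deer2
    # Mean deer count per sampled block
    est1=mean(y1)
    est2=mean(y2)
    # Expand the mean to the whole frame to estimate total abundance
    deer.total.abundance1[z,i]=est1*nrow(RI)
    deer.total.abundance2[z,i]=est2*nrow(RI)
    # monitor loops
    if(i%%10==0) cat("\nz =",z, ", i =", i)
  } # End i loop
} # End z loop
toc() # End code timer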
First, examining the absolute bias of the estimator, we see that at the low deer density there is little bias across sample sizes (10, 20, 40): 0.79, -1.39, -1.9, respectively. The relative bias expresses these results as a proportion of the true population size: 0, 0, 0 (after rounding), making it clearer how little bias there is. Some of this is likely Monte Carlo error, such that increasing the number of simulations would drive these values even lower. The results are similar for the high deer density scenario, where the absolute bias is 6.25, 19.57, -15.75, and the relative bias is 0, 0, 0.
Looking at the sampling distributions for low and high deer density, we see the range of possible deer population estimates shrink towards the true value as the sample size increases.
The ranges (min-max) of possible total deer population estimates at the low deer density for sample sizes (10, 20, and 40) are:
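The bias summaries above can be reproduced in outline as follows. This is a self-contained sketch, not the report's code: it uses base R `sample()` (simple random sampling without replacement) as a stand-in for `grts`, a synthetic Poisson population, and hypothetical object names (`pop`, `est`, `mc.se`). It also reports the Monte Carlo standard error of the simulated mean, which indicates how much of the apparent bias could be simulation noise.

```r
set.seed(42)
n.frame = 944                        # blocks in the sampling frame
pop = rpois(n.frame, 20)             # synthetic high-density population
true.total = sum(pop)

sample.sizes = c(10, 20, 40)
n.sim = 4000

# Simulated total estimates; after t(), rows index sample sizes
# and columns index simulation iterations
est = sapply(sample.sizes, function(n) {
  replicate(n.sim, mean(sample(pop, n)) * n.frame)
})
est = t(est)
rownames(est) = sample.sizes

abs.bias = rowMeans(est) - true.total        # absolute bias
rel.bias = abs.bias / true.total             # relative bias
mc.se = apply(est, 1, sd) / sqrt(n.sim)      # Monte Carlo SE of the mean estimate

round(cbind(abs.bias, rel.bias, mc.se), 3)
```

An absolute bias that is small relative to `mc.se` is consistent with an unbiased estimator whose apparent bias reflects only the finite number of simulations.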
## 10 20 40
## min 94 236 401
## max 2171 1558 1440
The same results for the high deer density are:
## 10 20 40
## min 14538 15670 16756
## max 23883 22326 21476
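To specifically address the objective of this study, I found that the probability (given the assumed true low deer density and random sampling) of a single sample producing an estimate inside the target range (evaluated in the code below as ±5% of the true total) is substantially lower than the desired probability of 0.90 at all three sample sizes (10, 20, and 40).
# Probability that the estimated total falls within +/-5% of the true total
# (a window 10% of the true total wide) - low deer density
# true.total1 is the assumed true total for the low-density scenario (905 above)
apply(deer.total.abundance1,1,FUN=function(x){
  low=true.total1-true.total1*0.05
  upp=true.total1+true.total1*0.05
  mean(x>low & x<upp) # proportion of simulated estimates inside the bounds
})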
## [1] 0.13200 0.17725 0.25675
Higher probabilities were found under the high deer density, but they still do not meet the goal of 0.90:
## [1] 0.51800 0.71250 0.85775
Considering low and high deer densities, I found that the objective of a single estimate of the total population being within 10% of the true value could not be met at sample sizes of 10, 20, and 40. Specifically, at the low deer density, the highest probability was 0.26 at a sample size of 40. At the high deer density, the highest probability was 0.86 at a sample size of 40. A sample size of 50 may reach the goal at the high deer density, but a much larger sample size will be needed if deer densities are low. Lastly, there is very little bias in the estimator at either low or high deer densities.
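A rough analytic check on the sample sizes that might be needed is sketched below. It rests on assumptions that are not part of the simulation: simple random sampling without replacement rather than `grts`, a normal approximation to the sampling distribution, Poisson counts (so the variance among blocks equals the mean), and a tolerance of ±5% to match the bounds used in the probability code. The function name `n.required` and its arguments are hypothetical.

```r
# Approximate sample size so that P(|estimate - truth| < d * truth) >= prob
n.required = function(lambda, n.frame = 944, d = 0.05, prob = 0.90) {
  z = qnorm(1 - (1 - prob) / 2)      # two-sided normal quantile
  cv2 = 1 / lambda                   # Poisson: variance / mean^2 = 1 / lambda
  n0 = z^2 * cv2 / d^2               # sample size ignoring the finite frame
  ceiling(n0 / (1 + n0 / n.frame))   # finite population correction
}

# Low (1 deer/mi^2) and high (20 deer/mi^2) density scenarios
n.required(lambda = c(1, 20))
```

Under these assumptions the calculation gives a little over 50 blocks at 20 deer/mi\(^2\) and roughly 500 at 1 deer/mi\(^2\), in line with the simulation results. The normal approximation is crude at low density, so the low-density figure is best read as an order of magnitude.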
This report was generated from the R Statistical Software (v4.4.1; R Core Team 2024) using the Markdown language and RStudio. The R packages used are acknowledged below.
| Package | Version | Citation |
|---|---|---|
| base | 4.4.1 | @base |
| doParallel | 1.0.17 | @doParallel |
| foreach | 1.5.2 | @foreach |
| knitr | 1.47 | @knitr2014; @knitr2015; @knitr2024 |
| rmarkdown | 2.27 | @rmarkdown2018; @rmarkdown2020; @rmarkdown2024 |
| sf | 1.0.16 | @sf2018; @sf2023 |
| spsurvey | 5.5.1 | @spsurvey |
| tictoc | 1.2.1 | @tictoc |
| tidyverse | 2.0.0 | @tidyverse |