1. Study Objectives, Hypotheses, and Predictions
2. Big Data and Sampling
3. Inference and Prediction
4. Model-Based vs Design-Based Sampling/Inference
Lab: Simulation and Markdown
Objective: what you want to accomplish; a single manuscript can have multiple related objectives.
Example: to understand the space use of coyotes.

Framing the importance of the objective(s) provides the justification for the study; how you frame it depends on the audience.
Hypothesis: a story that explains how the world works; an explanation for an observed phenomenon.
Example: Coyotes have small home ranges in urban areas.
“A statement about a phenomenon that also includes the potential mechanism or cause of that phenomenon”. (Betts et al. 2021)
Example: Coyotes have small home ranges in urban areas because food resource density is high.
Example:
\[\textbf{y} = \beta_0 + \beta_1 \times \textbf{x} + \mathbf{\epsilon}\] \[\mathbf{\epsilon} \sim \text{Normal}(0, \sigma^2)\]
where…
\(\textbf{y}\) = vector of home range sizes of coyotes
\(\beta_0\) = intercept
\(\beta_1\) = difference in expected home range (HR) size for urban coyotes relative to non-urban coyotes
\(\textbf{x}\) = indicator of HR in urban (1) or not in urban (0)
\(\sigma^2\) = residual variance (unexplained variability)
Prediction: \(\beta_1\) is negative and statistically clearly different from zero.
A prediction is the expected outcome if the hypothesis is true. If the prediction agrees with the data, the data support the hypothesis, or at least do not reject it.
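A minimal simulation sketch of this hypothesis-to-prediction chain, written in Python. It assumes hypothetical parameter values (\(\beta_0 = 20\), \(\beta_1 = -12\), \(\sigma = 4\), 1,000 coyotes per group; these are not real coyote data), simulates \(\textbf{y}\) from the linear model above, estimates \(\beta_1\) by ordinary least squares, and treats the prediction as supported when an approximate 95% interval for \(\beta_1\) lies entirely below zero.

```python
import math
import random
import statistics

# Hypothetical "true" parameters (illustration only, not estimates for real coyotes):
# BETA0 = mean home-range size outside urban areas,
# BETA1 = difference in mean home-range size for urban coyotes,
# SIGMA = residual standard deviation.
BETA0, BETA1, SIGMA = 20.0, -12.0, 4.0
N_PER_GROUP = 1000

random.seed(1)

# x: indicator of an urban (1) or non-urban (0) home range
x = [0] * N_PER_GROUP + [1] * N_PER_GROUP
# y = beta0 + beta1 * x + epsilon, with epsilon ~ Normal(0, sigma^2)
y = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA) for xi in x]

# Ordinary least squares for a single predictor
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual variance (n - 2 degrees of freedom) and standard error of b1
n = len(y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(r ** 2 for r in resid) / (n - 2)
se_b1 = math.sqrt(sigma2_hat / sxx)

# Approximate 95% interval (normal approximation; n is large here)
ci_low, ci_high = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1

# Prediction: beta1 is negative and clearly different from zero
prediction_supported = ci_high < 0

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("Prediction supported:", prediction_supported)
```

With real data, the same model would typically be fit with `lm()` in R, reading \(\hat{\beta}_1\) and its interval from the output.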
Descriptive/Naturalist (not hypothetico-deductive)
Hypothetico-Deductive Observational
Hypothetico-Deductive Experimental
Where do you put these?
2. Big Data and Sampling
“The Hidden Biases of Big Data” by Kate Crawford (2013)
“With enough data, the numbers speak for themselves.” (editor of Wired magazine)
Xiao-Li Meng (2018), The Annals of Applied Statistics
Example: using eBird data without accounting for sampling biases.
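A toy simulation (an illustration only, not Meng's analysis) of why sample size alone does not fix sampling bias. It assumes a hypothetical population in which units with larger values are more likely to end up in the data (inclusion probability 0.7 for positive values, 0.3 otherwise), much as opportunistic observers choose where to look. A biased sample of roughly half the population is compared with a simple random sample of only 400 units.

```python
import random
import statistics

random.seed(2013)

# Hypothetical population of N values with a true mean near 0
N = 100_000
population = [random.gauss(0.0, 1.0) for _ in range(N)]
true_mean = statistics.fmean(population)

# "Big data": larger values are more likely to be recorded
# (assumed inclusion probabilities: 0.7 if value > 0, else 0.3)
big_sample = [v for v in population if random.random() < (0.7 if v > 0 else 0.3)]

# Small simple random sample: every unit has the same inclusion probability
srs = random.sample(population, 400)

big_error = abs(statistics.fmean(big_sample) - true_mean)
srs_error = abs(statistics.fmean(srs) - true_mean)

print(f"big biased sample:    n = {len(big_sample)}, error = {big_error:.3f}")
print(f"simple random sample: n = {len(srs)}, error = {srs_error:.3f}")
```

The biased sample is about 125 times larger, yet its error is several times bigger, because the selection mechanism is correlated with the values being measured.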
When it comes to data and statistical models, 21st-century scientists should be pragmatic, excited, and questioning. Consider:
the question being asked of the data
how the data came to be
the goal of the question
Data from ecological surveillance monitoring will often provide low-quality information for evaluating post-hoc hypotheses.

3. Inference and Prediction
From "To Explain or to Predict" by Galit Shmueli (Statistical Science, 2010):
Explanatory modeling focuses on minimizing (statistical) bias to obtain the most accurate representation of the underlying theory.
Predictive modeling focuses on minimizing both bias and estimation variance; this may sacrifice theoretical accuracy for improved empirical precision.
BUT …
Explanatory models will likely perform better when predicting outside the sample space, provided the model captures the core underlying processes.
Figure: Trade-off between prediction accuracy and model interpretability; from James et al. (2013), An Introduction to Statistical Learning.
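A small Monte Carlo sketch of the bias/variance trade-off behind predictive modeling. The shrinkage estimator is a swapped-in illustration (it does not come from Shmueli), and the values \(\mu = 1\), \(\sigma = 5\), \(n = 10\) are hypothetical. Multiplying the sample mean by 0.5 introduces bias but cuts the variance enough to lower the mean squared error (MSE = bias² + variance); in theory the MSE drops from 2.5 to 0.875.

```python
import random
import statistics

random.seed(42)

# Hypothetical setting: true mean MU, noisy observations, small samples
MU, SIGMA, N, REPS = 1.0, 5.0, 10, 20_000
SHRINK = 0.5  # shrinkage factor for the biased estimator (assumed for illustration)

unbiased, shrunk = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    y_bar = statistics.fmean(sample)
    unbiased.append(y_bar)           # sample mean: unbiased, higher variance
    shrunk.append(SHRINK * y_bar)    # pulled toward 0: biased, lower variance


def summarize(estimates, truth):
    """Monte Carlo bias, variance, and mean squared error of an estimator."""
    bias = statistics.fmean(estimates) - truth
    var = statistics.pvariance(estimates)
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    return bias, var, mse


b_u, v_u, mse_u = summarize(unbiased, MU)
b_s, v_s, mse_s = summarize(shrunk, MU)
print(f"sample mean: bias = {b_u:.3f}, var = {v_u:.3f}, MSE = {mse_u:.3f}")
print(f"shrunk mean: bias = {b_s:.3f}, var = {v_s:.3f}, MSE = {mse_s:.3f}")
```

The shrunk estimator is "wrong" on average yet closer to the truth in a typical sample, which is the sense in which prediction can trade theoretical accuracy for empirical precision.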
4. Model-Based vs Design-Based Sampling/Inference
When do we need statistics?

What are the sample and the population?


\(\textbf{Y} = [Y_1, \dots, Y_N]\)
The population mean is \(\bar{Y} = \sum_{i=1}^N Y_i / N\) and the sample mean is \(\hat{\bar{y}} = \sum_{i=1}^n y_i / n\)
\(\boldsymbol{y} = \begin{bmatrix} y_{1} & y_{2} & y_{3} & y_{4} \end{bmatrix}\)
\(\boldsymbol{y}' = \boldsymbol{y}^{T} = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{bmatrix}\)
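A minimal sketch of this notation, using a made-up population of \(N = 8\) values (hypothetical) and a simple random sample of \(n = 4\). The population mean \(\bar{Y}\) uses all units, the sample mean \(\hat{\bar{y}}\) uses only the sampled ones, and the transpose turns the 1 × 4 row vector into a 4 × 1 column.

```python
import random

random.seed(7)

# Hypothetical population Y of N = 8 units (values are made up for illustration)
Y = [12, 7, 9, 15, 4, 11, 8, 10]
N = len(Y)

# Population mean: sum of Y_i over all N units, divided by N
Y_bar = sum(Y) / N

# Sample y of n = 4 units drawn without replacement (a simple random sample)
y = random.sample(Y, 4)
n = len(y)
y_bar_hat = sum(y) / n   # sample mean, an estimate of Y_bar

# Row vector (1 x 4) and its transpose (4 x 1), stored as nested lists
y_row = [y]
y_col = [list(col) for col in zip(*y_row)]

print("Y_bar =", Y_bar, " y_bar_hat =", y_bar_hat)
print("row:", y_row, " column:", y_col)
```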
Wikipedia: A random variable (also called ‘random quantity’ or ‘stochastic variable’) is a mathematical formalization of a quantity or object which depends on random events.
We observe samples from the domain or population or sampling frame.
Each possible sample has some probability of being selected.
[,1] [,2] [,3] [,4] [,5]
[1,] 253 34 131 168 309
[2,] 259 14 102 110 396
[3,] 326 296 222 261 178
[4,] 253 303 204 255 16
[5,] 355 247 2 13 108
Get every combination of 10 units and then calculate the mean of each sample of 10.
Or, we can draw enough random samples to approximate the sampling distribution.
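A sketch of both approaches on the 25 population values printed above (population mean 192.6). Enumerating every sample of 10 means 3,268,760 samples, so the exact enumeration here uses samples of 3 instead (an assumption to keep the run fast). The simulation then approximates the sampling distribution for samples of 10 and compares its variance with the simple-random-sampling formula \((1 - n/N)\,S^2/n\).

```python
import itertools
import math
import random
import statistics

# The 25 population values from the matrix above (read row by row)
population = [
    253, 34, 131, 168, 309,
    259, 14, 102, 110, 396,
    326, 296, 222, 261, 178,
    253, 303, 204, 255, 16,
    355, 247, 2, 13, 108,
]
N = len(population)
true_mean = statistics.fmean(population)

# Option 1: enumerate every possible sample. For n = 10 this is math.comb(25, 10)
# samples, so we enumerate samples of n = 3 instead to show the idea exactly.
n_samples_10 = math.comb(N, 10)
means_n3 = [statistics.fmean(s) for s in itertools.combinations(population, 3)]
exact_mean_n3 = statistics.fmean(means_n3)   # equals the population mean

# Option 2: approximate the sampling distribution for n = 10 by simulation
random.seed(1)
n, reps = 10, 20_000
mc_means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]
mc_mean = statistics.fmean(mc_means)
mc_var = statistics.variance(mc_means)

# Design-based variance of the sample mean under simple random sampling
theory_var = (1 - n / N) * statistics.variance(population) / n

print(f"population mean: {true_mean:.2f}")
print(f"all {len(means_n3)} samples of 3: mean of sample means = {exact_mean_n3:.2f}")
print(f"simulated, n = 10: mean = {mc_mean:.2f}, variance = {mc_var:.1f} (theory {theory_var:.1f})")
```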
Inference relies on …
“a statistical model describing how observations on population units are thought to have been generated from a super‐population with potentially infinitely many observations for each unit;” Williams and Brown, 2019
“The analysis need not account for sampling randomization, because the sample is considered fixed. However, the unit values are considered random.” Williams and Brown, 2019
BUT …
when linking unit values in a model, we need to account for any dependence among them.
Randomization allows us to claim conditional independence among the data in our sample, which keeps the model simpler.
\(P(y_{2}|y_{1}) = P(y_{2})\)
\(\textbf{y} \sim \text{Poisson}(\lambda)\), which, with independent units, means
\(y_{i} \sim \text{Poisson}(\lambda)\) for each unit \(i\)
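A sketch of what independence buys, assuming a hypothetical \(\lambda = 3\). Because the \(y_i\) are independent, the joint likelihood is a product of Poisson terms (a sum on the log scale), and its maximum sits at the sample mean. The last lines check independence empirically by comparing \(P(y_2 = 2 \mid y_1 = 1)\) with \(P(y_2 = 2)\). The Poisson sampler is Knuth's simple multiplication method, written by hand to avoid dependencies.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_loglik(lam, ys):
    """Joint log-likelihood of independent Poisson counts: a sum of per-unit terms."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in ys)


rng = random.Random(86)
LAM = 3.0  # hypothetical true rate

# y_i ~ Poisson(lambda), independently for each unit i
ys = [rpois(LAM, rng) for _ in range(200)]
y_bar = sum(ys) / len(ys)

# Maximize the factorized log-likelihood over a grid; the maximum is at the sample mean
grid = [i / 100 for i in range(1, 1001)]
lam_hat = max(grid, key=lambda lam: poisson_loglik(lam, ys))

# Independence check: P(y2 = 2 | y1 = 1) should be close to P(y2 = 2)
pairs = [(rpois(LAM, rng), rpois(LAM, rng)) for _ in range(100_000)]
p_marg = sum(y2 == 2 for _, y2 in pairs) / len(pairs)
cond = [y2 for y1, y2 in pairs if y1 == 1]
p_cond = sum(y2 == 2 for y2 in cond) / len(cond)

print(f"y_bar = {y_bar:.3f}, grid MLE = {lam_hat:.2f}")
print(f"P(y2 = 2) = {p_marg:.3f}, P(y2 = 2 | y1 = 1) = {p_cond:.3f}")
```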
[1] 500
Bias: the difference between the true value and the mean of the sampling distribution of an estimator (over all possible samples); applies to both design- and model-based sampling.
[1] 0.00544
[1] 2.72e-05
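A Monte Carlo sketch of bias under an assumed Poisson model with hypothetical values \(\lambda = 4\) and \(n = 5\). The sample mean is unbiased for \(\lambda\); the divide-by-\(n\) sample variance also targets \(\lambda\) (a Poisson variance equals its mean) but has bias \(-\lambda/n = -0.8\). Bias is estimated as the mean of the simulated sampling distribution minus the truth.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(2019)
LAM, N, REPS = 4.0, 5, 20_000  # hypothetical true value, sample size, simulations

mean_est, var_est = [], []
for _ in range(REPS):
    ys = [rpois(LAM, rng) for _ in range(N)]
    mean_est.append(statistics.fmean(ys))      # sample mean as an estimator of lambda
    var_est.append(statistics.pvariance(ys))   # divide-by-n variance, also targets lambda

# Bias = mean of the (approximate) sampling distribution minus the true value
bias_mean = statistics.fmean(mean_est) - LAM
bias_var = statistics.fmean(var_est) - LAM     # theory: -LAM / N = -0.8

print(f"bias of sample mean: {bias_mean:.4f}")
print(f"bias of divide-by-n variance: {bias_var:.4f} (theory {-LAM / N})")
```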

What is the probability that we will observe a mean within 5% of the truth?
We can calculate this using Monte Carlo integration
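A sketch of Monte Carlo integration, assuming a hypothetical Poisson model with \(\lambda = 20\) and samples of \(n = 10\): simulate many samples and take the proportion whose mean lands within 5% of \(\lambda\). For this particular model the answer can also be computed exactly, because the sum of \(n\) independent Poisson(\(\lambda\)) counts is Poisson(\(n\lambda\)), which gives a check on the simulation.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(94)
LAM, N, REPS = 20.0, 10, 10_000   # hypothetical truth, sample size, number of simulations
TOL = 0.05 * LAM                  # "within 5% of the truth"

# Monte Carlo integration: approximate the probability by the proportion
# of simulated samples whose mean falls within TOL of the truth
hits = 0
for _ in range(REPS):
    y_bar = sum(rpois(LAM, rng) for _ in range(N)) / N
    hits += abs(y_bar - LAM) <= TOL
p_mc = hits / REPS

# Exact check for this model: the sum of N iid Poisson(LAM) counts is Poisson(N * LAM)
mu_sum = N * LAM
p_exact = sum(
    math.exp(k * math.log(mu_sum) - mu_sum - math.lgamma(k + 1))
    for k in range(int(3 * mu_sum))
    if abs(k / N - LAM) <= TOL
)

print(f"Monte Carlo: {p_mc:.3f}   exact: {p_exact:.3f}")
```

Monte Carlo integration is the general tool here; the exact sum only works because this toy model has a closed-form sampling distribution.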
Lab: Simulation and Markdown
Objectives
Introduce R Markdown
Use simulation and design-based sampling to investigate bias and precision
Let’s add more realism to our work while using design-based sampling in R.
Objective: Evaluate sample size trade-offs for estimating white-tailed deer abundance throughout Rhode Island.
Methodology: Count deer in 1-square-mile cells using a FLIR (forward-looking infrared) camera attached to a helicopter.

Steps to consider
1. Sampling frame
2. “Truth”
3. Sampling process
4. Estimation process
5. Criteria to evaluate
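A sketch of these steps (sampling frame, truth, sampling process, estimation process, evaluation criteria), written in Python with hypothetical numbers throughout: a frame of 1,000 one-square-mile cells (not the real number for Rhode Island), Poisson(5) deer per cell as the "truth", simple random sampling of \(m\) cells, the expansion estimator \(\hat{N} = M\bar{y}\), and relative bias and CV as criteria. It assumes every deer in a sampled cell is counted, ignoring FLIR detection error.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(401)

# 1. Sampling frame: hypothetical grid of M one-square-mile cells
M = 1000

# 2. "Truth": simulated deer count in every cell (hypothetical mean of 5 deer per cell)
truth = [rpois(5.0, rng) for _ in range(M)]
true_total = sum(truth)


# 3. Sampling process: simple random sample of m cells, without replacement
# 4. Estimation process: expand the sample mean to the whole frame, N_hat = M * y_bar
def estimate_total(m):
    cells = rng.sample(range(M), m)
    return M * statistics.fmean(truth[c] for c in cells)


# 5. Criteria to evaluate: relative bias and coefficient of variation (CV)
results = {}
for m in (25, 50, 100, 200):
    estimates = [estimate_total(m) for _ in range(2000)]
    rel_bias = (statistics.fmean(estimates) - true_total) / true_total
    cv = statistics.stdev(estimates) / true_total
    results[m] = (rel_bias, cv)
    print(f"m = {m:3d} cells: relative bias = {rel_bias:+.3f}, CV = {cv:.3f}")
```

Relative bias stays near zero at every sample size (simple random sampling is design-unbiased), while the CV shrinks as more cells are flown; weighing that precision gain against flight cost is the sample-size trade-off in the objective.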
