Sampling I

Population and Sample

What is it called when we use the characteristics of a sample to describe a ‘population’?

Design- vs Model-based Inference

What are important differences according to Thompson and Hankin et al.?

Benefits or costs of either?

Important Considerations

randomness
fixed

Design- vs Model-based Inference

Design vs. Sampling

Experimental Design:
- Deliberately perturbing part of a ‘population’ to compare its response with that of a part that was not perturbed
Sampling Design:
- The process of obtaining a representative sample to characterize a ‘population’ without necessarily perturbing it

Sample Language

A sample (height in inches):

\(\textbf{y} = [69, 54, 72, 61, 58, 71]\)

A sample unit:

\(y_{2} =54\)

Sample size:

\(n = 6\)
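The same vocabulary can be written in R. This is a minimal sketch using the height values shown above; the object names `y` and `n` simply mirror the notation.

```r
# The sample: six heights in inches
y <- c(69, 54, 72, 61, 58, 71)

# A sample unit: the second element of the sample
y[2]
# [1] 54

# Sample size: the number of sample units
n <- length(y)
n
# [1] 6
```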

Common Sample Units

plots/quadrats - small geographic area to measure/count plants, seeds, insects, etc.
points - measurements are taken from a set of points established throughout a population
transects - straight-line segments
individual organisms - the organism is the sample unit, or the organism defines the location of the sample unit

More Language

What is a statistic?

An estimate of a population parameter from a sample

\[\hat{\mu} = \left(\left(\sum_{i=1}^{n}y_{i}\right)\times \frac{1}{n}\right) = 4.1\]

\(n\) is a sample parameter (size of sample)
\(\hat{\mu}\) is an estimate of a population parameter (\(\mu\)) from the estimator (mathematical rule for calculation)
4.1 is a statistic (specific value)
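To make the estimator/statistic distinction concrete, here is a small sketch in R. The data behind the value 4.1 are not shown in these notes, so the example assumes the height sample introduced earlier instead; the function name `mu.hat.fn` is made up for illustration.

```r
# The estimator is the rule: sum the observations, then multiply by 1/n
mu.hat.fn <- function(y) {
  n <- length(y)
  sum(y) * (1 / n)
}

# Applying the rule to one particular sample gives a statistic (a specific value)
y <- c(69, 54, 72, 61, 58, 71)
mu.hat <- mu.hat.fn(y)
mu.hat
# [1] 64.16667
```

A different sample run through the same estimator would give a different statistic.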

More Language

\[\mu =\left(\sum_{i=1}^{N}y_{i}\right)\times \frac{1}{N} = 4\]

\(\mu\) is a population parameter (measure of central tendency)
\(N\) is a population parameter (size of all possible sample units)
\(4\) is the value of the population parameter

Sampling Error

Sampling Error
- The difference between a sample statistic (specific value) and the true value of a population parameter
- 4.1 - 4 = 0.1 sampling error
- Due solely to incomplete enumeration of the population (chance)
- A large sample size protects against this
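The role of sample size can be checked by simulation. The sketch below assumes a hypothetical population of 1000 gamma-distributed values (not data from these notes) and compares the typical size of the sampling error for a small and a large sample; `typical.error` is an illustrative helper.

```r
set.seed(1)

# Hypothetical population (an assumption for illustration only)
pop <- rgamma(1000, shape = 2, rate = 0.5)
mu  <- mean(pop)   # true population mean

# Mean absolute sampling error of the sample mean for a given sample size
typical.error <- function(n, reps = 2000) {
  mu.hat <- replicate(reps, mean(sample(pop, n)))
  mean(abs(mu.hat - mu))
}

err.small <- typical.error(10)
err.large <- typical.error(200)
c(n10 = err.small, n200 = err.large)
```

The error never disappears for n < N, but its typical magnitude shrinks as n grows.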

Sampling Bias

Sampling Bias
- Systematic tendency to select certain sample units; makes the sample unrepresentative of the target population
- Examples in fish/wildlife?

Sampling Variation and Error

Target Population: Weight of all black bears in a region

How would you describe a sampling frame relevant to this target population?

Sampling Variation and Error

Sampling Error

Vector of bear weights

  pop.weights[1:5]

[1] 543.55183  70.43038 343.21377 143.64493 268.94786
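These notes do not show how `pop.weights` and `pop.mu.weights` were created. To run the code in this section, one option is a simulated stand-in population. The gamma distribution, its parameters, and the population size `N = 5000` below are assumptions for illustration, so the results will not match the output printed in these notes.

```r
set.seed(2024)

# Hypothetical stand-in for the population of bear weights
# (assumed: gamma distributed, mean of about 300)
N <- 5000
pop.weights <- rgamma(N, shape = 4, scale = 75)

# Population parameter: the true mean weight
pop.mu.weights <- mean(pop.weights)

pop.weights[1:5]
pop.mu.weights
```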

# Sample Size
  n = 50

# Sample and estimate mean one time
  sample.1 = mean(
                  sample(pop.weights,
                         n,
                         replace = FALSE
                         )
                  )

# Calculate Sampling Error
  sample.1 - pop.mu.weights

[1] 7.645556

Is this a problem?

Sampling Variation vs Error

Sampling variation is the process and sampling error is an outcome.

The differences between samples (sampling variation) lead to differences between sample statistics and population parameters (sampling error).

Sampling Variation

Calculate many many sample means

# Create function to sample n units (without replacement) and take the mean
sample.mean.fn = function(target, n){
  mean(sample(target, n))
}

# Repeat the above function 20000 times
  set.seed(54343)
  mu.hat = replicate(20000,
                     sample.mean.fn(pop.weights, n)
                     )
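Once the sample means are in hand, their spread describes the sampling variation. The sketch below is self-contained and therefore assumes a simulated stand-in population (gamma distributed, not the data used in these notes) before summarizing the sampling distribution.

```r
set.seed(54343)

# Hypothetical stand-in population (assumption for illustration)
pop.weights <- rgamma(5000, shape = 4, scale = 75)
n <- 50

# 20000 sample means, each from a simple random sample of n units
mu.hat <- replicate(20000, mean(sample(pop.weights, n)))

# Spread of the sampling distribution (empirical standard error)
se.hat <- sd(mu.hat)
se.hat

# Range containing the middle 95% of sample means
q <- quantile(mu.hat, c(0.025, 0.5, 0.975))
q
```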

Sampling Variation

Sampling Bias

We should be very interested in the characteristics of the sampling distribution and error.

Expected Bias = average sample mean - population mean
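Expected bias can be approximated by averaging many sample means and subtracting the population mean. The sketch below assumes a simulated stand-in population (not the data used in these notes) and simple random sampling, for which the sample mean should be essentially unbiased.

```r
set.seed(11)

# Hypothetical stand-in population (assumption for illustration)
pop.weights    <- rgamma(5000, shape = 4, scale = 75)
pop.mu.weights <- mean(pop.weights)
n <- 50

# Sampling distribution of the sample mean under simple random sampling
mu.hat <- replicate(20000, mean(sample(pop.weights, n)))

# Expected bias = average sample mean - population mean
expected.bias <- mean(mu.hat) - pop.mu.weights
expected.bias

# The same quantity relative to the population mean
relative.bias <- expected.bias / pop.mu.weights
relative.bias
```

Any nonzero value here reflects only the finite number of replicates, not a flaw in the sampling design.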

Sampling Variation

We can summarize the sampling variation as a probability.

For n = 50, how likely is it that I’ll be within 10% of the true population parameter?

lower=pop.mu.weights - pop.mu.weights*0.10
upper=pop.mu.weights + pop.mu.weights*0.10

# Proportion of sample means within 10% of the population mean
mean(mu.hat > lower & mu.hat < upper)

[1] 0.97335
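The same calculation can be repeated for several sample sizes to see how the probability changes with n. This is a sketch under assumptions: the population is a simulated stand-in (not the data used in these notes), and `prob.within` is an illustrative helper.

```r
set.seed(101)

# Hypothetical stand-in population (assumption for illustration)
pop.weights    <- rgamma(5000, shape = 4, scale = 75)
pop.mu.weights <- mean(pop.weights)

# Probability that a sample mean falls within tol (as a proportion)
# of the true population mean, for a given sample size n
prob.within <- function(n, tol = 0.10, reps = 5000) {
  mu.hat <- replicate(reps, mean(sample(pop.weights, n)))
  mean(abs(mu.hat - pop.mu.weights) < tol * pop.mu.weights)
}

sizes <- c(10, 25, 50, 100)
p <- sapply(sizes, prob.within)
names(p) <- paste0("n=", sizes)
p
```

A table like this can guide the choice of sample size for a target level of precision.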

Sampling Bias

Sampled Population: Weight of harvested black bears in a region that allows food provisioning

Sampling Error and Bias

Sampling Bias

We only sample harvested bears with food supplementation

Expected Bias = 71.26

Sampling Bias

Relative Expected Bias = \(\frac{E(\hat{\mu})-\mu}{\mu}\)

Relative Expected Bias = 0.23
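The notes do not show how the biased sample was simulated. As one hedged illustration, the sketch below assumes that heavier bears are more likely to be sampled (selection probability proportional to weight) from a simulated stand-in population. Both the population and the selection mechanism are assumptions, so the numbers will differ from those reported above.

```r
set.seed(7)

# Hypothetical stand-in for the target population (assumption)
pop.weights    <- rgamma(5000, shape = 4, scale = 75)
pop.mu.weights <- mean(pop.weights)
n <- 50

# Biased selection (assumption): probability of selection is
# proportional to weight, so heavy bears are over-represented
mu.hat.biased <- replicate(
  2000,
  mean(sample(pop.weights, n, prob = pop.weights))
)

# Expected bias and relative expected bias
expected.bias <- mean(mu.hat.biased) - pop.mu.weights
relative.bias <- expected.bias / pop.mu.weights
c(expected = expected.bias, relative = relative.bias)
```

Unlike sampling error, this bias does not shrink as n increases.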

Biased Estimator

We sample all bears but use a different estimator for the population mean

\[ \hat{\mu} = \left(\sum_{i=1}^{n}\frac{(y_{i})^{0.91}}{1.3}\right)\times \frac{1}{n^{1/2}} \]

Biased Estimator

Expected Bias = 215.14
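The estimator above can be coded directly and its expected bias approximated by simulation. The sketch assumes a simulated stand-in population and n = 50 (the notes do not state the settings behind the reported value), so it will not reproduce 215.14; `biased.mean.fn` is an illustrative name.

```r
set.seed(99)

# Hypothetical stand-in population (assumption for illustration)
pop.weights    <- rgamma(5000, shape = 4, scale = 75)
pop.mu.weights <- mean(pop.weights)
n <- 50   # assumed sample size

# The alternative estimator: sum(y^0.91 / 1.3) * 1 / sqrt(n)
biased.mean.fn <- function(y) {
  n <- length(y)
  sum(y^0.91 / 1.3) * (1 / sqrt(n))
}

# Simple random samples of all bears, but with the alternative estimator
mu.hat.alt <- replicate(5000, biased.mean.fn(sample(pop.weights, n)))

# Expected bias = average estimate - population mean
expected.bias <- mean(mu.hat.alt) - pop.mu.weights
expected.bias
```

Here the sample is representative; the bias comes entirely from the calculation rule.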

Malicious Sampling Bias

Sample fish in streams.

There are 200 streams from which to choose a sample. What is this list called?

Let’s consider a situation where we don’t take an equal probability sample

Probability of sampling an occupied cell

[1] 0.55

[1] 0.45

[1] 1.222222
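The code behind these values is not shown. One possible reading is that occupied units are selected with relative weight 0.55 and unoccupied units with 0.45, a ratio of 0.55 / 0.45 = 1.22. The sketch below illustrates that reading only; the occupancy pattern (half of the 200 streams occupied), the sample size, and the weights are all assumptions.

```r
set.seed(42)

# Hypothetical sampling frame: 200 streams, half of them occupied
N <- 200
occupied <- rep(c(1, 0), each = N / 2)

# Unequal selection weights (assumed): occupied streams are
# 0.55 / 0.45 = 1.22 times as likely to be drawn as unoccupied ones
w <- ifelse(occupied == 1, 0.55, 0.45)
w[1] / w[N]

# Estimate the proportion occupied from n = 20 streams, many times
n <- 20
p.hat <- replicate(5000, mean(sample(occupied, n, prob = w)))

# Expected bias of the naive sample proportion
expected.bias <- mean(p.hat) - mean(occupied)
expected.bias
```

Because occupied streams are over-sampled, the unweighted sample proportion tends to overstate occupancy.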