Create a report using Markdown that presents code, results, and summarized findings that makes inference to the occurence of the Annamite striped rabbit (Nesolagus timminsi) across multiple protected areas in Vietnam.
This is not real data. However, the basic premise has been adopted from the paper, Getting the big picture: Landscape-scale occupancy patterns of two Annamite endemics among multiple protected areas.
The Annamite striped rabbit is a forest-dwelling lagomorph that was discovered by science in the mid-1990s. Little is known about this species. What has been done indicates that hunting pressure is a primary driver of its distribution. As such, protected areas are paramount to their conservation. However, protected areas vary in their effectiveness in limiting hunting and some protected areas are founded on multi-use, such that hunting is still allowed.
A camera trap study was conducted in the Annamite mountain range of
Vietnam to better understand the distribution of the unique mammal
fauna. A central focus was to learn about the effects of anthropogenic
activity on the distribution of the Annamite striped rabbit. A total of
50 camera traps were deployed at 5 different protected areas. The
cameras were placed systematically with a random starting point to
capture variation in the Euclidean distance between a camera and the
nearest center of human activity (i.e., village/town). A colleague
organized the data, such that for each camera site, the presence (1) or
assumed absence (0) of the rabbit was recorded (occur
); we
will assume there are no false-positives or false-negatives in these
data. The main variables of interest are the protected areas
(PA
) and the distance to human activity
(dist.human
) in meters.
Working with your colleagues, they outline their hypothesis that the main driver of rabbit occurrence is hunting within the protected areas. However, there is no spatial variable or direct measure of hunting pressure throughout the park. Rather, they predict that increasing distance away from human activity will lead to higher rabbit occupancy because people will only travel so far in difficult mountainous terrain. They also hypothesize that this effect will vary by protected area because there is different levels of patrolling and enforcement. But, regardless of protected area, they hypothesize that occurrence at the edge of the protected areas nearest to human activity will be the same because of the similar density of people and that occurrence will be very low.
Fit the data (rabbit.occ.data.csv
) using a single model
that captures the hypothesis of your colleague.
PA
and dist.human
library(ggplot2)
dat = read.csv("rabbit.occ.data.csv")
# data structure and attributes
str(dat)
## 'data.frame': 250 obs. of 5 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ PA : chr "PA1" "PA1" "PA1" "PA1" ...
## $ camera : int 1 2 3 4 5 6 7 8 9 10 ...
## $ occur : int 0 0 0 0 0 0 0 0 0 1 ...
## $ dist.human: num 140 188 363 408 504 ...
# frequency of 0's and 1's
table(dat$occur)
##
## 0 1
## 107 143
# how many cameras at each protected area
table(dat$PA)
##
## PA1 PA2 PA3 PA4 PA5
## 50 50 50 50 50
# histogram of distance to human activity
hist(dat$dist.human)
# histograms of human activity measured across PA's
ggplot(dat, aes(x=dist.human))+
geom_histogram(color="black", fill="white")+
facet_grid(PA ~ .)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
To evaluate the hypothesis, I fit a hierarchical binomial regression
model to the rabbit occurrence data. First, to improve model convergence
and maintain the main covariate of interest in terms of an
understandable measure of distance, I scaled the dist.human
variable by 1000 to obtain distance in kilometers. This also allows the
intercept to reflect the logit-value of occurence when
dist.human
is zero, which is a consideration of the
hypothesis.
The model includes a fixed effect intercept (i.e., does not very by
protected area) and a random effect for scaled distance to human
activity (dist.human.sc
) by protected area (i.e., the slope
does vary by protected area). This model structure captures the types of
variation proposed in the hypothesis. I will use \(\alpha = 0.05\) to determine statistical
clarity.
library(glmmTMB)
#scale distances to kilometers
dat$dist.human.sc = dat$dist.human/1000
model = glmmTMB(occur~dist.human.sc+(0+dist.human.sc||PA),
family=binomial(link="logit"),
data=dat
)
summary(model)
## Family: binomial ( logit )
## Formula: occur ~ dist.human.sc + (0 + dist.human.sc || PA)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 268.5 279.1 -131.3 262.5 247
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## PA dist.human.sc 0.1118 0.3343
## Number of obs: 250, groups: PA, 5
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.9236 0.3310 -5.811 6.21e-09 ***
## dist.human.sc 0.9798 0.2057 4.764 1.90e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(model)
## 2.5 % 97.5 % Estimate
## (Intercept) -2.5723889 -1.2747725 -1.9235807
## dist.human.sc 0.5766640 1.3829012 0.9797826
## Std.Dev.dist.human.sc|PA 0.1377511 0.8115265 0.3343481
In our summary outputs, we see that the population mean (across
protected areas) slope is statistically clearly different than zero
(p-value is < \(\alpha\) and 95%
confidence intervals do not include zero) and is positive. As such,
across all protected areas (including unsampled areas), a ‘typical’
protected area is expected to have increasing occurrence of the striped
rabbit the further from centers of human activity. The prediction that
striped rabbit occurrence would be low near human activity is supported
with a predicted occurrence of 0.13 (i.e., plogis(-1.9236)) at a
distance of zero. We also see that there is support for variation in the
effect of dist.human.sc
as the standard deviation of the
protected area slope differences is well above zero at 0.33 and the 95%
confidence intervals do not include zero.
broom.mixed::tidy(model, effects = "ran_vals", conf.int = TRUE)
## # A tibble: 5 × 9
## effect component group level term estimate std.error conf.low conf.high
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ran_vals cond PA PA1 dist.hum… -0.192 0.188 -0.561 0.177
## 2 ran_vals cond PA PA2 dist.hum… 0.0409 0.191 -0.334 0.416
## 3 ran_vals cond PA PA3 dist.hum… -0.349 0.189 -0.720 0.0211
## 4 ran_vals cond PA PA4 dist.hum… 0.533 0.264 0.0159 1.05
## 5 ran_vals cond PA PA5 dist.hum… -0.0941 0.189 -0.465 0.277
Looking at the protected area specific differences from the population-mean (i.e., the random effect values), we see the largest difference of 0.53 for protected area 4, which is statistically clearly different than zero as the 95% confidence intervals do not include 0. No other random effect are statistically clearly different than the population mean. This suggests an increase in power is warranted by sampling more sites at each protected area.
If we combine the population mean effect and the differences by
protected area, we get the mean slope of dist.human.sc
by
protected area. We see that all slopes are positive, indicating that
there is no evidence that increasing distances reduces rabbit
occurence.
fixef(model)[[1]][2]+ranef(model)[[1]]$PA
## dist.human.sc
## PA1 0.7876961
## PA2 1.0206335
## PA3 0.6305017
## PA4 1.5128966
## PA5 0.8857225
Next, lets visualize the probability of rabbit occurrence at a typical site. This is an important result, as it captures the relationship we would expect at protected areas we have not sampled.
marginaleffects::plot_predictions(model,
condition=c("dist.human.sc","PA"),
type="response",
re.form=NA,
vcov=TRUE
)
Next, we see that there is clear variation of the effect of
dist.human.sc
by protected area and that none of the slopes
are negative.
marginaleffects::plot_predictions(model,
condition=c("dist.human.sc","PA"),
type="response",
re.form=NULL,
vcov=TRUE
)
We found support for our hypothesis. There is evidence for variation of the effect of distance to human activity on striped rabbit occurence and that as distances increases, occurrence increases. We are measuring a proxy for hunting pressure, but assuming distance relates to human hunting pressure and not other confounding factors, our hypothesis is supported. A follow up study might want to plan for increasing the sampling at each protected area to better test for differences of the main effect by protected area (i.e., increase the statistical power). Second, we may want to consider a random intercept to evaluate the hypothesis that there is variation of rabbit occurrence near human activity.