This vignette is associated with the manuscript titled, ‘A plain language review and guidance for modeling animal habitat-selection’. We will be demonstrating some of the points from the manuscript on fitting models to estimate parameters in a movement-based habitat selection function (HSF), which is also known as a step-selection function (SSF)). The scale of interest is the fourth-order of selection (e.g. selection along a track within a home range).
Note: some of the code-chunks are not automatically displayed. To show the code, select ‘show’.
First, load required R packages, source two local functions, and load an object containing spatial covariates.
#Load packages
library(geoR)
library(circular)
library(glmmTMB)
library(raster)
library(sp)
library(survival)
library(Rfast)
library(remotes)
library(plotrix)
library(ggplot2)
library(knitr)
# Install github repository
# remotes::install_github("Pakillo/grateful")
library(grateful)
# Source simulation function
source("./functions/sim.ind.movement.hsf.r")
# Source bootstrapping function
source("./functions/mean_ci_boot.r")
# Load spatial covariates stored in a save R object
load("./data/Covs")
We will consider a habitat selection analysis of individuals along a movement track within a landscape (e.g., fourth-order of selection). Individuals’ effects are assumed to come from a distribution of effects that can be characterized by a mean and standard deviation (i.e., random effect).
# Number of Sampled individuals (e.g., tracked via GPS telemetry)
n.indiv = 30
# Define the true population-level coefficients
# and use these to simulate individual-level coefficients
# Selection against at population-level
# Low variation among individuals
beta1.mu = -1
beta1.sd = 0.2
set.seed(543543)
beta1=rnorm(n.indiv,
mean = beta1.mu,
sd = beta1.sd
)
# No selection at population-level
# Wide variation among individuals
beta2.mu = 0
beta2.sd = 1
set.seed(5435431)
beta2=rnorm(n.indiv,
mean = beta2.mu,
sd = beta2.sd
)
# Selection for this feature at population-level
# Low variation among individuals
beta3.mu = 1
beta3.sd = 0.2
set.seed(1543543)
beta3=rnorm(n.indiv,
mean = beta3.mu,
sd = beta3.sd
)
# Combine coefficients and plot these values
betas=data.frame(b1 = beta1,
b2 = beta2,
b3 = beta3)
par(mfrow=c(3,1))
hist(betas[,1],xlab=bquote(beta[1]),xlim=c(-3,3),main="True Individual Coefficient Values",breaks=10,freq=FALSE)
abline(v=beta1.mu,lwd=2,col=2)
curve(dnorm(x,beta1.mu,beta1.sd),lwd=3,col=3,add=TRUE)
legend("topright",lwd=3,col=c("gray","red","green"),legend=c("Indiv. Coefs", "Pop. Mean","True Distribution"))
hist(betas[,2],xlab=bquote(beta[2]),xlim=c(-3,3),main="",breaks=20,freq=FALSE)
abline(v=beta2.mu,lwd=2,col=2)
curve(dnorm(x,beta2.mu,beta2.sd),lwd=3,col=3,add=TRUE)
legend("topright",lwd=3,col=c("gray","red","green"),legend=c("Indiv. Coefs", "Pop. Mean","True Distribution"))
hist(betas[,3],xlab=bquote(beta[3]),xlim=c(-3,3),main="",breaks=20,freq=FALSE)
abline(v=beta3.mu,lwd=2,col=2)
curve(dnorm(x,beta3.mu,beta3.sd),lwd=3,col=3,add=TRUE)
legend("topright",lwd=3,col=c("gray","red","green"),legend=c("Indiv. Coefs", "Pop. Mean","True Distribution"))
We have loaded spatial covariate data that can be found in the object
covs
. Let’s look at the three covariates we will consider.
The covariates included are the Euclidean distance to nearest
development (dist.dev
), the percent forest cover
(forest
), and the percent shrub cover (shrub
).
Each has been standardized to a mean of 0 and standard deviation of 1;
this is a common procedure in linear modeling to help optimization
algorithms converge to maximum likelihood estimates and to put each
covaraite on the same scale (1 standard deviation of the covariate) so
that coefficient magnitudes can be compared.
# Combine into 1 raster stack
covs = stack(covs[[1]],
covs[[2]],
covs[[3]]
)
# Change the extent to be larger to accommodate more realistic movements
# (not required but makes me feel better)
extent(covs) = c(0,4000,0,4000)
# Names of variables
names(covs[[1]]) = 'dist.dev'
names(covs[[2]]) = 'forest'
names(covs[[3]]) = 'shrub'
par(mfrow=c(2,3))
plot(covs[[1]], main='dist.dev')
plot(covs[[2]], main='forest')
plot(covs[[3]], main='shrub')
hist(values(covs[[1]]), main='dist.dev')
hist(values(covs[[2]]), main='forest')
hist(values(covs[[3]]), main='shrub')
Now, we will use the spatial covariates (covs
) and true
individual-level coefficients (betas
) to create the linear
combinations to create the true individual HSF (as a raster).
hsf.true = vector("list",n.indiv)
for (i in 1:n.indiv){
hsf.true[[i]] = (covs[[1]]*betas[i,1]+
covs[[2]]*betas[i,2]+
covs[[3]]*betas[i,3]
)
}
We are now ready to simulate individual-level data. We will do this
with the sim.ind.movement.hsf
function that is sourced from
the file with the same name.
# Set number of available samples per used
n.avail = 100
# Number of movements
n.time = 400
# Number of possible steps to choose from at each iteration- these are not available locations, this is for the simulation of the movement path
n.step = 400
# Population (across individual) turning angle parameters for von mises distribution
angle.mean = 0
angle.kappa = 0.0001
# Step length rate of exponential distribution
step.rate = 0.01
# Simulate individual-level movement-based habitat selection data
sims = sim.ind.movement.hsf(hsf.true = hsf.true,
n.time = n.time,
n.step = n.step,
n.avail = n.avail,
angle.mean = angle.mean,
angle.kappa = angle.kappa,
step.rate = step.rate
)
Let’s look at one individual’s data. The sims
object has
two elements, one for x-y used locations )(locs
) and one
for the compiled data we will use to fit models (indiv
).
Within each of these lists are lists for each individual, one to
n.indiv
).
names(sims)
## [1] "indiv" "locs"
# Individual 1 data
head(sims$locs[[1]])
## use.x use.y
## 1 2007.386 2009.559
## 2 1970.555 1902.248
## 3 1939.713 2070.911
## 4 2003.961 2089.273
## 5 1964.091 1908.154
## 6 1970.880 1954.061
head(sims$indiv[[1]])
## status strata dist.dev forest shrub step
## 1 1 1 -0.58591604 1.51212061 -0.23357781 12.080123
## 401 0 1 3.06594070 1.90153662 -0.78854177 652.887152
## 402 0 1 0.03532682 0.67724617 -0.09794270 118.846249
## 403 0 1 -0.53975331 1.62440620 0.05898945 33.013549
## 404 0 1 -0.71127578 1.31214563 -0.68346660 2.715337
## 405 0 1 -1.31842318 0.07685809 -1.07986956 244.388450
table(sims$indiv[[1]]$status)
##
## 0 1
## 40000 400
We see from this individual’s data frame that we have the columns:
status, strata, dist.dev, forest, shrub, step. The column
status
is our response variable. A 1
indicates an animal location (used point) and a 0
indicates a potentially used location (available point). Each used
location is paired with 100 available locations and are matched by the
strata number (column strata
). Each 1 and 0 have
corresponding spatial variables, which we have called
dist.dev
, forest
, and shrub
. Each
covariate has been standardized to have a mean of 0 and standard
deviation of 1 (also called Normalizing). Note that we have a large
available sample matched with each used location. This is correct. We
need a large available sample (the 0’s) for each used location to
approximate the true underlying spatio-temporal habitat selection model
(specified via a weighted distribution; see Equation 1 of Michelot et
al. 2024).
Let’s look at one individual’s tracks on top of the true HSF.
#Plot the true HSF with locations for individual k
k=10
par(mfrow=c(1,1))
plot(hsf.true[[k]])
points(sims$locs[[k]]$use.x,
sims$locs[[k]]$use.y,
pch=16
)
We now want to organize each individual’s simulated data into a
single data frame to be able to fit all individuals together. We will
use the objects sims
and sims2
below when we
want to fit models to each individual separately and together,
respectively. For sims2
, we also need a new set of strata
identifiers so that they are unique for each individual.
# Combine data into a single data.frame
sims2 = do.call("rbind", sims$indiv)
head(sims2)
## status strata dist.dev forest shrub step
## 1 1 1 -0.58591604 1.51212061 -0.23357781 12.080123
## 401 0 1 3.06594070 1.90153662 -0.78854177 652.887152
## 402 0 1 0.03532682 0.67724617 -0.09794270 118.846249
## 403 0 1 -0.53975331 1.62440620 0.05898945 33.013549
## 404 0 1 -0.71127578 1.31214563 -0.68346660 2.715337
## 405 0 1 -1.31842318 0.07685809 -1.07986956 244.388450
dim(sims2)
## [1] 1212000 6
# Create ID vector for individual's data
LETTERS702 <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS)))
ID = rep(LETTERS702[1:n.indiv],each=nrow(sims$indiv[[1]]))
sims2$ID=ID
# Create new strata
sims2$indiv.id.strata = paste0(sims2$ID,
sims2$strata
)
# The number of unique identifiers needed
# n.step*n.indiv
length(unique(sims2$indiv.id.strata)) == n.step*n.indiv
## [1] TRUE
dim(sims2)
## [1] 1212000 8
head(sims2)
## status strata dist.dev forest shrub step ID
## 1 1 1 -0.58591604 1.51212061 -0.23357781 12.080123 A
## 401 0 1 3.06594070 1.90153662 -0.78854177 652.887152 A
## 402 0 1 0.03532682 0.67724617 -0.09794270 118.846249 A
## 403 0 1 -0.53975331 1.62440620 0.05898945 33.013549 A
## 404 0 1 -0.71127578 1.31214563 -0.68346660 2.715337 A
## 405 0 1 -1.31842318 0.07685809 -1.07986956 244.388450 A
## indiv.id.strata
## 1 A1
## 401 A1
## 402 A1
## 403 A1
## 404 A1
## 405 A1
We are now ready to fit models to estimate our movement-based habitat
selection parameters and demonstrate points made in the manuscript. We
have organized the sections below to match with the sections of the
manuscript.
Definitions only.
Definitions only.
Definitions only.
We want to remind the reader of the nuance between a habitat selection function and the statistical model we will be fitting. The model we are fitting (as coded) and the true underlying model (weighted distribution of spatio-temporal model) is not the habitat selection function. The habitat selection function is a component of the model. We will approximate the true model in two different ways: 1) using a conditional logistic regression model and 2) using a Poisson regression model with stratum-specific fixed intercepts (Muff et al. 2019). Each approach is used to estimate parameters within the movement-based habitat selection function. We will use the same data set to show the equivalence of the estimated parameters.
It is also important to differentiate a habitat selection function from a habitat selection analysis. As mentioned above, a habit selection function is a component of the model we want to fit, while a habitat analysis refers to all the steps that are being considered; this includes, for example, how the data are setup, the model fit, the algorithms used, the inference or predictions from the model, and how the model is evaluated.
Definitions only.
Definitions only.
In our data setup and analysis we will be estimating habitat selection at the fourth-order of the scales defined in Johnson (1980).
This is the hardest part of the whole habitat selection study. How do you decide on the what, where, and how many of sampling to answer the question at hand. Talk to many people. Ask questions to many people.
The modeling mentioned here is about differences between study goals, such as inference and prediction. Modeling for inference versus prediction is not a straightforward distinction. There are lots of opinions from the statistical folks (which are great and everyone should read them, e.g., Shmueli, 2010 and Scholz and Burner, 2022) and the science philosophy folks. We reference the manuscript Gerber and Northrup, 2020 in regards to when the study goal is preditiction. Associated with this manuscript is code in the Supporting Information file (ecy2953-sup-0004-DataS1.Zip) that pertains to optimizing for predicting spatial distributions of habitat selection (i.e., making a map). This process can jeopardize inference, e.g., make the interpretation of estimated effects unreliable. In contrast, if inference is sought you should think hard about a single model that includes the most important variables for the species and for your question that you want to consider so that estimated effects and measures of uncertainty are reliable (Bolker 2023, Tredennick et al. 2021).
The individual’s location data used to fit a movement-based HSF model is often sampled at a high-fix rate, relative to the animal’s speed. We curate our habitat selection data based on the movement process (e.g., step lengths and turning angles). In the above, we have simulated data with spatial locations at a standardized rate of movement and unit of time between movements. It is common for real-world telemetry data to have inconsistent times between consecutive spatial locations. It is important to standardize this time difference, otherwise the movement process parameters won’t be meaningful. For example, a time difference of 30 minutes and 60 minutes are likely to produce different step lengths and thus summarizing what is available on each used location will be compromised.
One way to standardize the time differences is by designing the data setup such that relocations are specified into tracks based on a threshold tolerance of allowable time between consecutive spatial locations (e.g., done in the R package amt; Signer et al. 2019). In other words, resampling the GPS locations at regular intervals. Another way is to fit a continuous time movement model to then predict locations at a standarized interval (e.g., Northrup et al. 2018 and Johnson et al. 2008)
In this section, we discuss some mechanics of inference and prediction, as in interpreting coefficients and predicting relative habitat selection. We will walk through the basics here, but refer the reader to full in depth coverage in Fieberg et al. 2021 and Avgar et al. 2017. We also mention the goal of using habitat selection to predict animal abundance. Demonstrating this is beyond our work. We suggest starting with the reading of section 1.6, “When is species density a reliable reflection of habitat suitability?” of Matthiopoulos, Fieberg, and Aarts, 2023.
Let’s fit a model to one individual’s matched used and available data and discuss the estimated movement-based HSF coefficients. Remember, we have setup an available sample to be paired with a used location to control for the issues of serial dependence or autocorrelation between sequential relocations. This process has also hopefully better defined what is truly accessible to the individual at each location based on their movement dynamics (specifically, turning angle and step-length within a standardized time period). By doing this, we have focused our attention on the dynamic selection of habitat as the animal movemees through the landscape.
# We will use data from individual 1
indiv.data1 = sims$indiv[[1]]
# Let's look at the data to remind us what it is
head(indiv.data1)
## status strata dist.dev forest shrub step
## 1 1 1 -0.58591604 1.51212061 -0.23357781 12.080123
## 401 0 1 3.06594070 1.90153662 -0.78854177 652.887152
## 402 0 1 0.03532682 0.67724617 -0.09794270 118.846249
## 403 0 1 -0.53975331 1.62440620 0.05898945 33.013549
## 404 0 1 -0.71127578 1.31214563 -0.68346660 2.715337
## 405 0 1 -1.31842318 0.07685809 -1.07986956 244.388450
We are now ready to approximate our true model using two difference approaches: the conditional logistic regression model and the Poisson regression model.
First, we can accomplish the first apporach using the
clogit function
in package survival
. We have
related our response variable of the used and available sample
(status
) to our covariates and tell the model how the 1’s
and 0’s are matched with the variable strata
, which is part
of the linear equation but wrapped by an internal function
strata
. Note that the variable combinations are one of an
additive model, as opposed to having interactions.
model1 = clogit(status ~ dist.dev + forest + shrub + strata(strata),
data = indiv.data1
)
# Look at estimated coefficients
summary(model1)
## Call:
## coxph(formula = Surv(rep(1, 40400L), status) ~ dist.dev + forest +
## shrub + strata(strata), data = indiv.data1, method = "exact")
##
## n= 40236, number of events= 400
## (164 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dist.dev -0.78316 0.45696 0.08862 -8.838 < 2e-16 ***
## forest -0.42742 0.65219 0.12521 -3.413 0.000641 ***
## shrub 0.84170 2.32031 0.07845 10.729 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## dist.dev 0.4570 2.188 0.3841 0.5436
## forest 0.6522 1.533 0.5103 0.8336
## shrub 2.3203 0.431 1.9896 2.7060
##
## Concordance= 0.722 (se = 0.011 )
## Likelihood ratio test= 274.9 on 3 df, p=<2e-16
## Wald test = 209 on 3 df, p=<2e-16
## Score (logrank) test = 208.9 on 3 df, p=<2e-16
Let’s first consider the Intercept
. OH, there is no
intercept. That is correct in these models and for our purposes, the
intercepts have no biological meaningful interpretation and thus are not
necessary.
Next, we have estimated the effect of dist.dev
as -0.78.
This estimate is negative, indicating that as the value of
dist.dev
increases from its mean (defined at 0) habitat
selection decreases, given that the values of forest
and
shrub
(the other variables in the model) are held at their
mean (0). In other words, habitat use relative to what is available to
this individual (as we defined it dynamically on a movement track)
decreases the further from development. That’s a lot of qualifiers to
understand this estimate. These are important though. In terms of
evidence of an effect, we can say that at a defined Type I error of
0.05, this is a statistically clear
effect that is not likely zero. The evidence of this is the very small
p-value of 9.772012^{-19}.
Next, we have estimated the effect of forest
as -0.43.
This estimate is negative, indicating that as the value of
forest
increases from its mean (defined at 0) habitat
selection decreases relative to what is available to this individual
(assuming dist.dev
and shrub
are at their
mean). At our defined Type I error, this is a statistically clear effect
that it is not likely zero (p-value = 6.413425^{-4}).
Next, we have estimated the effect of shrub
as 0.84.
This estimate is positive, indicating that as the value of
shrub
increases from its mean (defined at 0) habitat
selection increases relative to what is available to this individual
(assuming dist.dev
and forest
are at their
mean). At our defined Type I error, this is a statistically clear effect
that is not likely zero (p-value = 7.4494248^{-27}).
The above output also conveniently shows us the exponentiation of
each coefficient. This is helpful because exp(coefficient)
“quantifies the relative intensity of use of two locations that differ
by 1 standard deviation unit of [the variable] but are otherwise
equivalent (i.e. they are equally available and have the same values of
all other habitat covariates)” (Fieberg et
al. 2021).
We can estimate the same parameters and get the very same estimates
with other functions that implement the conditional logistic regression
model. For example, the functions fit_clogit
or
fit_ssf
in the amt
package. These are wrapper
functions for using the survival clogit
function. We can
see what the function is doing:
amt::fit_ssf
## function (data, formula, more = NULL, summary_only = FALSE, ...)
## {
## if (!any(grepl(pattern = "^strata\\(.+\\)$", attr(terms(formula),
## "term.labels")))) {
## stop("No strata is provided, please make sure the formula includes a strata.")
## }
## m <- survival::clogit(formula, data = data, ...)
## if (summary_only) {
## m <- list(model = broom::tidy(m), sl_ = attributes(data)$sl_,
## ta_ = attributes(data)$ta_, more = more)
## class(m) <- c("fit_clogit_summary_only", "fit_clogit",
## class(m))
## }
## else {
## m <- list(model = m, sl_ = attributes(data)$sl_, ta_ = attributes(data)$ta_,
## more = more)
## class(m) <- c("fit_clogit", class(m))
## }
## m
## }
## <bytecode: 0x0000018f48eba2a8>
## <environment: namespace:amt>
Now let’s use these functions to estimate the same coefficients as we
did with clogit
:
model1.amt1 = amt::fit_ssf(data = indiv.data1,
formula = status ~ dist.dev + forest +
shrub + strata(strata),
model=TRUE
)
model1.amt2 = amt::fit_clogit(data = indiv.data1,
formula = status ~ dist.dev+forest+
shrub + strata(strata),
model=TRUE
)
coef(model1)
## dist.dev forest shrub
## -0.7831639 -0.4274160 0.8417028
coef(model1.amt1)
## dist.dev forest shrub
## -0.7831639 -0.4274160 0.8417028
coef(model1.amt2)
## dist.dev forest shrub
## -0.7831639 -0.4274160 0.8417028
We can see the estimated coefficients are all the same.
Muff et
al. 2019 demonstrated the equivalence between the conditional
logistic regression model and Poisson regression model with
stratum-specific fixed intercepts. We can fit this model using the
glmmTMB
package, which has computational advantages over
fitting the conditional logistic regression model. The key to this
implementation is that we want to include stratum as part of the linear
combination of variables wrapped in the function strata
to
estimate fixed effect intercepts, or do the same procedure in a random
effect implementation but without shrinkage by fixing the variance to a
large value.
# Fit the model with fixed effect stratum-specific intercepts
model1.tmb = glmmTMB(status ~ dist.dev + forest +
shrub + strata(strata),
data = indiv.data1,
family = poisson
)
# Or using a random effect with fixed variance
model1.tmb2 = glmmTMB(status ~ dist.dev + forest +
shrub + (1| strata),
data = indiv.data1,
family = poisson,
doFit = FALSE
)
# Make the intercept variance large and fixed
model1.tmb2$parameters$theta[1] = log(1e3)
# Tell glmmTMB to not estimate the variance of the intercept
model1.tmb2$mapArg = list(theta=factor(c(NA)))
# Now ready to fit the model
model1.tmb2 = glmmTMB:::fitTMB(model1.tmb2)
Let’s compare the coefficient estimates from these two model fits with one of the clogit fits from above.
Poisson random intercept (fixed variance)
knitr::kable(summary(model1.tmb2)[[6]]$cond[2:4,],digits=3)
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
dist.dev | -0.783 | 0.089 | -8.838 | 0.000 |
forest | -0.427 | 0.125 | -3.414 | 0.001 |
shrub | 0.842 | 0.078 | 10.729 | 0.000 |
Poisson Fixed effect
knitr::kable(summary(model1.tmb)[[6]]$cond[2:4,],digits=3)
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
dist.dev | -0.783 | 0.089 | -8.838 | 0.000 |
forest | -0.427 | 0.125 | -3.414 | 0.001 |
shrub | 0.842 | 0.078 | 10.729 | 0.000 |
clogit model
knitr::kable(summary(model1)[[7]][,-2],digits=3)
coef | se(coef) | z | Pr(>|z|) | |
---|---|---|---|---|
dist.dev | -0.783 | 0.089 | -8.838 | 0.000 |
forest | -0.427 | 0.125 | -3.413 | 0.001 |
shrub | 0.842 | 0.078 | 10.729 | 0.000 |
We find the same exact estimates. Woohoo!
Note that in the fixed effect implementation (model1.tmb) we are
actually estimating a lot of partial intercepts (strata). However, we
can ignore them as they are not of interest, just a means to an end.
That is why in the above code, we are limiting the estimates from the
summary with the part cond[2:4,]
How do we quantitatively evaluate two locations in terms of habitat selection using our model results? We can do so using Equation 8 of Fieberg et al. 2021.
Perhaps we want to compare a location that is very near development (dist.dev = -2) at high forest and shrub cover (forest = 2, shrub = 2) with that of a location far from development (dist.dev = 2) and also at high forest but low shrub cover (forest = 2, shrub = -2).
# Get estimates
coef = coef(model1)
# Relative Selection
RS = exp(-2*coef[1] + 2*coef[2] + 2*coef[3]) / exp(2*coef[1] + 2*coef[2] + -2*coef[3])
RS
## dist.dev
## 664.787
If given equivalent availability between these two types of sites ( 1) near development at high forest and shrub cover, 2) far from development at high forest and shrub cover ), this individual would relatively select the first location by a factor of 664.79.
Avgar et al. 2017 refers to the relative selection strength (RSS) as the exponentiation of each coefficient.
For example,
# Coefficient for forest
exp(coef[2])
## forest
## 0.6521922
Given two locations that differ by 1 standard deviation of percent
forest (forest
), but are otherwise equal, this individual
would be 0.65 as likely to choose the location with higher
forest
, or equivalently, 1.53 times more likely to choose
the location with the lower forest.
For convenience, we can use the log_rss
function of the
amt
package to predict the log relative strength when
comparing many locations with differing covariates values. To do so, we
need create two data frames of the covariate combinations of interest:
s1
and s2
.
# Make a new data.frame for s1
s1 = data.frame(
dist.dev = seq(from = -2, to = 2, length.out = 100),
forest = 0,
shrub = 0,
strata = 1 # needed, but the number is irrelevant
)
# data.frame for s2
s2 = data.frame(
dist.dev = 0, # mean of dist.dev
forest = 0,
shrub = 0,
strata = 1 # needed, but the number is irrelevant
)
# Calculate log-RSS
lr2 = amt::log_rss(model1.amt1,
s1,
s2,
ci = "se",
ci_level = 0.95
)
plot(lr2,lwd=3)
## Guessing x_vars:
## x_var1 = dist.dev
## x_var2 = forest
lines(lr2$df$dist.dev_x1,lr2$df$lwr,lty=2)
lines(lr2$df$dist.dev_x1,lr2$df$upr,lty=2)
abline(h=0,lwd=2,col=2)
We have predicted a mean (solid line) and 95% confidence intervals
(dotted) of the log-RSS between each value of s1 relative to s2. These
locations differ only in their values of dist.dev
, and the
log-RSS is equal to 0 when the two locations are identical (i.e.,
dist.dev = 0). Above the zero line the individual is selected more than
available and below the zero line they are selecting less than
available.
In fitting these data using a movement-based HSF, we are simultaneously trying to account for two important assumptions. First, we are limiting what is available to the animal at each used location by defining the availability as around this area and informed by the movement parameters of step-length (how far is the animal likely to go) and step-length (what direction is the animal likely to go). Second, we are defining our data setup to deal with the fine-scale sequential dependency of consecutive relocations and minimize issues of autocorrelation. If there is interest in using these same data at a lower selection-order, its important to consider the issues of autocorrelation and availability. In the manuscript, we provide two options for doing so.
Context only.
It is the responsibility of the researcher to make sure their modeling process is done such that they are approximating the true underlying model well. To demonstrate this, we will consider how estimated coefficients change with increasing numbers of available samples per each used location.
# Grab individual one's data
indiv.dat1 = sims$indiv[[1]]
#Size of available samples per used locaiton
n.avail2 = seq(2,100,by=2)
#Save coefficients
coef.save = NULL
#Loop across available sample sizes
for(i in 1:length(n.avail2)){
#Loop through each used location and reduce the number of available samples
ind.dat=NULL
for(j in 1:n.step){
index = which(indiv.dat1$strata==j)
#Get the used locaiton and the sub-sampled availables
ind.dat.temp = indiv.dat1[index,][1:(n.avail2[i]+1),]
ind.dat = rbind(ind.dat,ind.dat.temp)
}
# fit model with data
model.temp = clogit(status ~ dist.dev + forest + shrub + strata(strata),
data = ind.dat
)
coef.save = rbind(coef.save,
coef(model.temp)
)
}
coef.save = data.frame(n.avail2,coef.save)
colnames(coef.save) = c("N.Avail","beta1","beta2","beta3")
par(mfrow=c(1,1))
plot(coef.save$N.Avail, coef.save$beta1,lwd=3,type="l",col=2,ylim=c(-01,1),
main="Slope Coefficients",
xlab="Number of Available Samples",ylab="Coefficient Estimate")
lines(coef.save$N.Avail, coef.save$beta2,lwd=3,col=2)
lines(coef.save$N.Avail, coef.save$beta3,lwd=3,col=2)
#knitr::kable(coef.save,digits=3)
We can see that our estimates of the slope coefficients are sensitive to the number of available locations. The estimates visually stabilize after about 80 available locations per used location.
Another way to look at this sensitivity is by showing the variability of the coefficients within and across different sizes of available samples.
# Grab individual one's data
indiv.dat1 = sims$indiv[[1]]
# Size of available samples per used locaiton
n.avail2 = c(2,10,20,40,60,80,100)
# Number of sample replications
n.sample = 20
# Save coefficients
coef.save = NULL
#Loop across available sample sizes
for(i in 1:length(n.avail2)){
#re-sample each available sample size 20 times
for (z in 1:n.sample){
#Loop through each used location and reduce the number of available samples
ind.dat = NULL
for(j in 1:n.step){
index = which(indiv.dat1$strata==j)
#Get the used locaiton and the sub-sampled availables
rand.sample = sample(index[-1],n.avail2[i])
ind.dat.temp = indiv.dat1[c(index[1],rand.sample),]
ind.dat = rbind(ind.dat,ind.dat.temp)
} # end j step loop
# fit model with data
model.temp = clogit(status ~ dist.dev + forest + shrub + strata(strata),
data = ind.dat
)
coef.save = rbind(coef.save,
coef(model.temp)
)
}#end z sample loop
}#end i loop
one = data.frame(N.Avail=rep(n.avail2,each=n.sample),
Sample=rep(1:n.sample,length(n.avail2)),
beta = coef.save[,1]
)
two = data.frame(N.Avail=rep(n.avail2,each=n.sample),
Sample=rep(1:n.sample,length(n.avail2)),
beta = coef.save[,2]
)
three = data.frame(N.Avail=rep(n.avail2,each=n.sample),
Sample=rep(1:n.sample,length(n.avail2)),
beta = coef.save[,3]
)
plot.data = rbind(one,two,three)
plot.data$Name = c(rep("dist.dev",nrow(one)),
rep("forest",nrow(one)),
rep("shrub",nrow(one))
)
plot.data$N.Avail = as.factor(plot.data$N.Avail)
colnames(plot.data) = c("N.Available.Sample","Sample","Coefficient.Estimate","Name")
ggplot2::ggplot(plot.data, aes(N.Available.Sample, Coefficient.Estimate, fill=factor(Name))) +
theme_bw()+
geom_boxplot()
We see more easily in this plot that there is high variability in the estimated coefficients when the available sample is small. This variability decreases as the available sample grows, showing how the estimates are stabilizing.
We should a priori assume there is individual variation in
habitat selection estimates. This variation is masked when data are
pooled. Let’s consider the implications of pooling all data vs fitting a
model separately to each individual. Note that the object
sims2
has all the individuals combined into a single data
frame. We will use the glmmTMB
function to fit the model
using the Poisson model regression approach with strata as a random
effect, but that is not really estimated. This is fast and stable
implementation.
# Pooled model - no consideration of individual variability
model.pool = glmmTMB(status ~ dist.dev + forest + shrub +
(1|indiv.id.strata),
data=sims2,
family=poisson,
doFit = FALSE
)
model.pool$parameters$theta[1] = log(1e3)
model.pool$mapArg = list(theta=factor(c(NA)))
options(warn=-1)
model.pool = glmmTMB:::fitTMB(model.pool)
options(warn=0)
# Look at estimates when pooling the data
summary(model.pool)
## Family: poisson ( log )
## Formula: status ~ dist.dev + forest + shrub + (1 | indiv.id.strata)
## Data: sims2
##
## AIC BIC logLik deviance df.resid
## 294015.2 294063.1 -147003.6 294007.2 1173566
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## indiv.id.strata (Intercept) 1e+06 1000
## Number of obs: 1173570, groups: indiv.id.strata, 12000
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.36235 9.12872 -0.59 0.556925
## dist.dev -0.67471 0.01513 -44.60 < 2e-16 ***
## forest -0.05418 0.01594 -3.40 0.000679 ***
## shrub 0.86746 0.01513 57.35 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Separate models to each individual
indiv.fit = lapply(sims[[1]],FUN=function(x){
temp=glmmTMB(status ~ dist.dev+forest+shrub + (1|strata),
data=x,
family=poisson,
doFit = FALSE
)
temp$parameters$theta[1] = log(1e3)
temp$mapArg = list(theta=factor(c(NA)))
options(warn=-1)
temp = glmmTMB:::fitTMB(temp)
options(warn=0)
temp
})
In the pooled data, we see that the standard errors of the coefficients and p-values are very small. We are using a lot of information, assuming no variation among individuals, and thus assuming every individual’s effects can be characterized by these estimates. Now let’s look at just two individual’s estimates when fitting separate models.
summary(indiv.fit[[2]])
## Family: poisson ( log )
## Formula: status ~ dist.dev + forest + shrub + (1 | strata)
## Data: x
##
## AIC BIC logLik deviance df.resid
## 9794.9 9829.4 -4893.5 9786.9 40396
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## strata (Intercept) 1e+06 1000
## Number of obs: 40400, groups: strata, 400
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.56072 50.00007 -0.111 0.911
## dist.dev -0.69182 0.08691 -7.960 1.72e-15 ***
## forest 0.48584 0.08935 5.438 5.40e-08 ***
## shrub 0.89271 0.09021 9.895 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(indiv.fit[[10]])
## Family: poisson ( log )
## Formula: status ~ dist.dev + forest + shrub + (1 | strata)
## Data: x
##
## AIC BIC logLik deviance df.resid
## 9867.4 9901.8 -4929.7 9859.4 40275
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## strata (Intercept) 1e+06 1000
## Number of obs: 40279, groups: strata, 400
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.007679 50.000055 0.000 1
## dist.dev -0.519150 0.077938 -6.661 2.72e-11 ***
## forest 0.464137 0.086106 5.390 7.03e-08 ***
## shrub 0.841780 0.087168 9.657 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, notice that the estimated effects are a bit different than the pooled estimates. Importantly, also notice that the standard errors and p-values are larger.
Now let’s look at all the estimates together.
# Extract pooled estimates with Confidence intervals
pool.effect = broom.mixed::tidy(model.pool, effects = "fixed", conf.int = TRUE)
# Extract separate individual estimates
indiv.coef = lapply(indiv.fit,FUN=function(x){
temp=broom.mixed::tidy(x, effects = "fixed",
conf.int = TRUE
)
data.frame(temp[-1,c(4,8,9)])
})
# These are all coefs (1:3) for each individual 1:n.indiv
estimates = sapply(indiv.coef,"[[",1)
LCI = sapply(indiv.coef,"[[",2)
UCI = sapply(indiv.coef,"[[",3)
layout(matrix(c(1,2), nrow = 1, ncol = 2, byrow = TRUE),
width=c(2,1))
plotCI(1:n.indiv,
y=estimates[1,],
ui=UCI[1,],
li=LCI[1,],
ylab="beta",
xlab="Individual",
lwd=2,
ylim=c(-2.5,4)
)
legend("topright",lwd=3,col=c(1,2,3),
legend=c("dist.dev","forest","shrub"),box.lty=0,y.intersp=0.8)
plotCI(1:n.indiv,y=estimates[2,],
ui=UCI[2,],
li=LCI[2,],
ylab="beta_1",
xlab="Individual",
lwd=2,
col=2,
add=TRUE
)
plotCI(1:n.indiv,y=estimates[3,],
ui=UCI[3,],
li=LCI[3,],
ylab="beta_1",
xlab="Individual",
lwd=2,
col=3,
add=TRUE
)
plotCI(c(1,1,1),y=pool.effect$estimate[2:4],
xlim=c(0.95,1.05),
ui=pool.effect$conf.high[2:4],
li=pool.effect$conf.low[2:4],
ylab="beta",
xlab="Population",
lwd=2,
ylim=c(-2.5,4),
col=c(1,2,3),
xaxt='n'
)
The plot on the right are the pooled estimates for \(\beta_1\) (black; dist.dev), \(\beta_2\) (red; forest), and \(\beta_3\) (green; shrub). The plot on the left are each individual’s estimates (colored the same). By ignoring individual variation, we are much too confident in our estimates of uncertainty and are ignoring a lot of clear variation by individual. The pooled estimate’s certainty is do to pseudoreplication - the treating a sub-unit (each location) as are main unit of replication. Our true unit of replication is the individual.
Since our sample sizes for each individual are equal, we see that the pooled estimates generally relate to the average of each estimate across all individuals. When the number of used locations varies by individual this won’t be the case. The individuals with more used locations will disproportionately influence the pooled estimates.
The code for implementing the methods of Street et al. 2021 can be found at figshare.
You can also find an implementation of these methods to calculate the number of used locations to determine statistical clarity of a hypothesized spatial effect in the [Traditional HSF example] (https://bgerber123.github.io/hsfguide/TraditionalHSF.html#311_Individuals)
If we are interested in obtaining inference to the individual- and population-level effects, we can consider bootstrapping the results from individual models or jointly fitting a single model with a random effect across individuals. We will first consider the bootstrapping method. Second, we will demonstrate the use of random intercepts and slopes, as well as how to fix the variance of the random intercept so there is no shrinkage.
Bootstrapping is a resampling technique used often in statistics to obtain a sampling distribution of possible values. We will re-sample the individual-level coefficients to estimate the population-level mean and variation.
The core functions for bootstrapping are adopted from Dr. Bastille-Rousseau’s github repository for the R package IndRSA.
We first need to get all estimated coefficients from each individual.
coef.list = lapply(indiv.fit,FUN=function(x){fixef(x)[[1]]})
coef.df=do.call(rbind.data.frame, coef.list)
colnames(coef.df)=names(fixef(indiv.fit[[1]])[[1]])
# Remove intercept
coef.df = coef.df[,-1]
# Add name (could be modified to keep track if animals have an ID)
coef.df$name = 1:nrow(coef.df)
# Frequency could be greater than one if there were multiple estimates of a single individual, perhaps across years
coef.df$freq = 1
We are ready to bootstrap the coefficient estimates.
#How many bootstraps to do? More will lead to results with less error
nboot = 1000
#Which columns have coefficients in coef.df
col.locs=1:3
boot=list()
for(i in 1:nboot){
boot[[i]]=apply(
coef.df[sample(nrow(coef.df), nrow(coef.df),
replace=TRUE,
prob=coef.df$freq),
col.locs
],
2,
median,
na.rm=TRUE
)
}
We now want to summarize our bootstrapped results. The population mean and 95% lower and upper confidence intervals are outputted.
# Source summary function
boot.pop.esimates=mean_ci_boot(boot)
rownames(boot.pop.esimates)=c("dist.dev","forest","shrub")
knitr::kable(boot.pop.esimates,digits=3)
Mean | LCI | UCI | |
---|---|---|---|
dist.dev | -0.756 | -0.825 | -0.656 |
forest | -0.058 | -0.370 | 0.289 |
shrub | 0.905 | 0.840 | 0.951 |
The most common use of random effects in habitat selection analysis is the use of a random intercept. As described in the manuscript, this does not account for variation in the slope coefficients, which is problematic.
While we are using common language in describing the models and model
fitting process, note that there are very different meanings of
fixing a parameter
(i.e., random intercept variance) versus
a fixed effect
. The former refers to setting a parameter at
a given value and is not estimated. The later refers to a description of
a type of parameter that is estimated.
# Setup HSF with Poisson regression approximation - random intercept model
# indiv.id.strata indicates individuals and strata and there the variance is fixed so there is no shrinkage
re_int = glmmTMB(status ~ dist.dev + forest + shrub +
(1|indiv.id.strata),
family=poisson ,
data = sims2,
doFit=FALSE
)
# Make the intercept variance large and fixed (i.e., do not estimate).
re_int$parameters$theta[1] = log(1e3)
# Tell glmmTMB to not estimate the variance of the intercept
re_int$mapArg = list(theta=factor(c(NA)))
# Now ready to fit the model
options(warn=-1)
re_int = glmmTMB:::fitTMB(re_int)
options(warn=0)
The fixed components - these are the mean effects when pooling all individuals.
fixef(re_int)
##
## Conditional model:
## (Intercept) dist.dev forest shrub
## -5.36235 -0.67471 -0.05418 0.86746
The random components- these are the effect differences for each variation from the mean estimated (referred to as the fixed effect in the ‘conditional model’ statement). For this model we will have many many intercepts. Specifically, we will have a unique value for the number of individuals times the number of strata within individual
head(ranef(re_int)[[1]][[1]])
## (Intercept)
## A1 0.26258121
## A10 -0.07350571
## A100 0.19616214
## A101 -0.40665805
## A102 -0.40542909
## A103 -0.21143664
summary(re_int)
## Family: poisson ( log )
## Formula: status ~ dist.dev + forest + shrub + (1 | indiv.id.strata)
## Data: sims2
##
## AIC BIC logLik deviance df.resid
## 294015.2 294063.1 -147003.6 294007.2 1173566
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## indiv.id.strata (Intercept) 1e+06 1000
## Number of obs: 1173570, groups: indiv.id.strata, 12000
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.36235 9.12872 -0.59 0.556925
## dist.dev -0.67471 0.01513 -44.60 < 2e-16 ***
## forest -0.05418 0.01594 -3.40 0.000679 ***
## shrub 0.86746 0.01513 57.35 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A random intercepts only model is not recommended. This is because we are not dealing with the expected variation of responses to hypothesized spatial factors across individuals.
This is the recommended model structure using random effects.
#Fit RSF with intercept and slopes with random effect
re_int_slopes = glmmTMB(status ~ -1 + dist.dev + forest + shrub +
(1|indiv.id.strata) +
(0+dist.dev|ID) + (0+forest|ID) +
(0+shrub|ID),
family = poisson,
data = sims2,
doFit = FALSE
)
# make the intercept variance large and fixed (i.e., do not estimate).
re_int_slopes$parameters$theta[1] = log(1e3)
# Tell glmmTMB to not estimate the variance of intercept, but
# to do so for the other three variables
re_int_slopes$mapArg = list(theta=factor(c(NA,1:3)))
# Now ready to fit the model
options(warn=-1)
re_int_slopes = glmmTMB:::fitTMB(re_int_slopes)
options(warn=0)
Let’s look at the results. Here, we have the population-level
(across) individual-level means. So, generally, we see that at this
level the estimated coefficients are negative, near zero, and positive
for dist.dev
, forest
, and shrub
,
respectively.
fixef(re_int_slopes)
##
## Conditional model:
## dist.dev forest shrub
## -0.7319 -0.1187 0.8931
We can get estimates with 95% confidence intervals as well. We see that these statements are supported statistically in the above statement based on examining whether intervals include 0 or not.
broom.mixed::tidy(re_int_slopes,
effects = "fixed",
conf.int = TRUE)[,c(3,4,8,9)]
## # A tibble: 3 × 4
## term estimate conf.low conf.high
## <chr> <dbl> <dbl> <dbl>
## 1 dist.dev -0.732 -0.805 -0.659
## 2 forest -0.119 -0.366 0.128
## 3 shrub 0.893 0.838 0.948
Here are estimated population-level coefficients (across individual-level effects). Compared to the bootstrapped results above, they are generally similar. We should not expect them to be the same, as the random effect model shares information across individuals and the bootstrapped esimates do not. However, the interpretation of the two approaches leads to the same conclusions about the estimated sign and statistical clarity.
We can also extract the estimated difference by individual from the population level coefficients, as well as estimated with confidence intervals. These estimates are an indication of whether an individual has a different estimate than the overall population mean.
ranef.out = ranef(re_int_slopes)
ranef.out$cond$ID
## dist.dev forest shrub
## A -0.0415772552 -0.297768475 -0.0363714227
## AA -0.0548487347 -1.012917896 -0.2860561038
## AB -0.1847051636 0.628520828 -0.0761825587
## AC -0.0002854609 0.645967249 0.0107665303
## AD 0.1468169475 0.428356649 0.0759396421
## B 0.0337566532 0.595184189 0.0026117470
## C 0.0755439321 -0.095581133 0.0318355921
## D -0.0180605327 -0.103214238 -0.0408454069
## E -0.1476240346 0.537650470 0.1125674342
## F -0.0010877413 -0.633575001 0.0185569562
## G 0.3319304974 0.208288925 0.0774316094
## H 0.1251768421 -0.309803762 -0.0337981491
## I 0.1729549842 -0.015097438 0.0541519080
## J 0.1777674999 0.580991918 -0.0269005398
## K -0.1993747356 0.201916564 -0.0464435193
## L -0.1103115943 -0.007469888 0.0002586819
## M 0.1719430182 0.819219465 0.1303113960
## N -0.0149391294 -1.382398055 0.0317083965
## O -0.0140991212 -1.029845892 -0.0640590829
## P -0.2966082762 0.383949739 -0.0149878334
## Q 0.1544125462 -0.443704116 -0.1696253282
## R 0.3720575286 0.628249843 0.0344876744
## S -0.1721162201 -0.332745405 -0.0783119261
## T -0.0709172871 0.124869125 0.0205765908
## U -0.0412663648 0.466060293 0.1252510043
## V -0.1983220414 -0.546022663 -0.1315617587
## W -0.0468902077 -1.551384854 0.2531837157
## X 0.2439368282 0.154856461 0.0850566863
## Y -0.2508640580 -0.193295341 0.0110478262
## Z -0.0992507867 1.551537993 -0.1060203707
ranef.out.ci = broom.mixed::tidy(re_int_slopes,
effects = "ran_vals",
conf.int = TRUE)
#Non-intercept estimates per individual
ranef.out.ci[-which(ranef.out.ci$term=="(Intercept)"),-c(1,2,3,4)]
## # A tibble: 90 × 5
## term estimate std.error conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 dist.dev -0.0416 0.0852 -0.209 0.125
## 2 dist.dev -0.0548 0.0931 -0.237 0.128
## 3 dist.dev -0.185 0.0840 -0.349 -0.0201
## 4 dist.dev -0.000285 0.0857 -0.168 0.168
## 5 dist.dev 0.147 0.0813 -0.0125 0.306
## 6 dist.dev 0.0338 0.0830 -0.129 0.196
## 7 dist.dev 0.0755 0.0829 -0.0869 0.238
## 8 dist.dev -0.0181 0.0824 -0.180 0.143
## 9 dist.dev -0.148 0.0841 -0.313 0.0173
## 10 dist.dev -0.00109 0.0872 -0.172 0.170
## # ℹ 80 more rows
Lastly, we can a full summary of the model.
summary(re_int_slopes)
## Family: poisson ( log )
## Formula:
## status ~ -1 + dist.dev + forest + shrub + (1 | indiv.id.strata) +
## (0 + dist.dev | ID) + (0 + forest | ID) + (0 + shrub | ID)
## Data: sims2
##
## AIC BIC logLik deviance df.resid
## 292792.3 292864.2 -146390.2 292780.3 1173564
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## indiv.id.strata (Intercept) 1.000e+06 1000.0000
## ID dist.dev 3.362e-02 0.1834
## ID.1 forest 4.654e-01 0.6822
## ID.2 shrub 1.521e-02 0.1233
## Number of obs: 1173570, groups: indiv.id.strata, 12000; ID, 30
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## dist.dev -0.73190 0.03715 -19.70 <2e-16 ***
## forest -0.11873 0.12598 -0.94 0.346
## shrub 0.89310 0.02787 32.05 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note the variance and standard deviation estimates, indicating how
variable coefficients are across individuals. We see that
forest
is estimated as the most variable. This corresponds
well to how the data were generated with forest coefficients having the
highest standard deviation of 1.
Another thing to notice is that the population level effect for forest is not statistically clearly different than zero. Thus, at the population level, percent forest is being selected in proportion to what is available to each individual. Let’s dive into this a bit more. Let’s plot the individual estimates along with the population-level estimate.
indiv.coef.popModel = broom.mixed::tidy(re_int_slopes, effects = "ran_vals", conf.int = TRUE)
index=which(indiv.coef.popModel$term=="forest")
indiv.forest.popModel = data.frame(indiv.coef.popModel[index,c(5,6,8,9)])
# Add back mean to get individual estimates
indiv.forest.popModel[,2:4] = indiv.forest.popModel[,2:4]+rep(fixef(re_int_slopes)[[1]][2],each=20)
# Extract population mean and uncertainty for forest effect
pop.coef.popModel = data.frame(broom.mixed::tidy(re_int_slopes, effects = "fixed", conf.int = TRUE))
pop.coef.popModel=pop.coef.popModel[,c(4,8,9)]
#Plot
plotCI(x = 1:n.indiv,
y = indiv.forest.popModel$estimate,
ui = indiv.forest.popModel$conf.high,
li = indiv.forest.popModel$conf.low,
lwd = 3,col=1,
xlab="Individual",ylab="Forest Coefficient")
abline(h=pop.coef.popModel$estimate[2],lwd=3,col=1,lty=4)
abline(h=pop.coef.popModel$conf.low[2],lwd=3,col=2,lty=4)
abline(h=pop.coef.popModel$conf.high[2],lwd=3,col=2,lty=4)
Our plot shows the individual effects of forest
(vertical lines) along with the population-level mean and confidence
intervals (horizontal lines). Compare this output and inference to the
figure (right-side) for the pooled model fit at the end of Section 3.11.
In the pooled case we would conclude that at the population level, this
species avoids forest (i.e., negative coefficient), and we would say
that with some level of statistical certainty because the confidence
intervals do not include zero. Here we get a more nuanced picture of
what’s happening, and that leads to a very different conclusion.
What is clear from this plot is that the reason there is no statistically clear difference of the population-level effect from zero is because there is a wide range of effects across individuals. Some individuals have slightly positive coefficients and some have slightly negative. Since these are generally equal, they balance out to a population-level effect of zero. The story is more complicated! To decide on statistical clarity between an individual’s effect and the population, you can go back to the confidence intervals of the estimated effect differences.
The code for implementing the methods of Street et al. 2021 can be found at figshare.
You can also find an implementation of these methods to calculate the
number of individuls tracked/sampled to determine statistical clarity of
a hypothesized population mean effect in the
TraditionalHSF.Rmd
file.
In this section we discuss ways of considering behavior in habitat selection analyses. To demonstrate the effect of ignoring behavior, we will simulate data where selection is different for two sets of animal locations. Imagine an individual is highly selective of forest cover when it is resting, but when foraging (and otherwise) selects against forest cover in proportion to its availability.
hsf.true.behav = vector("list",2)
hsf.true.behav[[1]] = covs[[2]]*2
hsf.true.behav[[2]] = covs[[2]]*-2
sim.behav1 = sim.ind.movement.hsf(hsf.true = hsf.true.behav,
n.time = n.time,
n.step = n.step,
n.avail = n.avail,
angle.mean = angle.mean,
angle.kappa = angle.kappa,
step.rate = step.rate
)
# Combine the data
data.ignore= rbind(sim.behav1$indiv[[1]],
sim.behav1$indiv[[2]]
)
# Create ID vector for the different behavioral data
ID=rep(1:2,each = nrow(sim.behav1$indiv[[1]]))
data.ignore$ID = ID
# Create ID vector for unique strata within individual
data.ignore$indiv.id.strata=unique(data.ignore$ID+data.ignore$strata*100)
Let’s fit a model where we ignore the behavioral differences in selection and fit a single model with all the data.
model.ignore.behav = clogit(status ~ dist.dev + strata(indiv.id.strata),
data = data.ignore
)
summary(model.ignore.behav)
## Call:
## coxph(formula = Surv(rep(1, 80800L), status) ~ dist.dev + strata(indiv.id.strata),
## data = data.ignore, method = "exact")
##
## n= 74794, number of events= 800
## (6006 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dist.dev 0.008814 1.008853 0.035404 0.249 0.803
##
## exp(coef) exp(-coef) lower .95 upper .95
## dist.dev 1.009 0.9912 0.9412 1.081
##
## Concordance= 0.503 (se = 0.011 )
## Likelihood ratio test= 0.06 on 1 df, p=0.8
## Wald test = 0.06 on 1 df, p=0.8
## Score (logrank) test = 0.06 on 1 df, p=0.8
We see that the estimated forest coefficient is not near the values of the true values of -2 or 2. It’s somewhat in between near zero. Essentially, when animals are selecting difference habitats because of behavior, mixing across behaviors can lead to a muddle inference.
Interpreting coefficients and predicting is outlined above in
subsection
7. What are habitat-selection functions used for?
.
If focused on inference, fitting a single model is not only okay, but desirable.
We hope this vignette provided some utility. If so, let us know with an email (brian.gerber@colostate.edu).
Have a nice day.
This report was generated from the R Statistical Software (v4.4.2; R Core Team 2021) using the Markdown language and RStudio. The R packages used are acknowledged below.
Package | Version | Citation |
---|---|---|
amt | 0.3.0.0 | @amt |
base | 4.4.2 | @base |
broom.mixed | 0.2.9.6 | @broommixed |
circular | 0.5.1 | @circular |
geoR | 1.9.4 | @geoR |
glmmTMB | 1.1.10 | @glmmTMB |
knitr | 1.49 | @knitr2014; @knitr2015; @knitr2024 |
plotrix | 3.8.4 | @plotrix |
raster | 3.6.30 | @raster |
remotes | 2.5.0 | @remotes |
ResourceSelection | 0.3.6 | @ResourceSelection |
Rfast | 2.1.0 | @Rfast |
rmarkdown | 2.29 | @rmarkdown2018; @rmarkdown2020; @rmarkdown2024 |
sp | 2.1.4 | @sp2005; @sp2013 |
survival | 3.7.0 | @survival-book; @survival-package |
tidyverse | 2.0.0 | @tidyverse |