The Setup

We will estimate the annual survival probability of European Dipper using capture-recapture data via a hierarchical Bayesian version of the cormack-jolly-seber model.

Load the necessary packages.

  library(stringr)
  library(rjags)
  library(coda)
  library(runjags)

Next, read in the data. The first column is the capture-history. The second column indicates males (0) and females (1). The third column indicate females (1) and males (0). The fourth column are just semi-colons…it’s MARK thing.

#Load the Data
  dipper = read.table("DIPPER.INP", skip = 2,sep="", colClasses = "character")
  head(dipper)

##        V1 V2 V3 V4
## 1 1111110  1  0  ;
## 2 1111100  0  1  ;
## 3 1111000  1  0  ;
## 4 1111000  0  1  ;
## 5 1101110  0  1  ;
## 6 1100000  1  0  ;

#The number of individuals marked
  nrow(dipper)

## [1] 294

Next, lets manipulate the inputted data to create capture-histories in columns, which we will use to fit our model.

# split column 1 into columns
  CH = matrix(as.integer(str_split_fixed(dipper[,1],"",7)),nrow=nrow(dipper))
  head(CH)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    1    1    1    1    1    0
## [2,]    1    1    1    1    1    0    0
## [3,]    1    1    1    1    0    0    0
## [4,]    1    1    1    1    0    0    0
## [5,]    1    1    0    1    1    1    0
## [6,]    1    1    0    0    0    0    0

#  Sex variable: female = 1; male = 0
  sex = as.integer(dipper$V3)

We can see that there are 7 columns in the capture-history, 1 for each year. So, this study has 7 years of data total.

Next, we need to setup two specialized functions. ‘get.first’ will identify the first occasion each individual was initially captured. The next function will use the capture history to create logical designations of when we know individuals are alive to be used to initialize the state parameter z.

# Create vector with occasion of marking for each individual (row of CH)
  get.first <- function(x) min(which(x!=0))
  f <- apply(CH, 1, get.first)

  z.init <- matrix(NA, nrow = nrow(CH), ncol = ncol(CH))
  for(i in 1:dim(z.init)[1]){
    z.init[i, f[i]:dim(z.init)[2]] <- 1
    z.init[i,f[i]] <- NA
  }

Lab Assignment

Step 1

Fit a CJS survival model that includes a sex effect on survival probability. Adapt the jags model code in the other file and implementation code that is above.

# MCMC settings
  ni <- 15000 # number of iterations
  nt <- 2     # number of iterations to thin by
  nb <- 5000  # number of iterations to burn (to toss out initially)
  na <- 2000  # number of iterations to use to adapt to sample efficiently
  nc <- 3     # number of chains


# Parameters monitored
  parameters <- c("beta0","beta1", "p","male.phi","female.phi")
  
  inits <- function(){list(beta0 = rnorm(1),
                           beta1 = rnorm(1),
                           p = runif(1, 0, 1), 
                           z = z.init)}
  
# Bundle data
  jags.data <- list(y = CH, f = f, nind = dim(CH)[1], n.occasions = dim(CH)[2],
                    sex=sex)
  
# Setup the Model
  jm <- jags.model(file="cjs2.r", data=jags.data,n.chains=nc,n.adapt=na,inits=inits)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 848
##    Unobserved stochastic nodes: 851
##    Total graph size: 4950
## 
## Initializing model

# Update the model with the burnin
  update(jm, n.iter=nb)
  
#Fit the model
  post2 <- coda.samples(jm, variable.names=parameters, n.iter=ni, thin=nt)
  
#Look at the results
  summary(post2)

## 
## Iterations = 7002:22000
## Thinning interval = 2 
## Number of chains = 3 
## Sample size per chain = 7500 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                Mean      SD  Naive SE Time-series SE
## beta0       0.28591 0.14296 0.0009531      0.0016464
## beta1      -0.07254 0.19516 0.0013010      0.0021768
## female.phi  0.55289 0.03423 0.0002282      0.0002602
## male.phi    0.57064 0.03484 0.0002323      0.0004010
## p           0.89542 0.02891 0.0001928      0.0003324
## 
## 2. Quantiles for each variable:
## 
##                 2.5%     25%      50%     75%  97.5%
## beta0       0.009695  0.1881  0.28395 0.38121 0.5703
## beta1      -0.455818 -0.2034 -0.07401 0.05912 0.3061
## female.phi  0.486004  0.5294  0.55280 0.57619 0.6195
## male.phi    0.502424  0.5469  0.57051 0.59417 0.6388
## p           0.833283  0.8771  0.89748 0.91611 0.9458

The summary output shows us the quantiles of each model paramter that was defined in the variable parameters.

Step 2

Check that parameters have converged. Show evidence of this by plotting and calculating the gelman-rubin convergence diagnostic, i.e, function gelman.diag.

  plot(post2)

  gelman.diag(post2)

## Potential scale reduction factors:
## 
##            Point est. Upper C.I.
## beta0               1          1
## beta1               1          1
## female.phi          1          1
## male.phi            1          1
## p                   1          1
## 
## Multivariate psrf
## 
## 1

We see the diagnostic statistic (gelman-rubin statistics; known as \(\hat{R}\)) are at 1, indicating no sign of lack of convergence. Also, the posterior distribution traceplots look like fuzzy caterpillars and are overlapping, also demonstrating no signs of convergence issues.

Step 3a

Since the parameters have converged, we can combine the chains.

  post.combined = runjags::combine.mcmc(post2)
  head(post.combined)

## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 13 
## Thinning interval = 2 
##         beta0       beta1 female.phi  male.phi         p
## 1  0.57102956 -0.44287437  0.5319950 0.6390007 0.8578579
## 3  0.44030038 -0.12304907  0.5786542 0.6083306 0.8535086
## 5  0.19107304 -0.09355509  0.5243602 0.5476235 0.9525633
## 7  0.16692649  0.06846217  0.5585769 0.5416350 0.9351104
## 9  0.03426400  0.23970981  0.5680682 0.5085652 0.8783200
## 11 0.07074674  0.22307828  0.5729323 0.5176793 0.8846872
## 13 0.15026892 -0.08623061  0.5160041 0.5374967 0.8774565

# This is equivalent to using the do.call function to rbind each matrix of posterior samples across list elements
  post.combined <- do.call("rbind", post2)

Since we setup derived parameters in our JAGS model, we simply need to grab these posterior distributions for plotting.

# Female and male survival probability  
  # Plot posteriors  
  plot(density(post.combined[,3]),lwd=3,col=1,"Annual Survival")
  lines(density(post.combined[,4]),lwd=3,col=2)  
  legend("topright",lwd=3,col=c(1,2),legend=c("Female","Male"))

Step 3b

Alternativel, if we didn’t specify the derived parameters in JAGS, we can use the estimated posterior parameters of survival (\(\beta_{0}\) and \(\beta_{1}\)) to derive the posterior distributions for the probability of survival for males and females. These are exactly equivalent to the posteriors from above.

  beta0 = post.combined[,1]
  beta1 = post.combined[,2]
  
  # Posteriors of survival of males and females  
  male.survival = plogis(beta0)
  female.survival = plogis(beta0+beta1)
  
  # Plot posteriors  
  plot(density(female.survival),lwd=3,col=1,"Annual Survival")
  lines(density(male.survival),lwd=3,col=2)  
  legend("topright",lwd=3,col=c(1,2),legend=c("Female","Male"))

Here we see the posterior distributions of male and female annual survival. They are similar, indicating only a possible small differences in survival between the sexes.

Step 4

Use Monte Carlo integration on the sex effect parameter to estimate the probability that the male survival is less than female survival.

# The probability of a sex effect- that female survival is less than male survival. We can determine this by how many samples are less than zero.

# But, to evaluate weather male survival is less than female survival, we want to know if beta1 is positive, so...
length(which(beta1>0))/length(beta1)

## [1] 0.3549333

The probability that \(\beta_1 > 0\) indicates there is no evidence of a difference. A high probability, values near 1, would indicate a statistical clear negative effect. However, if we saw probabilities near 0, this would indicate a clear statistical difference between female and male survival probability.