Bayes the way

All things Bayesian

  • Bayesian Inference

  • Bayes Theorem

  • Bayesian Components

    • likelihood, prior, evidence, posterior
  • Conjugacy

  • Hippo Case Study

  • Bayesian Computation






Bayesian Inference

Probability, Data, and Parameters

What do we want our model to tell us?

Do we want to make probability statements about our data?


Likelihood = P(data|parameters)


90% CI: in repeated sampling, 90% of the intervals constructed this way would contain the true parameter value.

Probability, Data, and Parameters

What do we want our model to tell us?

Do we want to make probability statements about our parameters?


Posterior = P(parameters|data)


Alternative Interval: there is a 90% probability that the true value lies within the interval, given the evidence from the observed data.

Likelihood Inference

Estimates of hedgehog population size at two sites.

Bayesian Inference

Posterior Samples

 [1] 102.67671  81.11546  81.75260  87.77246  73.99043  80.70631  76.26219
 [8]  83.99927  64.74208  26.93133


Bayesian Inference

# Posterior probability that the population at site 2 exceeds site 1
diff=post2-post1
length(which(diff>0))/length(diff)
[1] 0.8362

Likelihood Inference (coefficient)

y is Body size of a beetle species

x is elevation

summary(glm(y~x))

Call:
glm(formula = y ~ x)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.9862     0.2435   4.049 0.000684 ***
x             0.5089     0.4022   1.265 0.221093    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.245548)

    Null deviance: 25.659  on 20  degrees of freedom
Residual deviance: 23.665  on 19  degrees of freedom
AIC: 68.105

Number of Fisher Scoring iterations: 2


Bayesian Inference (coefficient)

# Posterior mean
  mean(post)
[1] 0.5185186


# Credible/probability intervals
  quantile(post,prob=c(0.025,0.975))
     2.5%     97.5% 
-1.461887  2.403232 


# Probability of a positive effect
 length(which(post>0))/length(post)
[1] 0.709


# Probability of a negative effect
 length(which(post<0))/length(post)
[1] 0.291






Bayes Theorem

Bayes Theorem


  • Marginal Probability

    • \(P(A)\)
    • \(P(B)\)
Answer:
#P(A)
3/10 = 0.3

#P(B)
5/10 = 0.5
Sampled N = 10 locations

Bayes Theorem

  • Joint Probability

    • \(P(A \cap B)\)
Answer:
#P(A and B)
2/10 = 0.2
Sampled N = 10 locations

Bayes Theorem

  • Conditional Probability

    • \(P(A|B)\)
    • \(P(B|A)\)
Answer:
#P(A|B)
2/5 = 0.4

#P(B|A)
2/3 = 0.6666
Sampled N = 10 locations

Bayes Theorem

  • OR Probability

    • \(P(A\cup B)\)
Answer:
# P(A or B)
# P(A) + P(B) - P(A and B)

0.3 + 0.5 - 0.2 = 0.6
Sampled N = 10 locations

Notice that…

\[ \begin{equation} P(A \cap B) = 0.2 \\ P(A|B)P(B) = 0.4 \times 0.5 = 0.2 \\ P(B|A)P(A) = 0.6666 \times 0.3 = 0.2 \\ \end{equation} \]

\[ \begin{equation} P(A|B)P(B) = P(A \cap B) \\ P(B|A)P(A) = P(A \cap B) \\ \end{equation} \]

\[ \begin{equation} P(B|A)P(A) = P(A|B)P(B) \end{equation} \]

Bayes Theorem

\[ \begin{equation} P(B|A) = \frac{P(A|B)P(B)}{P(A)} \\ \end{equation} \]

\[ \begin{equation} P(B|A) = \frac{P(A \cap B)}{P(A)} \\ \end{equation} \]
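As a quick numeric check in R (using the probabilities from the 10 sampled locations above), both forms of the theorem give back the conditional probability we computed directly:

# Probabilities from the 10 sampled locations
p.A  = 0.3   # P(A)
p.B  = 0.5   # P(B)
p.AB = 0.2   # P(A and B)

# From the joint: P(B|A) = P(A and B) / P(A)
p.AB / p.A                  # 0.667

# Bayes theorem: P(B|A) = P(A|B) P(B) / P(A)
p.A.given.B = p.AB / p.B    # P(A|B) = 0.4
p.A.given.B * p.B / p.A     # 0.667, the same answer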






Bayesian Components

Bayes Components

param = parameters

\[ \begin{equation} P(\text{param}|\text{data}) = \frac{P(\text{data}|\text{param})P(\text{param})}{P(\text{data})} \\ \end{equation} \]

Posterior Probability/Belief

Likelihood

Prior Probability

Evidence or Marginal Likelihood

Bayes Components

param = parameters

\[ \begin{equation} P(\text{param}|\text{data}) = \frac{P(\text{data}|\text{param})P(\text{param})}{\int P(\text{data}|\text{param})P(\text{param})\, d\text{param}} \end{equation} \]

Posterior Probability/Belief

Likelihood

Prior Probability

Evidence or Marginal Likelihood

Bayes Components

\[ \begin{equation} \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \\ \end{equation} \]

\[ \begin{equation} \text{Posterior} \propto \text{Likelihood} \times \text{Prior} \end{equation} \]

\[ \begin{equation} \text{Posterior} \propto \text{Likelihood} \end{equation} \]
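A minimal grid sketch (assuming a made-up example of 3 successes in 10 binomial trials and a Beta(1, 1) prior): the evidence is just the number that rescales likelihood × prior into a proper probability density, which is why dropping it leaves the posterior known only up to proportionality.

# Grid of candidate parameter values
p.grid = seq(0.001, 0.999, length.out = 1000)

# Likelihood x prior (the unnormalized posterior) for 3 successes in 10 trials
unnorm = dbinom(3, size = 10, prob = p.grid) * dbeta(p.grid, 1, 1)

# Evidence: likelihood x prior integrated (numerically) over the parameter
evidence = sum(unnorm * diff(p.grid)[1])

# Normalized posterior density; integrates to 1, so it is a proper density
posterior = unnorm / evidence
sum(posterior * diff(p.grid)[1])   # equals 1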

The Prior

All parameters in a Bayesian model require a specified prior; parameters are treated as random variables.

\(y_{i} \sim \text{Binom}(N, \theta)\)

\(\theta \sim \text{Beta}(\alpha = 4, \beta=2)\)

curve(dbeta(x, 4,2),xlim=c(0,1),lwd=5)

The Prior

\(\theta \sim \text{Beta}(\alpha = 1, \beta=1)\)

curve(dbeta(x, 1,1),xlim=c(0,1),lwd=5)

The Prior

  • The prior describes what we know about the parameter before we collect any data

  • Priors can contain a lot of information (informative priors) or very little (diffuse priors)

  • Well-constructed priors can also improve the behavior of our models (computational advantage)

  • No such thing as a ‘non-informative’ prior

The Prior

Use diffuse priors as a starting point

It’s fine to use diffuse priors as you develop your model, but you should always prefer to use “appropriate, well-constructed informative priors” (Hobbs & Hooten, 2015)

The Prior

Use your “domain knowledge”

We can often come up with weakly informative priors just by knowing something about the range of plausible values of our parameters.

The Prior

Dive into the literature

Find published estimates and use moment matching or other methods to convert them into prior distributions
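For example, a sketch of moment matching for a Beta prior, assuming a hypothetical published survival probability reported as mean 0.8 with standard deviation 0.1 (made-up numbers for illustration):

# Hypothetical published estimate of a probability: mean 0.8, SD 0.1
m = 0.8
v = 0.1^2

# Moment matching: solve the Beta mean/variance equations for alpha and beta
k = m * (1 - m) / v - 1
alpha = m * k          # 12
beta  = (1 - m) * k    # 3

# Check that the matched prior looks reasonable
curve(dbeta(x, alpha, beta), xlim = c(0, 1), lwd = 5)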

The Prior

Visualize your prior distribution

Be sure to look at the prior in terms of the parameters you want to make inferences about (use simulation!)
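One way to do this (a sketch, assuming a hypothetical Normal(0, 1.5) prior on an intercept that lives on the logit scale): simulate draws from the prior and transform them to the scale you actually make inferences about.

# Hypothetical prior on a logit-scale intercept
prior.logit = rnorm(10000, mean = 0, sd = 1.5)

# What does that prior imply on the probability scale?
prior.p = plogis(prior.logit)

hist(prior.p, breaks = 50, xlab = "p", main = "Implied prior on the probability scale")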

The Prior

Do a sensitivity analysis

Does changing the prior change your posterior inference? If not, don’t sweat it. If it does, you’ll need to return to your domain knowledge and the literature to justify your prior choice






Conjugacy

Bayesian Model

Model

\[ \textbf{y} \sim \text{Bernoulli}(p) \]

Prior

\[ p \sim \text{Beta}( \alpha, \beta) \\ \]

These are called Prior hyperparameters

\[ \alpha = 1 \\ \beta = 1 \]

curve(dbeta(x,1,1),xlim=c(0,1),lwd=3,col=2,xlab="p",
      ylab = "Prior Probability Density")

Conjugate Distribution

Likelihood (Joint Probability of y)

\[ \mathscr{L}(p|y) = \prod_{i=1}^{N} P(y_{i}|p) = \prod_{i=1}^{N}\left(p^{y_{i}}(1-p)^{1-y_{i}}\right) \]

Prior Distribution

\[ P(p) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} \]

Posterior Distribution of p

\[ P(p|y) = \frac{\prod_{i=1}^{N}\left(p^{y_{i}}(1-p)^{1-y_{i}}\right) \times \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} }{\int_{p}(\text{numerator})\, dp} \]

CONJUGACY!

\[ P(p|y) \sim \text{Beta}(\alpha^*,\beta^*) \]

\(\alpha^*\) and \(\beta^*\) are called Posterior hyperparameters

\[ \alpha^* = \alpha + \sum_{i=1}^{N}y_i \\ \beta^* = \beta + N - \sum_{i=1}^{N}y_i \\ \]

Wikipedia Conjugate Page

Conjugate Derivation
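A small numeric sketch of the update rule (assuming made-up data of 4 successes in 10 Bernoulli trials and a Beta(2, 2) prior): the posterior hyperparameters follow directly, and the posterior mean lands between the prior mean and the sample proportion.

# Assumed data: 4 successes in 10 Bernoulli trials, Beta(2, 2) prior
y = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)
N = length(y)
alpha = 2
beta  = 2

# Conjugate update of the hyperparameters
alpha.star = alpha + sum(y)        # 6
beta.star  = beta + N - sum(y)     # 8

# Prior mean, MLE, and posterior mean
alpha / (alpha + beta)                   # 0.50
mean(y)                                  # 0.40
alpha.star / (alpha.star + beta.star)    # ~0.43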






Case Study

Hippos

We do a small study on hippo survival and get these data…

7 Hippos Died
2 Hippos Lived

Hippos: Likelihood Model

\[ \begin{align*} \textbf{y} \sim& \text{Binomial}(N,p)\\ \end{align*} \]

# Survival outcomes of nine adult hippos
  y1=c(0,0,0,0,0,0,0,1,1)
  N1=length(y1)
  mle.p=mean(y1)
  mle.p
[1] 0.2222222

Hippos: Bayesian Model (Prior 1)

\[ \begin{align*} \textbf{y} \sim& \text{Binomial}(N,p)\\ p \sim& \text{Beta}(\alpha,\beta) \end{align*} \]

  alpha.prior1=1
  beta.prior1=1

Hippos: Bayesian Model (Prior 2)

  alpha.prior2=10
  beta.prior2=2

Hippos: Bayesian Model (Prior 3)

  alpha.prior3=150
  beta.prior3=15

Hippos: Bayesian Model (Posteriors)

\[ P(p|y) \sim \text{Beta}(\alpha^*,\beta^*)\\ \alpha^* = \alpha + \sum_{i=1}^{N}y_i \\ \beta^* = \beta + N - \sum_{i=1}^{N}y_i \\ \]

# Note- the data are the same, but the prior is changing.
# Draw samples directly from each conjugate Beta posterior
  post.1=rbeta(10000,alpha.prior1+sum(y1),beta.prior1+N1-sum(y1))
  post.2=rbeta(10000,alpha.prior2+sum(y1),beta.prior2+N1-sum(y1))
  post.3=rbeta(10000,alpha.prior3+sum(y1),beta.prior3+N1-sum(y1))

Hippos: Bayesian Model (Posteriors)


Hippos: More data! (Prior 1)

y2=c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,
     1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
length(y2)
[1] 40

Hippos: More data! (Prior 2)

Hippos: More data! (Prior 3)

Hippos: Data/Prior Comparison
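Since the comparison figure isn’t reproduced here, here is a quick numeric sketch of the same idea, using the y1, y2, and prior hyperparameters defined above: under each prior, compare the posterior mean of p for the small versus the large dataset. The stronger the prior and the smaller the dataset, the more the prior pulls the posterior away from the data.

# Posterior mean of p = (alpha + sum(y)) / (alpha + beta + N) for each prior/data combination
post.mean = function(y, alpha, beta) (alpha + sum(y)) / (alpha + beta + length(y))

post.mean(y1, alpha.prior1, beta.prior1)   # ~0.27  (diffuse prior, small data)
post.mean(y2, alpha.prior1, beta.prior1)   # ~0.79  (diffuse prior, more data)

post.mean(y1, alpha.prior3, beta.prior3)   # ~0.87  (strong prior, small data)
post.mean(y2, alpha.prior3, beta.prior3)   # ~0.89  (strong prior, more data)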






Bayesian Computation

Markov Chain Monte Carlo

We often don’t have a conjugate likelihood and prior, so we use MCMC algorithms to sample from the posterior.

Classes of algorithms

  • Metropolis-Hastings
  • Gibbs
  • Reversible-Jump
  • No-U-Turn Sampler (NUTS)

\[ P(p|y) = \frac{\prod_{i=1}^{N}\left(p^{y_{i}}(1-p)^{1-y_{i}}\right) \times \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} }{\int_{p}(\text{numerator})\, dp} \]

\[ P(p|y) \propto \prod_{i=1}^{N}\left(p^{y_{i}}(1-p)^{1-y_{i}}\right) \times \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)} \]
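As a sketch (a hand-rolled sampler, not any particular package’s implementation), a bare-bones Metropolis algorithm for the Beta-Bernoulli posterior above needs only the unnormalized posterior, because the intractable evidence cancels in the acceptance ratio. Burn-in and thinning (described on the next slide) are applied at the end.

set.seed(1)

# Same Beta-Bernoulli model as above, with the hippo survival data
y = c(0, 0, 0, 0, 0, 0, 0, 1, 1)
alpha = 1
beta  = 1

# Unnormalized log posterior: log likelihood + log prior (evidence not needed)
log.post = function(p) sum(dbinom(y, 1, p, log = TRUE)) + dbeta(p, alpha, beta, log = TRUE)

n.iter = 20000
p.chain = numeric(n.iter)
p.chain[1] = 0.5

for (i in 2:n.iter) {
  p.star = rnorm(1, p.chain[i - 1], 0.1)     # symmetric proposal
  if (p.star <= 0 | p.star >= 1) {
    p.chain[i] = p.chain[i - 1]              # reject proposals outside (0, 1)
  } else {
    # Accept with probability min(1, posterior ratio); the evidence cancels
    log.r = log.post(p.star) - log.post(p.chain[i - 1])
    p.chain[i] = ifelse(log(runif(1)) < log.r, p.star, p.chain[i - 1])
  }
}

# Remove burn-in, thin, and compare to the known conjugate answer
post = p.chain[seq(2001, n.iter, by = 5)]
mean(post)     # should be close to (1 + 2) / (1 + 1 + 9) = 0.27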

MCMC Sampling

  • number of samples (iterations)
  • thinning (which iterations to keep): every one (1), every other one (2), every third one (3)
  • burn-in (how many of the first samples to remove)
  • chains (independent sets of samples; needed for convergence tests)

Software Options

  • Write your own algorithm
  • WinBUGS/OpenBUGS, JAGS, Stan, Nimble