Contents

Bayesian Data Analysis: Basics

The three steps of Bayesian data analysis

  1. Setting up a full probability model—a joint probability distribution for all observable and unobservable quantities in a problem.

  2. Conditioning on observed data: calculating and interpreting the appropriate posterior distribution—the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.

  3. Evaluating the fit of the model and the implications of the resulting posterior distribution.

Bayes’ rule

$$ p (\theta, y) = p(\theta)p(y | \theta)$$

  • $p (\theta, y)$: joint probability distribution for $\theta$ and $y$ .
  • $p (\theta)$: prior distribution
  • $p (\theta | y)$: sampling distritbuion (data distitbution).

Bayesian Inference

Simply conditioning on the known value of the data $y$, using the basic property of conditional probability known as Bayes’s rule, yields the posterior density:

$$ p(\theta \mid y)=\frac{p(\theta, y)}{p(y)}=\frac{p(\theta) p(y \mid \theta)}{p(y)} $$

where $p (y) = \sum_{\theta} p(\theta) p(y \mid \theta)$ , and the sum is over all possible values of $\theta$. or $p(y) = \int p(\theta) p(y \mid \theta) d \theta$.

$p(y)$ is fixed, thus can be considered a constant, yielding the unnormalized posterior density :

$$ p(\theta \mid y) \propto p(\theta) p(y \mid \theta) $$

Prediction

To make inferences about an unknown observable, often valled predictive inferences.

The distribution of the unknown but observablle $y$ is

$$ p(y) = \int p(y, \theta) d \theta = \int p(\theta) p (y \mid \theta) d \theta $$

This is often called the marginal distibution of $y$, but a more informative name is the prior predictive distritbuition.

prior because it is not conditional on a previous observvation of the process, and predictive because it is the distribution for a quantity that is observable.

$\tilde{y}$ : posterior predictive distribution. posterior because it is conditional on the observed $y$ and predictive because it is a prediction for an observable $\tilde{y}$.

$$ \begin{aligned} p(\tilde{y} \mid y) &=\int p(\tilde{y}, \theta \mid y) d \theta \\ &=\int p(\tilde{y} \mid \theta, y) p(\theta \mid y) d \theta \\ &=\int p(\tilde{y} \mid \theta) p(\theta \mid y) d \theta \end{aligned} $$

Likelihood and odds ratios

Using Bayes’ rule with a chosen probability model means that the data $y$ affect the posterior inference only through $p(y| \theta$, which, when regarded as a function of θ, for fixed $y$, is called the likelihood function. In this way Bayesian inference is obeying what is sometimes called the likelihood principle.

the ratio of the posterior density $p(\theta \mid y)$ evaluated at the points $\theta_1$ and $\theta_2$ under a given model is called the posterior odds for $\theta_1$ compared to $\theta_2$

$$ \frac{p\left(\theta_{1} \mid y\right)}{p\left(\theta_{2} \mid y\right)}=\frac{p\left(\theta_{1}\right) p\left(y \mid \theta_{1}\right) / p(y)}{p\left(\theta_{2}\right) p\left(y \mid \theta_{2}\right) / p(y)}=\frac{p\left(\theta_{1}\right)}{p\left(\theta_{2}\right)} \frac{p\left(y \mid \theta_{1}\right)}{p\left(y \mid \theta_{2}\right)} $$

the posterior odds are equal to the prior odds multiplied byu the likelihood ratio.

Probability

the mathmatical definition of probabilty: probablities are numerical quantities, defined on a set of “outcomes”, that are non-negative, additive over mutually exclusive outcomes, and sum to 1 over all possible mutally exclusive outcomes.

In Bayesian statistics, probability is used as the fundamental measure or yardstick of uncertainty.

Means and variances of conditional distributions

mean and variance:

$$ \mathrm{E}(u)=\int u p(u) d u, \quad \operatorname{var}(u)=\int(u-\mathrm{E}(u))^{2} p(u) d u $$

variance matrix (covariance matrix)

$$ \operatorname{var}(u)=\int(u-\mathrm{E}(u))(u-\mathrm{E}(u))^{T} p(u) d u $$

The mean of $u$ can be obtained by averaging the conditional mean over the marginal distributionof $v$.

$$ \mathrm{E}(u)= \mathrm{E}(\mathrm{E}( u \mid v)) $$

and variance

$$ \operatorname{var}(u)=\mathrm{E}(\operatorname{var}(u \mid v))+\operatorname{var}(\mathrm{E}(u \mid v)) $$

Summarizing inference by simulation

  1. Simulation forms a central part of much applied Bayesian analysis, because of the relative ease with which samples can often be generated from a probability distribution, even when the density function cannot be explicitly integrated.

  2. Another advantage of simulation is that extremely large or small simulated values often flag a problem with model specification or parameterization that might not be noticed if estimates and probability statements were obtained in analytic form

  3. Generating values from a probability distribution is often straightforward with modern computing techniques based on (pseudo)random number sequences.

Sampling using the inverse cumulative distribution function

A method for sampling from discrete and continuous distributuions using the cumalative distribution function or cdf, $F$, of a one-dimensional distribution, $p(v)$, is defined by

$$ \begin{aligned} F\left(v_{*}\right) &=\operatorname{Pr}\left(v \leq v_{*}\right) \\ &= \begin{cases}\sum_{v \leq v_{*}} p(v) & \text { if } p \text { is discrete } \\ \int_{-\infty}^{v_{*}} p(v) d v & \text { if } p \text { is continuous. }\end{cases} \end{aligned} $$

The inverse cdf can be used to obtain random samples from the distribution p, as follows:

  1. Draw a random value, $U$, from the uniform distribution on [0, 1], using a table of random numbers. Let $v = F^{-1} (U)$.
  2. The value $v$ will be a random draw from $p$,

Simulation of posterior and posterior predictive quantities

In practice, we are most often interested in simulating draws from the posterior distribution of the model parameters $\theta$, and perhaps from the posterior predictive distribution of unknown observables $\tilde{y}$.

Results from a set of $S$ simulation draws can be stored in the computer in an array, e.g.

$$ \begin{array}{ccccccc} \begin{array}{c} \text { Simulation } \\ \text { draw } \end{array} & \multicolumn{2}{c}{\text { Parameters }} & \multicolumn{3}{c}{\text { Predictive }} \\ & \theta_{1} & \ldots & \theta_{k} & \tilde{y}_{1} & \ldots & \tilde{y}_{n} \\ \hline 1 & \theta_{1}^{1} & \ldots & \theta_{k}^{1} & \tilde{y}_{1}^{1} & \ldots & \tilde{y}_{n}^{1} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \mathrm{S} & \theta_{1}^{S} & \ldots & \theta_{k}^{S} & \tilde{y}_{1}^{S} & \ldots & \tilde{y}_{n}^{S} \end{array} $$

From these simulated values, we can estimate the posterior distribution of any quantity of interest, such as $\theta_1 / \theta_2$. We can estimate the posterior probability of any event, such as $Pr (\tilde{y}_1 + \tilde{y}_2) > e^{\theta_1}$