Lecture 3: Discrete & continuous distributions
Institute for Theoretical Physics, Heidelberg University
e.g., fair die.
For a binary RV \(X \in \{ 0, 1\}\) (e.g., toss a coin or open/close configuration of an ion channel), the Bernoulli distribution assigns the probability \[ P(x) = \pi^x (1-\pi)^{1-x} \]
such that we observe \(P(X=1) = \pi\) and \(P(X=0) = 1-\pi\).
\(\pi\) is the parameter of the distribution.
We have \[ E[X] = \pi \cdot 1 + (1-\pi) \cdot 0 = \pi = P(X=1) \] \[ \begin{align*} \text{Var}[X] &= E\left[(X-\mu)^2\right] \\ & = E[X^2] - E[X]^2 \\ & = \pi\cdot 1^2 + (1-\pi) \cdot 0^2 - \pi^2 \\ &= \pi(1-\pi) \end{align*} \]
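A minimal numerical check of these moments, assuming scipy is available (the value \(\pi = 0.3\) is only an illustrative choice):

```python
# Bernoulli RV: check E[X] = pi and Var[X] = pi * (1 - pi), here for pi = 0.3.
from scipy import stats

pi = 0.3
X = stats.bernoulli(pi)
print(X.mean(), X.var())                  # 0.3 and 0.21 = pi * (1 - pi)

# the same from samples
samples = X.rvs(size=100_000, random_state=0)
print(samples.mean(), samples.var())      # close to 0.3 and 0.21
```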
Probability of a single \(N\)-tuple with \(k\) 1’s and \((N-k)\) 0’s: \[ \pi^k (1-\pi)^{N-k} \]
Typically we are interested in the number \(X\) of hits within the \(N\) observations; the order of the individual outcomes does not matter (indistinguishability).
Sum up all the combinations of hits within the observations. The number of \(N\)-tuples with exactly \(X=k\) hits is \({N \choose k}\), which yields the binomial distribution \[ P(X=k) = {N \choose k} \pi^k (1-\pi)^{N-k} \]
Notation
For a RV \(X\) drawn from a binomial distribution with given parameters \(\pi\) and \(N\) we also write \[ X \sim B(\pi, N) \] where the tilde stands for “distributed according to.”
\[ E[X] = N\pi \]
\[ \text{Var}[X] = N\pi(1-\pi) \]
Proof
Linearity of expectations and iid property of the \(N\) observations. See also T3.7 in Wackerly, Mendenhall, and Scheaffer (2014).
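Explicitly, writing \(X = \sum_{i=1}^N X_i\) with \(X_i\) iid Bernoulli variables, \[ E[X] = \sum_{i=1}^N E[X_i] = N\pi, \qquad \text{Var}[X] = \sum_{i=1}^N \text{Var}[X_i] = N\pi(1-\pi), \] where the variance of the sum splits into the sum of the variances because the \(X_i\) are independent.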
\[ F(K) = \sum_{k=0}^K {N \choose k} \pi^k (1-\pi)^{N-k} \]
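A short sketch comparing these binomial formulas with scipy.stats.binom (the parameter values are illustrative):

```python
# Binomial distribution B(pi, N): pmf, CDF, and moments E[X] = N*pi, Var[X] = N*pi*(1-pi).
from scipy import stats

N, pi = 20, 0.3
X = stats.binom(N, pi)
print(X.mean(), N * pi)                # both 6.0
print(X.var(), N * pi * (1 - pi))      # both 4.2
print(X.pmf(5))                        # P(X = 5)
print(X.cdf(5))                        # F(5) = sum of P(X = k) for k <= 5
```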
Similar to the binomial, it is defined on strings of binary outcomes, but only on sequences that are all 0’s except for the last entry, which is a 1, i.e., \(A_1 = (1), A_2=(0\ 1), A_3 = (0\ 0\ 1), \dots\)
Intuition
The experiment is repeated until the first hit. The geometric distribution can thus represent waiting times, measured in numbers of fixed-length intervals, until an event arrives.
Follows from a sequence of \(k\) iid Bernoulli experiments \[ P(A_k) = P(X=k) = (1-\pi)^{k-1}\pi \]
\[ F(k) = 1-(1-\pi)^k \]
Proof
\[ F(k) := P(X\leq k) = \sum_{i=1}^k (1-\pi)^{i-1}\pi = \frac{1-(1-\pi)^k}{1-(1-\pi)} \pi = 1-(1-\pi)^k \]
\[ E[X] = \frac 1\pi \]
\[ \text{Var}[X] = \frac{1-\pi}{\pi^2} \]
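A quick numerical check; scipy.stats.geom uses the same convention as above (first hit on trial \(k = 1, 2, \dots\)), and \(\pi = 0.25\) is an arbitrary example value:

```python
# Geometric distribution: check E[X] = 1/pi, Var[X] = (1-pi)/pi^2 and F(k) = 1 - (1-pi)^k.
from scipy import stats

pi = 0.25
X = stats.geom(pi)
print(X.mean(), 1 / pi)               # both 4.0
print(X.var(), (1 - pi) / pi**2)      # both 12.0
k = 6
print(X.cdf(k), 1 - (1 - pi)**k)      # both ~0.822
```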
Select \(n\) elements without replacement from a population of \(N\), of which \(r\) are red; every sample point has probability \(1/{N \choose n}\). What is the probability of drawing \(y\) red balls?
\[ P(y) = \frac{{r \choose y}{N-r \choose n-y}}{N \choose n} \]
\[ E[Y] = \frac{nr}{N} \]
\[ \text{Var}[Y] = n\frac{r}{N}\frac{N-r}{N}\frac{N-n}{N-1} \]
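The same kind of check for the hypergeometric distribution; note that scipy.stats.hypergeom takes M = population size, n = number of red balls, N = sample size, and the numbers below are illustrative:

```python
# Hypergeometric: draw n_draw = 10 balls from a population of 50, of which r = 20 are red.
from scipy import stats

N_pop, r, n_draw = 50, 20, 10
Y = stats.hypergeom(M=N_pop, n=r, N=n_draw)
print(Y.mean(), n_draw * r / N_pop)   # both 4.0
print(Y.var(),
      n_draw * (r / N_pop) * ((N_pop - r) / N_pop) * ((N_pop - n_draw) / (N_pop - 1)))  # both ~1.96
print(Y.pmf(3))                       # probability of drawing exactly 3 red balls
```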
\[ P(X=k) = \lim_{N\to\infty} {N \choose k} \pi^k (1-\pi)^{N-k} \]
Define \(\lambda := N\pi\)
\[ \begin{align} & \lim_{N\to\infty} {N \choose k} \pi^k (1-\pi)^{N-k} \\ &= \lim_{N\to\infty} \frac{N(N-1)\dots(N-k+1)}{k!}\left(\frac{\lambda}{N} \right)^k\left(1-\frac{\lambda}{N} \right)^{N-k} \\ &= \frac{\lambda^k}{k!} \lim_{N\to\infty} \left(1-\frac{\lambda}{N} \right)^N \left(1-\frac{\lambda}{N} \right)^{-k} \left( 1 - \frac{1}{N} \right) \dots \left( 1 - \frac{k-1}{N} \right) \\ &= \frac{\lambda^k}{k!} \text{e}^{-\lambda} \end{align} \]
\[ E[X] = \text{Var}[X] = \lambda \]
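A numerical illustration of the limit: for \(\pi = \lambda/N\) and growing \(N\), the binomial pmf approaches the Poisson pmf (\(\lambda = 3\) chosen arbitrarily):

```python
# Poisson limit of the binomial: B(pi = lambda/N, N) -> Poisson(lambda) for N -> infinity.
from scipy import stats

lam, k = 3.0, 2
for N in (10, 100, 10_000):
    print(N, stats.binom(N, lam / N).pmf(k), stats.poisson(lam).pmf(k))
# the binomial column converges to exp(-lam) * lam**k / k! = 0.224...
```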
For a continuous probability distribution, the RV \(X\) is continuous, i.e., it has uncountably many possible outcomes.
Example application: amount of rainfall in some area.
Since single points carry zero probability, a continuous distribution is defined via its cumulative distribution function (CDF) \[ F(x) := p(X\leq x) = \int_{-\infty}^x \text{d}u\,f(u) \]
Assign probability to an interval \([x_0, x_1]\) \[ p(x_0 < X \leq x_1) = F(x_1) - F(x_0) = \int_{x_0}^{x_1} \text{d}u\, f(u) \]
Definition
\[f(x) = \frac{\text{d}F(x)}{\text{d}x}\]
Quantile of the distribution: the \(p\)-quantile \(x_p\) satisfies \(F(x_p) = p\), i.e., \(x_p = F^{-1}(p)\).
\[ F(x|y) := P(X \leq x | Y = y) = \int_{-\infty}^x \text{d}u\, f_x(u|y) = \int_{-\infty}^x \text{d}u\, \frac{f(u, y)}{f_y(y)} \]
\[ F(x) = \int_{-\infty}^\infty \text{d}u\, F(x|u)f_y(u) \]
Independent random variables
For independent RVs \(X\) and \(Y\) we get \[ F(x,y) = F_x(x) F_y(y) \]
\[ f(x,y) = f_x(x)f_y(y) \]
Expectation and variance are defined analogously to the discrete case, replacing sums by integrals.
The properties derived for the discrete case (e.g., linearity of the expectation) also hold in the continuous case.
\[ f(x) := \left\{ \begin{align} \frac{1}{\theta_2-\theta_1}, & \quad x\in [\theta_1, \theta_2], \theta_1 < \theta_2\\ 0, & \quad \text{else}\\ \end{align} \right. \]
\[ E[X] = \frac{\theta_1 + \theta_2}{2} \]
\[ \text{Var}[X] = \frac{(\theta_2-\theta_1)^2}{12} \]
The uniform distribution is an important reference for generating random numbers and statistical test scenarios.
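For instance, a quick check of the moment formulas with numpy's uniform random number generator (the interval \([2, 5]\) is an arbitrary choice):

```python
# Uniform distribution on [theta1, theta2]: compare sample mean/variance with the formulas.
import numpy as np

theta1, theta2 = 2.0, 5.0
rng = np.random.default_rng(0)
x = rng.uniform(theta1, theta2, size=100_000)
print(x.mean(), (theta1 + theta2) / 2)        # ~3.5
print(x.var(), (theta2 - theta1)**2 / 12)     # ~0.75
```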
For parameters \(-\infty < \mu < \infty\) and \(\sigma > 0\), the normal distribution reads \[ f(x) = \frac{1}{\sqrt{2\pi}\sigma} \text{e}^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
\[ E[X] = \mu \]
\[ \text{Var}[X] = \sigma^2 \]
(Proof sketch: substitute the standardized variable \(z = \frac{x-\mu}{\sigma}\); then \(E[X] = \int_{-\infty}^\infty \text{d}z\, (\mu + \sigma z) f(z) = \mu\) because the term linear in \(z\) vanishes by symmetry, and \(\text{Var}[X] = \sigma^2 \int_{-\infty}^\infty \text{d}z\, z^2 f(z) = \sigma^2\), with \(f(z)\) the standard normal density.)
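The moments can also be checked by direct numerical integration of the density, for example with scipy.integrate.quad (\(\mu\) and \(\sigma\) are arbitrary example values):

```python
# Normal distribution: verify E[X] = mu and Var[X] = sigma^2 by numerical integration.
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 2.0
f = lambda x: np.exp(-0.5 * ((x - mu) / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)

mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * f(x), -np.inf, np.inf)
print(mean, var)   # 1.5 and 4.0, up to numerical error
```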
Notation
Normal distribution often denoted \(\mathcal{N}(\mu, \sigma^2)\)
The normal distribution is always symmetric around its mean, which is the same as its mode (unique maximum). As a result, the mean also equals the median.
The Beta distribution is defined on the bounded interval \(x \in [0,1]\) (e.g., for the success probability of Bernoulli-type experiments) \[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1} \] where we define the Gamma function \[ \Gamma(\alpha) := \int_0^\infty \text{d}x\, x^{\alpha-1}\text{e}^{-x} \]
Notice the similarity with the binomial distribution. The Beta is the conjugate prior for the binomial in a Bayesian approach: when used as a prior distribution and multiplied with a binomial likelihood, it yields a Beta distribution as posterior!
\[ E[X] = \frac{\alpha}{\alpha+\beta} \]
\[ \text{Var}[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \]
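A quick check of these moments with scipy (\(\alpha = 2\), \(\beta = 5\) as an example):

```python
# Beta distribution: compare closed-form mean and variance with scipy.stats.beta.
from scipy import stats

a, b = 2.0, 5.0
X = stats.beta(a, b)
print(X.mean(), a / (a + b))                          # both ~0.2857
print(X.var(), a * b / ((a + b)**2 * (a + b + 1)))    # both ~0.0255
```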
Important distribution. Produces skewed distributions. \[ f(x) = \left\{ \begin{align} \frac{x^{\alpha-1}\text{e}^{-\frac x\beta}}{\beta^\alpha \Gamma(\alpha)}, & \quad 0 \leq x < \infty\\ 0, &\quad \text{else} \\ \end{align} \right. \] where \(\alpha\) affects the shape of the Gamma distribution (called shape parameter) and \(\beta\) affects the scale (called scale parameter).
\[ E[X] = \alpha\beta \]
\[ \text{Var}[X] = \alpha\beta^2 \]
Proof
\[ \begin{align} E[X] &= \int_0^\infty \text{d}x\, x\frac{x^{\alpha-1}\text{e}^{-\frac x\beta}}{\beta^\alpha \Gamma(\alpha)} = \frac 1{\beta^\alpha \Gamma(\alpha)} \int_0^\infty \text{d}x\, x^\alpha \text{e}^{-\frac x\beta} \\ &= \frac 1{\beta^\alpha \Gamma(\alpha)} \beta^{\alpha+1} \Gamma(\alpha+1) = \frac{\beta \alpha\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha\beta \end{align} \] where we use the identity \(\Gamma(\alpha+1) = \alpha\Gamma(\alpha)\).
The Gamma distribution reduces to other distributions for special choices of \(\alpha\) and \(\beta\): for integer \(\alpha\) it is the distribution of the waiting time for the \(\alpha\)-th event of a Poisson process, for \(\alpha = k/2\) and \(\beta = 2\) it becomes the \(\chi^2\) distribution with \(k\) degrees of freedom, and for \(\alpha = 1\) the exponential distribution.
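These reductions can be checked directly; scipy's gamma uses a = \(\alpha\) and scale = \(\beta\) in the parameterization above, and the evaluation points below are arbitrary:

```python
# Special cases of the Gamma distribution:
#   alpha = 1               -> exponential distribution with scale beta
#   alpha = k/2, beta = 2   -> chi-squared distribution with k degrees of freedom
from scipy import stats

x, beta, k = 1.7, 2.5, 4
print(stats.gamma(a=1, scale=beta).pdf(x), stats.expon(scale=beta).pdf(x))   # equal
print(stats.gamma(a=k / 2, scale=2).pdf(x), stats.chi2(k).pdf(x))            # equal
```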
Start from the Gamma distribution and set \(\alpha=1\): \[ f(x) = \left\{ \begin{align} \frac 1\beta \text{e}^{-\frac x\beta}, & \quad 0 \leq x < \infty\\ 0, &\quad \text{else} \\ \end{align} \right. \] where from above we infer \(E[X]=\sqrt{\text{Var}[X]} = \beta\).
Memorylessness property: \[ P(X > t_0 + t_1 | X > t_0) = P(X > t_1) \]
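A numerical illustration of this memorylessness property via the survival function \(P(X > t)\) (the values of \(\beta\), \(t_0\), \(t_1\) are arbitrary):

```python
# Exponential distribution is memoryless:
# P(X > t0 + t1 | X > t0) = sf(t0 + t1) / sf(t0) equals P(X > t1) = sf(t1).
from scipy import stats

beta, t0, t1 = 2.0, 1.3, 0.7
X = stats.expon(scale=beta)
print(X.sf(t0 + t1) / X.sf(t0))   # exp(-t1 / beta) ~ 0.7047
print(X.sf(t1))                   # same value
```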
Definition
Many of these distributions belong to the broader class of exponential family distributions, which share certain properties. We can write such a distribution as \[ p(x|\eta) = \frac{h(x)}{Z(\eta)} \exp[\eta^\intercal T(x)] = h(x) \exp[\eta^\intercal T(x) - A(\eta)] \] where \(\eta\) is the natural parameter, \(T(x)\) the sufficient statistic, \(h(x)\) the base measure, and \(A(\eta) = \log Z(\eta)\) the log-partition function.
Rewrite the Bernoulli distribution
\[ \begin{align} P(x|\mu) &= \mu^x (1-\mu)^{1-x} \\ &= \exp[x \log(\mu) + (1-x) \log(1-\mu)] \\ &= \exp\left[x \log\left(\frac\mu{1-\mu}\right) + \log(1-\mu)\right] \\ &= \exp[T(x)\eta - A(\eta)] \end{align} \] such that \(T(x)=x\), \(\eta = \log\left(\frac{\mu}{1-\mu}\right)\), \(h(x)=1\), and \(A(\eta) = -\log(1-\mu) = \log(1+\text{e}^\eta)\).
Important property
Exponential family: derivatives of the log-partition function generate all the cumulants of the sufficient statistics. For the first and second cumulants we obtain \[ \begin{align*} \nabla A(\eta) &= E[T(x)] \\ \nabla^2 A(\eta) &= \text{Cov}[T(x)] \end{align*} \] From the second equation we conclude that the Hessian is positive definite, and hence \(A(\eta)\) is convex in \(\eta\); consequently, the log-likelihood \(\eta^\intercal T(x) - A(\eta) + \log h(x)\) is concave in \(\eta\).
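For the Bernoulli example above this can be verified directly: with \(A(\eta) = \log(1+\text{e}^\eta)\), the first two derivatives of the log-partition function reproduce the Bernoulli mean and variance. A sketch using finite differences (the step size is chosen only for illustration):

```python
# Bernoulli in exponential-family form: A(eta) = log(1 + exp(eta)).
# Check A'(eta) = E[x] = mu and A''(eta) = Var[x] = mu * (1 - mu) by finite differences.
import numpy as np

mu = 0.3
eta = np.log(mu / (1 - mu))           # natural parameter
A = lambda e: np.log1p(np.exp(e))     # log-partition function
h = 1e-4
dA = (A(eta + h) - A(eta - h)) / (2 * h)
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2
print(dA, mu)                # ~0.3
print(d2A, mu * (1 - mu))    # ~0.21
```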
Consider the subset of (prior, likelihood) pairs for which we can compute the posterior in closed form (conjugate priors).
If the likelihood family \(\mathcal{F}\) belongs to the exponential family, such conjugate priors exist and the computations can be performed in closed form.
Examples: Beta-binomial and Gaussian-Gaussian
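A minimal sketch of the Beta-binomial case: with a \(\text{Beta}(\alpha, \beta)\) prior on \(\pi\) and \(k\) hits in \(N\) Bernoulli trials, the posterior is \(\text{Beta}(\alpha + k, \beta + N - k)\); the numbers below are illustrative:

```python
# Beta-binomial conjugacy: Beta(alpha, beta) prior, binomial likelihood with k hits in N trials
# -> posterior Beta(alpha + k, beta + N - k), available in closed form.
from scipy import stats

alpha, beta = 2.0, 2.0            # prior pseudo-counts
N, k = 20, 14                     # data: 14 hits in 20 trials
posterior = stats.beta(alpha + k, beta + N - k)
print(posterior.mean())           # (alpha + k) / (alpha + beta + N) = 16/24 ~ 0.667
print(posterior.interval(0.95))   # central 95% credible interval for pi
```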