Lecture 4: Error propagation, Chebyshev's inequality, moment-generating functions, the central limit theorem, and multivariate distributions
Institute for Theoretical Physics, Heidelberg University
Today’s lecture
Given an RV \(X\) and its PDF \(f(x)\), we might be interested in the (monotonic) function \(y(x)\). Conservation of probability leads to \[ f(x)\text{d}x = g(y) \text{d}y \] and therefore the new PDF \(g(y)\) yields \[ g(y) = f(x)\left\vert \frac{\text{d}x}{\text{d}y} \right\vert \]
Analogously, the transformation from \(x_1, x_2, \dots\) to \(y_1, y_2, \dots\) relies on the Jacobian of the transformation \[ J_{ij} = \frac{\partial x_i}{\partial y_j} \] such that \[ g(y_1, y_2, \dots) = f(x_1, x_2, \dots)\left\vert J \right\vert \] where \(\vert J \vert\) denotes the determinant of the Jacobian.
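As a numerical sanity check of the one-dimensional rule above (a minimal sketch; the exponential distribution and the map \(y = x^2\) are illustrative choices, not from the lecture), one can histogram transformed samples and compare with \(f(x)\left\vert \text{d}x/\text{d}y \right\vert\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: X ~ Exp(1), so f(x) = exp(-x) for x > 0, and y(x) = x**2 (monotonic)
x = rng.exponential(scale=1.0, size=500_000)
y = x**2

# Predicted PDF: g(y) = f(x(y)) |dx/dy| with x(y) = sqrt(y) and dx/dy = 1 / (2 sqrt(y))
counts, edges = np.histogram(y, bins=100, range=(0.1, 4.0))
centers = 0.5 * (edges[:-1] + edges[1:])
g_emp = counts / (len(y) * np.diff(edges))          # empirical density of Y
g_pred = np.exp(-np.sqrt(centers)) / (2.0 * np.sqrt(centers))

print(np.max(np.abs(g_emp - g_pred)))               # small, up to MC and binning error
```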
Use the transformation of variables to find the error (standard deviation) associated with a function of an RV \(X\) in the limit of small deviations from the mean.
Suppose \(X\) is distributed as \(f(x)\) with mean \(\mu\) and variance \(\sigma^2\). We are interested in the PDF of the function \(y(x)\). Expand \(y\) around \(\mu\) \[ y(x) \approx y(\mu) + \frac{\text{d}y}{\text{d}x}\Big\vert_\mu (x-\mu) \]
\[ \begin{align*} E[Y] =& \int \text{d}x\, y(\mu)f(x) + \int \text{d}x \, \frac{\text{d}y}{\text{d}x}\Big\vert_\mu (x-\mu) f(x) \\ =& \int \text{d}x\, y(\mu)f(x) + \frac{\text{d}y}{\text{d}x}\Big\vert_\mu \underbrace{\int \text{d}x \,(x-\mu) f(x)}_{0} \\ =& y(\mu)\underbrace{\int \text{d}x\, f(x)}_1 \\ =& y(\mu) \end{align*} \]
\[ \begin{align*} E[Y^2] =& \int \text{d}x\, \left[ y^2(\mu) + (y'(\mu))^2(x-\mu)^2 + 2y(\mu)y'(\mu)(x-\mu) \right] f(x) \\ =& y^2(\mu) + y'^2 \sigma_X^2 \end{align*} \] where \(y' = \frac{\text{d}y}{\text{d}x}\Big\vert_\mu\).
\[ \sigma_Y^2 = E[(Y - y(\mu))^2] = E[Y^2] - y^2(\mu) = y'^2\sigma_X^2 \]
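A minimal Monte Carlo sketch of this linearized error propagation (the setup \(X \sim \mathcal N(\mu, \sigma^2)\) with small \(\sigma\) and \(y(x) = \text{e}^x\) is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choice: X ~ N(mu, sigma^2) with small sigma, and y(x) = exp(x)
mu, sigma = 2.0, 0.05
x = rng.normal(mu, sigma, size=1_000_000)
y = np.exp(x)

# Linearized error propagation predicts sigma_Y ~ |y'(mu)| * sigma_X = exp(mu) * sigma
print(y.std(), np.exp(mu) * sigma)   # agree at the per-mille level for small sigma
```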
For independent RVs \(\{X_i\}\), the joint PDF factorizes as \(f(x_1, x_2, \dots) = \prod_i f_i(x_i)\). The function \(y(x_1, x_2, \dots)\) then has variance \[ \sigma_Y^2 = \sum_i {y'_i}^2\sigma_{X_i}^2 \] where \(y'_i = \frac{\partial y}{\partial x_i}\) evaluated at the means.
Consider \(y = x_1 + x_2 + \dots + x_n\) with independent RVs \(\{X_i\}\). Error propagation tells us that \[ \sigma_Y^2 = \sum_i \sigma_{X_i}^2 \] The variance of a sum of independent RVs is the sum of the variances.
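A quick numerical check of this additivity (the specific distributions are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choice: two independent RVs with different distributions
x1 = rng.normal(0.0, 2.0, size=1_000_000)      # Var[X1] = 4
x2 = rng.uniform(-3.0, 3.0, size=1_000_000)    # Var[X2] = 6**2 / 12 = 3

# Variance of the sum vs sum of the variances (both close to 7)
print(np.var(x1 + x2), np.var(x1) + np.var(x2))
```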
Consider the sample mean of a number of data points \(x_i\) from the same distribution with variance \(\sigma^2\): \(\bar x = \frac 1N \sum_i x_i\). We’re looking for the variance of the sample mean: \[ \begin{align*} \text{Var}[\bar{X}] &= \text{Var}[\frac{X_1 + X_2 + \dots + X_N}N] \\ &= \frac 1{N^2} (\text{Var}[X_1]+ \text{Var}[X_2]+ \dots + \text{Var}[X_N]). \end{align*} \] So the variance of the sample mean is \[ \sigma_{\bar x}^2 = \frac 1{N^2} \sum_i \sigma^2 = \frac{\sigma^2}{N} \] The variance of the mean is smaller than the variance of each individual data point by a factor \(1/N\).
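A corresponding Monte Carlo check of the \(1/N\) scaling (the values of \(\sigma\), \(N\), and the number of repetitions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

sigma, N, trials = 1.5, 100, 20_000

# Many independent samples of size N drawn from the same distribution
samples = rng.normal(0.0, sigma, size=(trials, N))
means = samples.mean(axis=1)                    # one sample mean per row

# Variance of the sample mean vs sigma^2 / N (both close to 0.0225)
print(means.var(), sigma**2 / N)
```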
Given an RV \(X\), what can we say about the probability of an event such as \(X \leq x_0\) or \(X \geq x_0\), i.e., about \(P(X \leq x_0)\) or \(P(X \geq x_0)\)?
If we know the density (or mass) function up to that point, it’s easy \[ P(X \leq x_0) = \int_{-\infty}^{\color{red}{x_0}} \text{d}u \, f(u) \]
What if we don’t know \(f(x)\)?
Let’s assume we don’t know \(f(x)\), but we do know the first two moments, \(\mu\) and \(\sigma^2\).
Chebyshev’s inequality
\[ P\left(|X-\mu| \geq k\sigma\right) \leq \frac 1{k^2} \]
Decompose the variance \[ \begin{align} \text{Var}[X] =& \int_{\color{red}{-\infty}}^{\color{red}{\mu-k\sigma}} \text{d}s\, (s-\mu)^2 f(s) \\ &+ \int_{\color{red}{\mu-k\sigma}}^{\color{red}{\mu+k\sigma}} \text{d}s\, (s-\mu)^2 f(s) \\ &+ \int_{\color{red}{\mu+k\sigma}}^{\color{red}{\infty}} \text{d}s\, (s-\mu)^2 f(s) \\ \end{align} \]
The middle contribution is non-negative, and in the two tails \((s-\mu)^2 \geq (k\sigma)^2\). Dropping the middle term and bounding the tail integrands therefore gives \[ \text{Var}[X] = \sigma^2 \geq (k\sigma)^2 \left( \int_{-\infty}^{\mu-k\sigma} \text{d}s\, f(s) + \int_{\mu+k\sigma}^{\infty} \text{d}s\, f(s) \right) \]
Dividing both sides by \((k\sigma)^2\), \[ \frac 1{k^2} \geq P(X \leq \mu - k\sigma) + P(X \geq \mu + k\sigma) = P(\vert X-\mu \vert \geq k\sigma) \]
Note
Chebyshev’s inequality is useful to establish loose bounds on the probability of an event, particularly when the underlying distribution is unknown or difficult to evaluate.
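A minimal numerical illustration (the exponential distribution, for which \(\mu = \sigma = 1\), is an illustrative choice): the empirical tail probability always stays below the Chebyshev bound \(1/k^2\), typically by a wide margin.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative choice: X ~ Exp(1), for which mu = sigma = 1
x = rng.exponential(1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    p_tail = np.mean(np.abs(x - mu) >= k * sigma)   # empirical P(|X - mu| >= k sigma)
    print(k, p_tail, 1.0 / k**2)                    # tail probability vs Chebyshev bound
```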
Statement
Given two RVs \(Y\) and \(Z\) with identical sets of moments: \[ \begin{align*} \mu'_{1Y} =& \mu'_{1Z}\\ \mu'_{2Y} =& \mu'_{2Z}\\ \mu'_{3Y} =& \mu'_{3Z}\\ &\vdots \end{align*} \] the two distributions are identical.
Definition
The moment-generating function \(m(t)\) for an RV \(Y\) is defined as \[ m_Y(t) = E[\text{e}^{tY}] \]
\(m(t)\) is said to exist if it is finite for all \(|t| \leq b\), for some positive constant \(b\).
For a discrete RV: \[ m_Y(t) = E[\text{e}^{tY}] = \sum_y \text{e}^{ty} p(y) \]
For a continuous RV: \[ m_Y(t) = E[\text{e}^{tY}] = \int \text{d}y \, \text{e}^{ty} f(y) \]
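As a sketch of the definition (the standard normal, whose MGF \(\text{e}^{t^2/2}\) is a known closed form, is an illustrative choice), the expectation can be estimated by a simple sample average:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative choice: Y ~ N(0, 1), whose MGF is known to be exp(t**2 / 2)
y = rng.normal(0.0, 1.0, size=1_000_000)

for t in (0.5, 1.0, 1.5):
    m_est = np.mean(np.exp(t * y))          # sample average of e^{tY}
    print(t, m_est, np.exp(0.5 * t**2))     # estimate vs closed form
```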
Why is it called the moment-generating function? Consider the series expansion of \(\text{e}^{tY}\) \[ \text{e}^{ty} = 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \dots \]
Assuming that \(\mu'_k\) is finite for \(k=1,2,3,\dots\), we have \[ E[\text{e}^{tY}] = \sum_y \text{e}^{ty} p(y) = \sum_y \left[ 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \dots \right] p(y) \]
\[ \begin{align} m_Y(t) &= E[\text{e}^{tY}] = \sum_y \left[ 1 + ty + \frac{(ty)^2}{2!} + \frac{(ty)^3}{3!} + \dots \right] p(y) \\ &= \sum_y p(y) + t\sum_y y\, p(y) + \frac{t^2}{2!} \sum_y y^2 p(y) + \frac{t^3}{3!} \sum_y y^3 p(y) + \dots \\ &= 1 + tE[Y] + \frac{t^2}{2!} E[Y^2] + \frac{t^3}{3!} E[Y^3] + \dots \end{align} \]
\[ \boxed{\left.\frac{\text{d}^l m_Y(t)}{\text{d}t^l}\right|_{t=0} = E[Y^l]} \]
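A small symbolic check of this boxed relation (a sketch; the exponential distribution with mean \(\theta\), whose MGF is \(1/(1-\theta t)\), is an illustrative choice not taken from the lecture):

```python
import sympy as sp

t, theta = sp.symbols('t theta', positive=True)

# Illustrative choice: exponential RV with mean theta, whose MGF is 1 / (1 - theta * t)
m = 1 / (1 - theta * t)

# The l-th derivative at t = 0 gives the l-th moment about the origin: l! * theta**l
for l in (1, 2, 3):
    print(l, sp.diff(m, t, l).subs(t, 0))   # theta, 2*theta**2, 6*theta**3
```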
MGF for central moments
\[ m_{X-\mu}(t) = E[\text{e}^{t(X-\mu)}] = \int \text{d}x \, \text{e}^{t(x-\mu)} f(x) \]
MGF for sum of RVs
Given two independent RVs, \(X\) and \(Y\), distributed as \(f(x)\) and \(g(y)\), the MGF of the sum yields \[ \begin{align*} m_{X+Y}(t) =& E[\text{e}^{t(X+Y)}] \\ =& \int \text{d}x\text{d}y \, \text{e}^{t(x+y)} f(x)g(y) \\ =& \int \text{d}x \, \text{e}^{tx} f(x) \int \text{d}y \, \text{e}^{ty} g(y) \\ =& m_X(t)m_Y(t) \end{align*} \]
The MGF of the sum of two independent RVs is the product of the individual MGFs.
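A quick Monte Carlo check of this factorization (the uniform and exponential distributions, and the value of \(t\), are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative choice: X ~ Uniform(0, 1) and Y ~ Exp(1), independent
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = rng.exponential(1.0, size=1_000_000)

t = 0.3
lhs = np.mean(np.exp(t * (x + y)))                      # m_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))   # m_X(t) * m_Y(t)
print(lhs, rhs)                                         # agree up to MC error
```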
As an example, take a Poisson RV \(X\) with mean \(\lambda\) and mass function \(p(x) = \frac{\lambda^x}{x!}\text{e}^{-\lambda}\). Its MGF (written here with argument \(k\)) is \[ \begin{align} m_X(k) = E[\text{e}^{kX}] &= \sum_{x=0}^\infty \left( \frac{\lambda^x}{x!}\text{e}^{-\lambda} \right) \text{e}^{kx} \\ &= \text{e}^{-\lambda} \sum_{x=0}^\infty \frac{(\lambda \text{e}^k)^x}{x!} \\ &= \text{e}^{-\lambda} \text{e}^{\lambda \text{e}^k} \\ &= \text{e}^{\lambda(\text{e}^k-1)} \end{align} \]
The first two moments about the origin follow from the derivatives \[ \begin{align} \frac{\text{d}}{\text{d}k}\text{e}^{\lambda(\text{e}^k-1)} &= \text{e}^{\lambda(\text{e}^k-1)}(\lambda \text{e}^k) \\ \frac{\text{d}^2}{\text{d}k^2}\text{e}^{\lambda(\text{e}^k-1)} &= \text{e}^{\lambda(\text{e}^k-1)} \left[(\lambda \text{e}^k)^2 + \lambda \text{e}^k\right] \end{align} \] evaluated at \(k=0\):
\[ \begin{align} \mu &= \left.\text{e}^{\lambda(\text{e}^k-1)}(\lambda \text{e}^k)\right|_{k=0} = \color{red}\lambda\\ \mu'_2 &= \left.\text{e}^{\lambda(\text{e}^k-1)} [(\lambda \text{e}^k)^2 + \lambda \text{e}^k]\right|_{k=0} = \lambda^2 + \lambda \\ \sigma^2 &= E[X^2]-\mu^2 = \mu'_2 - \mu^2 = \color{red}\lambda \end{align} \]
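A numerical cross-check of these Poisson results (the values of \(\lambda\) and \(k\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

lam = 3.0
x = rng.poisson(lam, size=1_000_000)

# Mean and variance should both be close to lambda
print(x.mean(), x.var())

# Spot-check the MGF against exp(lam * (e^k - 1))
k = 0.4
print(np.mean(np.exp(k * x)), np.exp(lam * (np.exp(k) - 1.0)))
```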
Consider \(n\) iid RVs with mean \(\mu\), variance \(\sigma^2 < \infty\), and an otherwise unknown probability distribution.
Central limit theorem
Define the RV \[ Y := \frac{\sum_{i=1}^n x_i - n\mu}{\sigma \sqrt n} = \frac{\bar x - \mu}{\sigma / \sqrt{n}}. \] \(Y\) tends to a standard normal distribution, \(Y\sim\mathcal{N}(0,1)\), for \(n\to\infty\).
Define the standardized variables \(z_i = \frac{x_i - \mu}\sigma\), s.t. \(\langle z_i \rangle = 0\) and \(\langle z_i^2 \rangle = 1\). \[ Y = \frac 1{\sqrt{n}} \sum_i z_i \]
The moment-generating function of \(Y\) is \[ m_Y(t) = \langle \text{e}^{Yt} \rangle = \langle \text{e}^{z_it / \sqrt{n}} \rangle^n \] where the last equality uses the independence and identical distribution of the \(z_i\).
\[ \begin{align*} m_Y(t) &= \langle \text{e}^{Yt} \rangle = \langle \text{e}^{z_it / \sqrt{n}} \rangle^n \\ &= \langle 1 + \frac{z_it}{\sqrt{n}} + \frac{z_i^2t^2}{2!\,n} + \dots \rangle^n \\ &= \left(1 + \frac{\langle z_i\rangle t}{\sqrt{n}} + \frac{\langle z_i^2\rangle t^2}{2!\,n} + \dots \right)^n \\ &= \left(1 + \frac{t^2}{2n} + \dots \right)^n \xrightarrow{\ n\to\infty\ } \text{e}^{\frac 12 t^2} \end{align*} \] where the neglected higher-order terms are suppressed by additional powers of \(1/\sqrt{n}\) for \(t\) near the origin. We obtain the MGF of a standard normal variable.
Figure: MGF of a Gaussian distribution (thick blue line); normalized sum of independent uniform RVs (thin red lines) for \(n=1,3,5\), from bottom up. (Fig. 2.6 in Amendola (2021))
The CLT guarantees that if the errors in a measurement are the result of many independent errors arising from various parts of the experiment, then they are expected to be Gaussian distributed.
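A minimal numerical illustration of the CLT in the spirit of the figure above (the uniform distribution and the values of \(n\) and the sample size are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Illustrative choice: normalized sum of n iid Uniform(0, 1) RVs
n, trials = 5, 200_000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)       # mean and std of Uniform(0, 1)

x = rng.uniform(0.0, 1.0, size=(trials, n))
y = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# Empirical CDF of Y vs the standard normal CDF at a few points
for q in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(q, np.mean(y <= q), stats.norm.cdf(q))
```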
Joint distribution function
\[ F(x_1, x_2) = \int_{-\infty}^{x_1} \text{d}t_1 \int_{-\infty}^{x_2} \text{d}t_2 \, f(t_1, t_2) \]
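For independent components the joint CDF factorizes into the product of the marginal CDFs; a minimal Monte Carlo sketch (independent standard normals are an illustrative choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Illustrative choice: X1, X2 independent standard normals, f(t1, t2) = phi(t1) * phi(t2)
x1, x2 = 0.5, 1.0
t = rng.normal(size=(1_000_000, 2))

# Monte Carlo estimate of F(x1, x2) = P(X1 <= x1, X2 <= x2)
mc = np.mean((t[:, 0] <= x1) & (t[:, 1] <= x2))

# For independent components the joint CDF is the product of the marginal CDFs
print(mc, stats.norm.cdf(x1) * stats.norm.cdf(x2))
```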
For a single multinomial trial over \(K\) categories with probabilities \(\pi_i\), we have a vector of expectations and a \(K\times K\) matrix of variances and covariances \[ \begin{align} E[X_i] &= \pi_i \\ \text{Var}[X_i] &= \pi_i ( 1 - \pi_i)\\ \text{Cov}[X_i, X_j] &= -\pi_i \pi_j, \ \text{for}\ i \neq j \end{align} \]
For \(N\) trials, the expected values, variances, and covariances are \[ \begin{align} E[X_i] &= N\pi_i \\ \text{Var}[X_i] &= N\pi_i(1-\pi_i) \\ \text{Cov}[X_i, X_j] &= -N\pi_i \pi_j, \ \text{for}\ i \neq j \end{align} \]
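A numerical check of these multinomial moments (a sketch; the values of \(N\) and \(\pi\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)

# Illustrative choice: N = 10 trials over K = 3 categories
N, pi = 10, np.array([0.2, 0.3, 0.5])
x = rng.multinomial(N, pi, size=200_000)        # shape (200000, 3)

# Sample mean vector vs N * pi_i
print(x.mean(axis=0), N * pi)

# Sample covariance matrix vs N * (diag(pi) - pi pi^T):
# diagonal N*pi_i*(1 - pi_i), off-diagonal -N*pi_i*pi_j
print(np.cov(x, rowvar=False))
print(N * (np.diag(pi) - np.outer(pi, pi)))
```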