Lecture 2: Random variables
Institute for Theoretical Physics, Heidelberg University
Let \(Y\) denote a random variable (RV). The distribution function of \(Y\), denoted by \(F(y)\), is such that \[ F(y) = P(Y \leq y), \quad -\infty < y < \infty. \]
The RV \(Y\) can be either discrete or continuous, and \(F(y)\) takes a different form in each case.
Properties of a distribution function
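If \(F(y)\) is a distribution function, then \(\lim_{y\to-\infty} F(y) = 0\), \(\lim_{y\to\infty} F(y) = 1\), and \(F(y)\) is a nondecreasing function of \(y\), i.e. \(F(y_1) \leq F(y_2)\) whenever \(y_1 < y_2\).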
Discrete case
Distribution functions for discrete RVs are always step functions, because the distribution function increases only at the finite or countably infinite set of points that have positive probability.
Continuous case
Distribution functions for continuous RVs are continuous functions themselves.
Let \(F(y)\) be the distribution function for a continuous RV \(Y\). Then \(f(y)\), given by \[ f(y) = \frac{\text{d}F(y)}{\text{d}y} = F'(y), \] wherever the derivative exists, is called the probability density function.
If \(f(y)\) is a density function for a continuous RV, then \(f(y) \geq 0\) for all \(y\) and \(\int_{-\infty}^\infty \text{d}y\, f(y) = 1\).
Suppose that \[ F(y) = \left\{ \begin{align*} 0&,\quad \text{for } y < 0,\\ y&,\quad \text{for } 0 \leq y \leq 1,\\ 1&,\quad \text{for } y > 1. \end{align*} \right. \] Find the probability density function for \(Y\).
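A worked solution, obtained by differentiating \(F(y)\) piecewise: \[ f(y) = F'(y) = \left\{ \begin{align*} 0&,\quad \text{for } y < 0,\\ 1&,\quad \text{for } 0 < y < 1,\\ 0&,\quad \text{for } y > 1. \end{align*} \right. \] The derivative does not exist at \(y=0\) and \(y=1\), where \(F\) has kinks. The density can be assigned any value at these two points without changing any probability. This is the uniform density on \([0, 1]\).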
If RV \(Y\) has a density function \(f(y)\) and \(a < b\), then the probability that \(Y\) falls in the interval \([a, b]\) is \[ P(a \leq Y \leq b) = \int_a^b \text{d}y\, f(y) \]
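As a quick illustration (the interval endpoints are an arbitrary choice), take the distribution function \(F(y) = y\) on \(0 \leq y \leq 1\) from the example above, whose density is \(f(y) = 1\) on that interval. Then \[ P(0.2 \leq Y \leq 0.5) = \int_{0.2}^{0.5} \text{d}y\, 1 = 0.3 = F(0.5) - F(0.2). \] Because \(Y\) is continuous, single points carry zero probability, so including or excluding the endpoints does not change the result.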
For any RVs \(Y_1\) and \(Y_2\), the joint (bivariate) distribution function \(F(y_1, y_2)\) is \[ \begin{align*} F(y_1, y_2) = P(Y_1 \leq y_1, Y_2 \leq y_2), \\ -\infty < y_1 < \infty, \quad -\infty < y_2 < \infty. \end{align*} \]
For continuous RVs \(Y_1\) and \(Y_2\) with joint distribution function \(F(y_1, y_2)\), there exists a nonnegative function \(f(y_1, y_2)\), the joint probability density function, such that \[ F(y_1, y_2) = \int_{-\infty}^{y_1} \text{d}t_1 \int_{-\infty}^{y_2}\text{d}t_2 \, f(t_1, t_2). \]
Let \(X\) be a random variable with discrete probability distribution \(P(X=x_i) = P(x_i)\)
\[ E[X] := \sum_{i=1}^\infty P(X=x_i)x_i \]
Let \(X\) be a random variable with continuous probability density \(f(x)\)
\[ E[X] := \int_{-\infty}^\infty {\rm d}x\, f(x)x \]
\(y\) | \(p(y)\) |
0 | 1/4 |
1 | 1/2 |
2 | 1/4 |
To see that \(E[Y]\) is the population mean, draw \(n = 10^6\) samples from \(p(y)\). We expect \(Y=0\) roughly \(2.5 \times 10^5\) times, \(Y=1\) roughly \(5 \times 10^5\) times, and \(Y=2\) roughly \(2.5 \times 10^5\) times.
\[ \begin{align*} \mu \approx \frac 1n \sum_{i=1}^n y_i &= \frac{(2.5 \times 10^5)(0) + (5 \times 10^5)(1) + (2.5 \times 10^5)(2)}{10^6} \\ &= (0)(1/4) + (1)(1/2) + (2)(1/4) \\ &= \sum_{y=0}^2 yp(y) = 1. \end{align*} \]
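The sampling argument can be tried numerically. The following is a minimal sketch using only the Python standard library; the seed and the variable names are arbitrary choices, not part of the lecture.

```python
import random
from collections import Counter

# Distribution from the table: p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

rng = random.Random(0)  # fixed seed (arbitrary) so the run is reproducible
n = 10**6
samples = rng.choices(values, weights=probs, k=n)

counts = Counter(samples)        # how often each value was drawn
sample_mean = sum(samples) / n   # (1/n) * sum_i y_i
exact_mean = sum(v * p for v, p in zip(values, probs))  # sum_y y p(y)

print("counts:", dict(counts))
print("sample mean:", sample_mean, "exact mean:", exact_mean)
```

The counts should come out close to the expected frequencies, and the sample mean close to the exact value \(\sum_y y p(y) = 1\).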
Recall discrete probability distribution \(P(X=x_i) = P(x_i)\) with mean \(\mu\)
\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \sum_{i=1}^\infty P(X=x_i)(x_i-\mu)^2 \]
Recall continuous probability density \(f(x)\) with mean \(\mu\)
\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \int_{-\infty}^\infty {\rm d}x\, f(x)(x-\mu)^2 \]
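As a worked illustration, take the discrete distribution tabulated above, \(p(0) = 1/4\), \(p(1) = 1/2\), \(p(2) = 1/4\), with mean \(\mu = 1\): \[ \text{Var}[Y] = \sum_{y=0}^2 (y-\mu)^2 p(y) = (0-1)^2(1/4) + (1-1)^2(1/2) + (2-1)^2(1/4) = \frac 12. \]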
If \(Y\) is a continuous RV, then the \(k^\text{th}\) moment about the origin is given by \[ \mu_k' = E[Y^k], \quad k=1,2,\dots \]
The \(k^\text{th}\) moment about the mean, or \(k^\text{th}\) central moment, is given by \[ \mu_k = E[(Y-\mu)^k], \quad k=1,2, \dots \]
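As a worked illustration, assume \(Y\) is uniform on \([0,1]\), i.e. \(f(y) = 1\) for \(0 \leq y \leq 1\) and zero elsewhere. Then \[ \mu_k' = \int_0^1 \text{d}y\, y^k = \frac{1}{k+1}, \] so \(\mu = \mu_1' = 1/2\). The first central moment always vanishes, \(\mu_1 = E[Y - \mu] = 0\), and the second central moment is the variance, \[ \mu_2 = \int_0^1 \text{d}y\, \left(y - \frac 12\right)^2 = \frac 1{12}. \]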
Let \(\{ X_i \}, i=1,\dots,N\) be a set of random variables, independent or not, and \(\{c_i\}\) a set of constant coefficients. Then \[ E\left[ \sum_{i=1}^N c_i X_i \right] = \sum_{i=1}^N c_i E[X_i]. \]
This holds for both discrete and continuous RVs; in the continuous case, the sums over outcomes in the derivation are replaced by integrals.
Consider 2 random variables \(X\) and \(Y\) \[ \begin{align} E[X+Y] &= \sum_x \sum_y (x+y)P(x, y) \\ &= \sum_x\sum_y \left(xP(x,y) + yP(x,y) \right) \\ &= \sum_x xP(x) \underbrace{\sum_yP(y|x)}_{1} + \sum_y yP(y) \underbrace{\sum_x P(x|y)}_1 \\ &=E[X] + E[Y] \end{align} \]
For two independent random variables \(X_1\) and \(X_2\) and any functions \(g_1\) and \(g_2\), \[ E\left[ g_1(X_1) g_2(X_2) \right] = E[g_1(X_1)] E[g_2(X_2)]. \]
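The factorization can be checked exactly on a small discrete example. In the sketch below, the two marginal distributions and the functions `g1` and `g2` are hypothetical choices made only for illustration; independence is imposed by building the joint probability as the product of the marginals.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distributions, chosen only for illustration.
p1 = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
p2 = {-1: Fraction(1, 3), 1: Fraction(2, 3)}

def g1(x):
    return x**2

def g2(x):
    return 3 * x + 1

def expect(pmf, g):
    """E[g(X)] for a discrete RV with probability mass function pmf."""
    return sum(p * g(x) for x, p in pmf.items())

# Independence: the joint probability is p1(a) * p2(b).
lhs = sum(p1[a] * p2[b] * g1(a) * g2(b) for a, b in product(p1, p2))
rhs = expect(p1, g1) * expect(p2, g2)

print(lhs, rhs)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that both sides can be compared without rounding error.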
A petroleum retailer sells a random amount \(Y\) each day. Suppose \(Y\), measured in thousands of gallons, has the probability density function
\[ f(y) = \left\{ \begin{align*} \frac 38 y^2, \quad 0 \leq y \leq 2, \\ 0, \quad \text{elsewhere} \end{align*} \right. \]
The retailer’s profit turns out to be $100 for each 1,000 gallons sold if \(Y \leq 1\) and $40 extra per 1,000 gallons if \(Y>1\). Find the retailer’s expected profit for any given day.
Let \(g(Y)\) denote the retailer’s daily profit. Then \[ g(Y) = \left\{ \begin{align*} 100Y, \quad 0 \leq Y \leq 1, \\ 140Y, \quad 1 < Y \leq 2. \end{align*} \right. \]
The expected profit is the expectation of this function of the random variable: \[ \begin{align*} E[g(Y)] &= \int_{-\infty}^\infty \text{d}y\, g(y) f(y) \\ &= \int_0^1 \text{d}y\, 100y \left[\frac 38 y^2\right] + \int_1^2 \text{d}y\, 140y \left[\frac 38 y^2\right] \\ &= 9.375 + 196.875 = 206.25. \end{align*} \]
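The integral can be cross-checked numerically. This is a minimal sketch using a midpoint rule; the helper `midpoint_integral` and the number of subintervals are illustrative choices, not part of the lecture.

```python
def f(y):
    """Density of the daily sales Y (in thousands of gallons)."""
    return 0.375 * y**2 if 0.0 <= y <= 2.0 else 0.0

def g(y):
    """Daily profit in dollars as a function of the amount sold."""
    return 100.0 * y if y <= 1.0 else 140.0 * y

def midpoint_integral(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def integrand(y):
    return g(y) * f(y)

# Split at y = 1, where g changes its slope.
normalization = midpoint_integral(f, 0.0, 2.0)
expected_profit = (midpoint_integral(integrand, 0.0, 1.0)
                   + midpoint_integral(integrand, 1.0, 2.0))

print(normalization, expected_profit)
```

The first number checks that the density integrates to one; the second should agree with the analytic result of 206.25 up to the discretization error.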
\[ E[X | Y] := \sum_x p(x|y)x \]
\[ E[X | Y] := \int_{-\infty}^\infty {\rm d}x\, f(x|y)x \]
Relation to unconditional expectation
\[ E[X] = E_Y\left[E_X[X|Y] \right] \]
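A short derivation for the discrete case, using \(p(x|y)p(y) = p(x,y)\) and the marginal \(p(x) = \sum_y p(x,y)\): \[ \begin{align*} E_Y\left[E_X[X|Y]\right] &= \sum_y p(y) \sum_x x\, p(x|y) \\ &= \sum_x x \sum_y p(x, y) \\ &= \sum_x x\, p(x) = E[X]. \end{align*} \]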
\[ E[(X-\mu)^2] = E[X^2] - \mu^2 \]
\[ \begin{align} E[(X-\mu)^2] &= E[(X^2 - 2\mu X + \mu^2)] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - \mu^2 \\ &= E[X^2] - E[X]^2 \end{align} \]
Constant shifts do not affect the variance, while a scale factor \(a\) enters quadratically: \[ \text{Var}[aX+b] = a^2 \text{Var}[X] \]
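This follows in two steps from the linearity of the expectation. With \(\mu = E[X]\), we have \(E[aX+b] = a\mu + b\), so \[ \begin{align*} \text{Var}[aX+b] &= E\left[(aX + b - a\mu - b)^2\right] \\ &= E\left[a^2(X-\mu)^2\right] \\ &= a^2\, \text{Var}[X]. \end{align*} \]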
\[ \text{Var}[X] = E_Y\left[ \text{Var}[X|Y] \right] + \text{Var}\left[ E_X[X|Y] \right] \]
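The decomposition can be verified exactly on a small example. The joint probability table below is hypothetical and chosen only for illustration; the helper names are likewise assumptions of this sketch.

```python
from fractions import Fraction as Fr

# Hypothetical joint pmf p(x, y), keyed by (x, y). Illustration only.
joint = {
    (0, 0): Fr(1, 8), (1, 0): Fr(3, 8),
    (0, 1): Fr(1, 4), (2, 1): Fr(1, 4),
}

# Marginal distribution of Y.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def cond_mean(y):
    """E[X | Y = y]."""
    return sum(p * x for (x, yy), p in joint.items() if yy == y) / p_y[y]

def cond_var(y):
    """Var[X | Y = y]."""
    m = cond_mean(y)
    return sum(p * (x - m) ** 2
               for (x, yy), p in joint.items() if yy == y) / p_y[y]

# Unconditional mean and variance of X.
mean_x = sum(p * x for (x, _), p in joint.items())
var_x = sum(p * (x - mean_x) ** 2 for (x, _), p in joint.items())

# The two terms on the right-hand side of the decomposition.
expected_cond_var = sum(p_y[y] * cond_var(y) for y in p_y)
var_of_cond_mean = sum(p_y[y] * (cond_mean(y) - mean_x) ** 2 for y in p_y)

print(var_x, expected_cond_var + var_of_cond_mean)
```

The variance of the conditional mean is taken around \(E[X]\), which is justified because averaging the conditional mean over \(Y\) returns \(E[X]\).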
\[ \begin{align} \text{Cov}[X, Y] &:= E[(X-\mu_x)(Y-\mu_y)] \\ &= \sum_x \sum_y(x-\mu_x)(y-\mu_y)p(x, y) \end{align} \]
\[ \text{Cov}[X, Y] := \int_{-\infty}^\infty \int_{-\infty}^\infty {\rm d}x {\rm d}y \, (x-\mu_x)(y-\mu_y)f(x, y) \]
Are \(Y_1\) and \(Y_2\) independent?
Check whether \(p(y_1, y_2) = p_1(y_1)p_2(y_2)\) for all pairs \((y_1, y_2)\). A single counterexample suffices; for instance, focus on \((0, 0)\).
Marginalize \(p\) into \(p_1\) and \(p_2\), e.g., \[ p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \]
\[ p_1(0) = p_2(0) = \frac 6{16} \]
But \(p(0, 0) = 0\). So \(Y_1\) and \(Y_2\) are dependent.
What is the covariance of \(Y_1\) and \(Y_2\)?
First, note that \(E[Y_1] = E[Y_2] = 0\).
\[ \begin{align*} E[Y_1 Y_2] =& \sum_{\text{all }y_1} \sum_{\text{all }y_2} y_1 y_2 p(y_1, y_2) \\ =& (-1)(-1)(1/16) \\ &+ (-1)(0)(3/16) + \dots \\ =& 0 \end{align*} \] If the covariance of two RVs is zero, the variables need not be independent.
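The joint probability table for this example is not reproduced in the text above. The sketch below therefore assumes a symmetric table on \(\{-1, 0, 1\}^2\) that is consistent with the quoted values (\(p(-1,-1) = 1/16\), \(p(-1,0) = 3/16\), \(p(0,0) = 0\), \(p_1(0) = p_2(0) = 6/16\)); the remaining entries should be treated as an assumption.

```python
from fractions import Fraction as Fr

# ASSUMPTION: the joint table is not given in the text. This symmetric table
# is one choice consistent with the values quoted in the example.
vals = (-1, 0, 1)
joint = {
    (-1, -1): Fr(1, 16), (-1, 0): Fr(3, 16), (-1, 1): Fr(1, 16),
    (0, -1): Fr(3, 16),  (0, 0): Fr(0),      (0, 1): Fr(3, 16),
    (1, -1): Fr(1, 16),  (1, 0): Fr(3, 16),  (1, 1): Fr(1, 16),
}

# Marginals: sum the joint probability over the other variable.
p1 = {a: sum(joint[a, b] for b in vals) for a in vals}
p2 = {b: sum(joint[a, b] for a in vals) for b in vals}

mean1 = sum(a * p1[a] for a in vals)
mean2 = sum(b * p2[b] for b in vals)
e12 = sum(a * b * joint[a, b] for a in vals for b in vals)
cov = e12 - mean1 * mean2  # Cov[Y1, Y2] = E[Y1 Y2] - E[Y1] E[Y2]

# Independence requires p(y1, y2) = p1(y1) p2(y2) for every pair.
independent = all(joint[a, b] == p1[a] * p2[b] for a in vals for b in vals)

print(p1[0], p2[0], cov, independent)
```

Under this assumed table, the covariance vanishes although the factorization test fails, which is the point of the example.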
For multivariate distributions, we frequently encounter the covariance matrix
Covariance matrices are real and symmetric, thus only have real eigenvalues, have non-negative values on the diagonal, and are positive-semidefinite.
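The matrix itself is not written out above; the following uses the usual definition, with the symbol \(\Sigma\) as an assumed notation. For RVs \(X_1, \dots, X_N\) with means \(\mu_i\), \[ \Sigma_{ij} = \text{Cov}[X_i, X_j] = E[(X_i-\mu_i)(X_j-\mu_j)], \] so that \(\Sigma_{ij} = \Sigma_{ji}\) and \(\Sigma_{ii} = \text{Var}[X_i] \geq 0\). Positive-semidefiniteness follows because, for any real vector \(c\), \[ \begin{align*} c^T \Sigma c &= \sum_{i=1}^N \sum_{j=1}^N c_i c_j\, E[(X_i-\mu_i)(X_j-\mu_j)] \\ &= E\left[ \left( \sum_{i=1}^N c_i (X_i - \mu_i) \right)^2 \right] \geq 0. \end{align*} \]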
If \(Y\) has probability density function \(f(y)\) and if \(U\) is some function of \(Y\), then we can find the cumulative distribution function \[ F_U(u) = P(U \leq u) \] by integrating \(f(y)\) over the region for which \(U \leq u\).
The probability density function is then found by differentiation: \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u}. \]
Consider the RV \(Y\) with density function \(f_Y(y)\), and let \(h\) be a function that is either strictly increasing or strictly decreasing, so that it is invertible. We can compute the density function of \(U=h(Y)\) by a change of variable:
\[ h^{-1}(u) = y \]
Using the cumulative distribution functions, for increasing \(h\) (so that applying \(h^{-1}\) preserves the inequality): \[ \begin{align*} P(U \leq u) &= P[h(Y) \leq u] \\ &= P\{h^{-1}[h(Y)] \leq h^{-1}(u)\} \\ &= P[Y \leq h^{-1}(u)] \end{align*} \]
which is equivalent to \[ F_U(u) = F_Y[h^{-1}(u)]. \]
Differentiate wrt \(u\): \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u} = \frac{\text{d}F_Y[h^{-1}(u)]}{\text{d}u} = f_Y(h^{-1}(u))\frac{\text{d}[h^{-1}(u)]}{\text{d}u} \]
If \(h(y)\) is a decreasing function of \(y\), then the result is the same with an extra minus sign. Therefore, in general, \[ f_U(u) = f_Y(h^{-1}(u))\left\vert\frac{\text{d}[h^{-1}(u)]}{\text{d}u}\right\vert. \]
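As an illustration with a decreasing function, assume \(Y\) is uniform on \((0, 1)\) and take \(U = h(Y) = -\ln Y\). Then \(h^{-1}(u) = e^{-u}\), and the formula gives \(f_U(u) = 1 \cdot \vert -e^{-u} \vert = e^{-u}\) for \(u > 0\), i.e. \(F_U(u) = 1 - e^{-u}\). The sketch below compares this prediction with a simulation; the seed, sample size, and helper names are arbitrary choices.

```python
import math
import random

rng = random.Random(1)  # fixed seed (arbitrary) for reproducibility
n = 200_000

# 1 - rng.random() lies in (0, 1], which avoids evaluating log(0).
u_samples = [-math.log(1.0 - rng.random()) for _ in range(n)]

def empirical_cdf(samples, u):
    """Fraction of samples that are <= u."""
    return sum(s <= u for s in samples) / len(samples)

def predicted_cdf(u):
    """F_U(u) = 1 - exp(-u), from the change-of-variable formula."""
    return 1.0 - math.exp(-u)

for u in (0.5, 1.0, 2.0):
    print(u, empirical_cdf(u_samples, u), predicted_cdf(u))
```

The empirical and predicted values should agree up to statistical fluctuations of order \(1/\sqrt{n}\).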
Collect a finite random sample \(\{ x_i \}, i=1, \dots, n\).
\[ \lim_{n\to\infty} \overline x = E[X]\]
\[ \lim_{n\to\infty} s_x^2 = \text{Var}[X]\]
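The convergence can be observed numerically. The sketch below assumes the usual definitions \(\overline x = \frac 1n \sum_i x_i\) and \(s_x^2 = \frac{1}{n-1}\sum_i (x_i - \overline x)^2\), which are not written out above, and draws from a uniform distribution on \([0, 1)\), for which \(E[X] = 1/2\) and \(\text{Var}[X] = 1/12\). The seed and sample sizes are arbitrary.

```python
import random

rng = random.Random(2)  # fixed seed (arbitrary) for reproducibility

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the 1/(n-1) normalization (an assumed convention)."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

results = {}
for n in (10, 1_000, 100_000):
    xs = [rng.random() for _ in range(n)]
    results[n] = (sample_mean(xs), sample_variance(xs))
    print(n, results[n])

print("exact:", 0.5, 1 / 12)
```

As \(n\) grows, both estimates should settle near the exact values, with fluctuations shrinking roughly like \(1/\sqrt{n}\).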