Computational Statistics & Data Analysis (MVComp2)

Lecture 2: Random variables

Tristan Bereau

Institute for Theoretical Physics, Heidelberg University

Introduction

Literature

Today’s lecture
Based on: Ch. 2-6 in Wackerly, Mendenhall, and Scheaffer (2014)

Recap from last time

  • Many scientific problems are probabilistic
  • Concept/philosophy: frequentist vs. subjective (Bayesian)
  • Mathematical framework: Probability space triplet \((\Omega, \mathcal{F}, \mathbb{P})\): sample space, event space, probability measure
  • Operations on sets: Union, intersection, complement
  • Probability laws: conditional probability, additive law, law of total probability, Bayes’ rule

Random variable, distribution function, and density

Concept of a random variable

Random variable
function \(X: \Omega \to \mathbb{R}, \mathbb{Z}, \mathbb{N}, \dots\) defined on the sample space, i.e., it maps events or outcomes in the sample space onto numbers. Can be discrete or continuous.

Distribution function

Let \(Y\) denote a random variable (RV). The distribution function of \(Y\), denoted by \(F(y)\), is such that \[ F(y) = P(Y \leq y), \quad -\infty < y < \infty. \]

The RV \(Y\) can be either discrete or continuous.

Properties of a distribution function

If \(F(y)\) is a distribution function, then

  1. \(F(-\infty) = \lim_{y \to -\infty} F(y) = 0\)
  2. \(F(\infty) = \lim_{y \to \infty} F(y) = 1\)
  3. \(F(y)\) is a nondecreasing function of \(y\). (For any \(y_1 < y_2\), \(F(y_1) \leq F(y_2)\).)

Distribution function

Discrete case

Distribution functions for discrete RVs are always step functions because the cumulative distribution function increases only at the finite or countable number of points with positive probabilities.

Continuous case

Distribution functions for continuous RVs are continuous functions themselves.

Probability density

Let \(F(y)\) be the distribution function for a continuous RV \(Y\). Then \(f(y)\), given by \[ f(y) = \frac{\text{d}F(y)}{\text{d}y} = F'(y), \] wherever the derivative exists, is called the probability density function.

Properties

If \(f(y)\) is a density function for a continuous RV, then

  1. \(f(y) \geq 0\) for all \(y\), \(-\infty < y < \infty\).
  2. \(\int_{-\infty}^\infty \text{d}y\, f(y) = 1\).

The density function need not be continuous

Suppose that \[ F(y) = \left\{ \begin{align*} 0&,\quad \text{for } y < 0,\\ y&,\quad \text{for } 0 \leq y \leq 1,\\ 1&,\quad \text{for } y > 1. \end{align*} \right. \] Find the probability density function for \(Y\).
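A sketch of the solution: differentiating \(F\) piecewise wherever the derivative exists gives \[ f(y) = F'(y) = \left\{ \begin{align*} 1&,\quad \text{for } 0 < y < 1,\\ 0&,\quad \text{elsewhere,} \end{align*} \right. \] i.e., the uniform density on \((0, 1)\). \(f\) jumps at \(y=0\) and \(y=1\), illustrating that the density need not be continuous even though \(F\) is.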

Multivariate probability distributions

For any RVs \(Y_1\) and \(Y_2\), the joint (bivariate) distribution function \(F(y_1, y_2)\) is \[ F(y_1, y_2) = P(Y_1 \leq y_1, Y_2 \leq y_2), \quad -\infty < y_1 < \infty, \; -\infty < y_2 < \infty. \]

For continuous RVs \(Y_1\) and \(Y_2\) with joint distribution function \(F(y_1, y_2)\), there exists a nonnegative function \(f(y_1, y_2)\) s.t. \[ F(y_1, y_2) = \int_{-\infty}^{y_1} \text{d}t_1 \int_{-\infty}^{y_2}\text{d}t_2 \, f(t_1, t_2) \]

Moments: expectations and variance

Expected value

Let \(X\) be a random variable with discrete probability distribution \(P(X=x_i) = P(x_i)\)

\[ E[X] := \sum_{i=1}^\infty P(X=x_i)x_i \]

Let \(X\) be a random variable with continuous probability density \(f(x)\)

\[ E[X] := \int_{-\infty}^\infty {\rm d}x\, f(x)x \]

Expected value

  • \(E[X]\) is defined provided the sum/integral exists, i.e., converges; this is not the case for all probability distributions!
  • The expected value is often called the population mean, \(\mu := E[X]\)

Example: expected value

\(y\)    \(p(y)\)
0      1/4
1      1/2
2      1/4

To show that \(E[Y]\) is the population mean, draw samples from \(p(Y)\) \(10^6\) times. Expect \(Y=0\) roughly \(2.5 \times 10^5\) times, \(Y=1\) roughly \(5 \times 10^5\) times, and \(Y=2\) roughly \(2.5 \times 10^5\) times.

\[ \begin{align*} \mu \approx \frac 1n \sum_{i=1}^n y_i &= \frac{(2.5 \times 10^5)(0) + (5 \times 10^5)(1) + (2.5 \times 10^5)(2)}{10^6} \\ &= (0)(1/4) + (1)(1/2) + (2)(1/4) \\ &= \sum_{y=0}^2 y\,p(y) = 1. \end{align*} \]
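A minimal numerical version of this argument (a sketch using NumPy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)           # arbitrary seed for reproducibility
values = np.array([0, 1, 2])                  # support of p(y)
probs = np.array([0.25, 0.50, 0.25])          # p(y) from the table above

samples = rng.choice(values, size=10**6, p=probs)   # draw 10^6 samples from p(y)

print(np.bincount(samples))   # counts of 0, 1, 2: roughly 2.5e5, 5e5, 2.5e5
print(samples.mean())         # sample mean, close to E[Y] = 1
```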

Variance

Recall discrete probability distribution \(P(X=x_i) = P(x_i)\) with mean \(\mu\)

\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \sum_{i=1}^\infty P(X=x_i)(x_i-\mu)^2 \]

Recall continuous probability density \(f(x)\) with mean \(\mu\)

\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \int_{-\infty}^\infty {\rm d}x\, f(x)(x-\mu)^2 \]

Variance

  • \(\text{Var}[X]\) is defined provided the sum/integral exists, i.e., converges!
  • The variance is often called the population variance, \(\sigma^2 := \text{Var}[X]\)
  • \(\sigma = \sqrt{\text{Var}[X]}\) is called the standard deviation

Other expected values

If \(Y\) is a continuous RV, then the \(k^\text{th}\) moment about the origin is given by \[ \mu_k' = E[Y^k], \quad k=1,2,\dots \]

The \(k^\text{th}\) moment about the mean, or \(k^\text{th}\) central moment, is given by \[ \mu_k = E[(Y-\mu)^k], \quad k=1,2, \dots \]

Linearities of expectations

Linearity of expectations

Let \(\{ X_i \}, i=1,\dots,N\) be a set of (in)dependent random variables and \(\{c_i\}\) a set of constant coefficients, then \[ E\left[ \sum_{i=1}^N c_i X_i \right] = \sum_{i=1}^N c_i E[X_i] \]

The same holds in the continuous case, replacing sums by integrals.

Linearity of expectations: proof

Consider 2 random variables \(X\) and \(Y\) \[ \begin{align} E[X+Y] &= \sum_x \sum_y (x+y)P(x, y) \\ &= \sum_x\sum_y \left(xP(x,y) + yP(x,y) \right) \\ &= \sum_x xP(x) \underbrace{\sum_yP(y|x)}_{1} + \sum_y yP(y) \underbrace{\sum_x P(x|y)}_1 \\ &=E[X] + E[Y] \end{align} \]
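A quick numerical illustration of the theorem (a sketch; the joint distribution below is an arbitrary choice, deliberately made dependent to stress that independence is not required):

```python
import numpy as np

rng = np.random.default_rng(seed=1)                 # arbitrary seed
n = 10**6

x = rng.normal(loc=1.0, scale=2.0, size=n)          # X ~ N(1, 2^2), E[X] = 1
y = 0.5 * x + rng.exponential(scale=3.0, size=n)    # Y depends on X; E[Y] = 0.5*1 + 3 = 3.5

# Linearity holds despite the dependence between X and Y:
print((2 * x + 5 * y).mean())        # ~ 2*E[X] + 5*E[Y] = 19.5
print(2 * x.mean() + 5 * y.mean())   # same value, up to Monte Carlo noise
```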

Linearity of expectations: notes

  • Straightforward to extend proof to coefficients \(\{c_i\}\)
  • Can extend to functions of random variables, \(g_i(X_i)\), s.t. \[ E\left[ \sum_{i=1}^N g_i(X_i) \right] = \sum_{i=1}^N E[g_i(X_i)] \]
  • Straightforward to extend to multivariate functions, \(g(X_1, \dots, X_k)\) of random variables
  • Likewise for continuous distributions, replacing sums by integrals and probabilities by densities

Linearity for independent variables

For two independent random variables \(X_1\) and \(X_2\) \[ E\left[ g_1(X_1) g_2(X_2) \right] = E[g_1(X_1)] E[g_2(X_2)] \]

Proof
exploit that \(p(x_1, x_2) = p_1(x_1) p_2(x_2)\)

Expectations of discontinuous functions

Example

A petroleum retailer sells a random amount \(Y\) each day. Suppose \(Y\), measured in thousands of gallons, has the probability density function

\[ f(y) = \left\{ \begin{align*} \frac 38 y^2, \quad 0 \leq y \leq 2, \\ 0, \quad \text{elsewhere} \end{align*} \right. \]

The retailer’s profit turns out to be $100 for each 1,000 gallons sold if \(Y \leq 1\) and $40 extra per 1,000 gallons if \(Y>1\). Find the retailer’s expected profit for any given day.

Solution

Let \(g(Y)\) denote the retailer’s daily profit. Then \[ g(Y) = \left\{ \begin{align*} 100Y, \quad 0 \leq y \leq 1, \\ 140Y, \quad 1 < y \leq 2. \end{align*} \right. \]

The expected profit is the expectation of this function of the random variable \[ \begin{align*} E[g(Y)] &= \int_{-\infty}^\infty \text{d}y\, g(y) f(y) \\ &= \int_0^1 \text{d}y\, 100y \left[\frac 38 y^2\right] + \int_1^2 \text{d}y\, 140y \left[\frac 38 y^2\right] \\ &= 206.25 \end{align*} \]
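A numerical cross-check of this integral (a sketch using scipy.integrate.quad, splitting the integral at the discontinuity):

```python
from scipy.integrate import quad

f = lambda y: 3 / 8 * y**2                      # density of Y on [0, 2]
g = lambda y: 100 * y if y <= 1 else 140 * y    # daily profit in dollars

part1, _ = quad(lambda y: g(y) * f(y), 0, 1)
part2, _ = quad(lambda y: g(y) * f(y), 1, 2)
print(part1 + part2)                            # 206.25
```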

Conditional expectations

Conditional expectations

  • Discrete random variable:

\[ E[X | Y] := \sum_x p(x|y)x \]

  • Continuous random variable:

\[ E[X | Y] := \int_{-\infty}^\infty {\rm d}x\, f(x|y)x \]

Conditional expectations

Relation to unconditional expectation (law of total expectation)

\[ E[X] = E_Y\left[E_X[X|Y] \right] \]
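A minimal Monte Carlo check of this identity (a sketch; the two-component hierarchical model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=2)    # arbitrary seed
n = 10**6

# Y ~ Bernoulli(0.3); X | Y=0 ~ N(1, 1), X | Y=1 ~ N(5, 1)
y = rng.binomial(1, 0.3, size=n)
cond_mean = np.where(y == 1, 5.0, 1.0)           # E[X|Y] as a function of Y
x = rng.normal(loc=cond_mean, scale=1.0)

print(cond_mean.mean())   # E_Y[E[X|Y]] ~ 0.7*1 + 0.3*5 = 2.2
print(x.mean())           # unconditional E[X], also ~ 2.2
```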

Properties of variances

\[ E[(X-\mu)^2] = E[X^2] - \mu^2 \]

Proof:

\[ \begin{align} E[(X-\mu)^2] &= E[(X^2 - 2\mu X + \mu^2)] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - \mu^2 \\ &= E[X^2] - E[X]^2 \end{align} \]

Properties of variances

Constant shifts do not affect the variance, while a scale factor enters quadratically: \[ \text{Var}[aX+b] = a^2 \text{Var}[X] \]

Furthermore, the law of total variance:

\[ \text{Var}[X] = E_Y\left[ \text{Var}[X|Y] \right] + \text{Var}\left[ E_X[X|Y] \right] \]
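A quick Monte Carlo check of the second identity (a sketch; the hierarchical model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=3)    # arbitrary seed
n = 10**6

# Y ~ Bernoulli(0.5); X | Y ~ N(mu_Y, sigma_Y^2) with (mu, sigma) = (0, 1) or (4, 2)
y = rng.binomial(1, 0.5, size=n)
mu = np.where(y == 1, 4.0, 0.0)
sigma = np.where(y == 1, 2.0, 1.0)
x = rng.normal(loc=mu, scale=sigma)

lhs = x.var()                          # Var[X]
rhs = (sigma**2).mean() + mu.var()     # E_Y[Var[X|Y]] + Var[E[X|Y]]
print(lhs, rhs)                        # both ~ 0.5*(1 + 4) + 4 = 6.5
```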

Covariance and correlation

Covariance

Dependence of two variables
Intuitively: two RVs \(Y_1\) and \(Y_2\) are dependent if one of them, say \(Y_1\), tends to increase or decrease as \(Y_2\) changes.

Covariance

  • Discrete random variables

\[ \begin{align} \text{Cov}[X, Y] &:= E[(X-\mu_x)(Y-\mu_y)] \\ &= \sum_x \sum_y(x-\mu_x)(y-\mu_y)p(x, y) \end{align} \]

  • Continuous random variables

\[ \text{Cov}[X, Y] := \int_{-\infty}^\infty \int_{-\infty}^\infty {\rm d}x {\rm d}y \, (x-\mu_x)(y-\mu_y)f(x, y) \]

Correlation coefficient

Covariance
\[ \text{Cov}[X, Y] := E[(X-\mu_x)(Y-\mu_y)] \]
Correlation coefficient
\[\rho(X, Y) := \frac{\text{Cov}[X,Y]}{\sigma_X \sigma_Y} \in [-1,1]\]
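A sketch of how these quantities are estimated from samples, using np.cov and np.corrcoef (the simulated data are an arbitrary choice, constructed to have correlation 0.8):

```python
import numpy as np

rng = np.random.default_rng(seed=4)      # arbitrary seed
n = 10**5

x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # Var[Y] = 1, Cov[X, Y] = 0.8

print(np.cov(x, y)[0, 1])                # sample covariance, ~ 0.8
print(np.corrcoef(x, y)[0, 1])           # sample correlation coefficient, ~ 0.8
```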

Properties of covariance

  • \(\text{Cov}[X,Y] = E[XY] - E[X]E[Y]\) (analogous to the variance identity)
  • \(\text{Cov}[X, Y] = 0\) for independent random variables
  • \(\text{Var}[X+Y] = \text{Var}[X] + \text{Var}[Y] + 2\text{Cov}[X,Y]\)
    • Independent RVs: \(\text{Var}[\sum_i X_i] = \sum_i \text{Var}[X_i]\)
  • \(\text{Cov}[aX+b, cY+d] = ac\text{Cov}[X, Y]\)
  • \(\text{Cov}[X+Z, Y] = \text{Cov}[X,Y] + \text{Cov}[Z, Y]\)
  • \(\text{Cov}[X, Y] = \text{Cov}[Y, X]\)
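A quick numerical check of two of these properties (a sketch with arbitrarily chosen correlated variables):

```python
import numpy as np

rng = np.random.default_rng(seed=8)   # arbitrary seed
n = 10**6

x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)      # correlated with X by construction

cov = np.cov(x, y, ddof=0)[0, 1]      # 1/n normalization, to match .var() below

print(cov, (x * y).mean() - x.mean() * y.mean())    # Cov[X,Y] = E[XY] - E[X]E[Y]
print((x + y).var(), x.var() + y.var() + 2 * cov)   # Var[X+Y] identity
```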

Independence versus zero covariance

Example

Are \(Y_1\) and \(Y_2\) independent?

Check whether \(p(y_1, y_2) = p_1(y_1)\,p_2(y_2)\). For instance, focus on \((0, 0)\).

Marginalize \(p\) into \(p_1\) and \(p_2\), e.g., \[ p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \]

\[ p_1(0) = p_2(0) = \frac 6{16} \]

But \(p(0, 0) = 0\). So \(Y_1\) and \(Y_2\) are dependent.

Independence versus zero covariance

Example

What is the covariance of \(Y_1\) and \(Y_2\)?

First, note that \(E[Y_1] = E[Y_2] = 0\).

\[ \begin{align*} E[Y_1 Y_2] =& \sum_{\text{all }y_1} \sum_{\text{all }y_2} y_1 y_2 p(y_1, y_2) \\ =& (-1)(-1)(1/16) \\ &+ (-1)(0)(3/16) + \dots \\ =& 0 \end{align*} \] If the covariance of two RVs is zero, the variables need not be independent.
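The same conclusion can be checked directly (a sketch; the joint table below is a reconstruction consistent with the probabilities quoted above: \(1/16\) on the corners, \(3/16\) on the edge midpoints, \(0\) in the centre, over the values \(-1, 0, 1\)):

```python
import numpy as np

vals = np.array([-1, 0, 1])
# Assumed joint probabilities p(y1, y2); rows indexed by y1, columns by y2
p = np.array([[1, 3, 1],
              [3, 0, 3],
              [1, 3, 1]]) / 16

p1 = p.sum(axis=1)                         # marginal of Y1: [5/16, 6/16, 5/16]
p2 = p.sum(axis=0)                         # marginal of Y2

cov = (np.outer(vals, vals) * p).sum()     # E[Y1 Y2], since E[Y1] = E[Y2] = 0
print(cov)                                 # 0.0: zero covariance
print(np.allclose(p, np.outer(p1, p2)))    # False: Y1 and Y2 are dependent
```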

Covariance matrix

For multivariate distributions, we frequently encounter the covariance matrix

Covariance matrix
collects all pairwise covariances in a matrix \(\Sigma_{ij} = \text{Cov}[X_i, X_j]\) between random variables \(X_i\) and \(X_j\). Diagonal elements: \(\Sigma_{ii} = \text{Cov}[X_i, X_i] = \text{Var}[X_i]\).

Covariance matrices are real and symmetric, thus only have real eigenvalues, have non-negative values on the diagonal, and are positive semidefinite.
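A brief numerical illustration (a sketch; np.cov estimates the covariance matrix from arbitrarily chosen correlated samples, and the stated properties can be read off directly):

```python
import numpy as np

rng = np.random.default_rng(seed=5)    # arbitrary seed
z = rng.normal(size=(3, 10**5))        # three independent standard-normal sources

# Three correlated variables; rows = variables, columns = observations
data = np.vstack([z[0],
                  z[0] + 0.5 * z[1],
                  z[1] + z[2]])

sigma = np.cov(data)                   # 3x3 sample covariance matrix
print(np.allclose(sigma, sigma.T))     # True: real and symmetric
print(np.diag(sigma))                  # diagonal = sample variances (non-negative)
print(np.linalg.eigvalsh(sigma))       # real, non-negative eigenvalues: positive semidefinite
```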

Distribution functions

Distribution functions

Definition

If \(Y\) has probability density function \(f(y)\) and if \(U\) is some function of \(Y\), then we can find the cumulative distribution function \[ F_U(u) = P(U \leq u) \] by integrating \(f(y)\) over the region for which \(U \leq u\).

The probability density function is found by differentiation \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u} \]

The method of transformations

The method of transformations

Consider an RV \(Y\) with density function \(f_Y\), and let \(h\) be a function that is either strictly increasing or strictly decreasing. We can compute the density function of \(U=h(Y)\) by a change of variable:

\[ h^{-1}(u) = y \]

Using the cumulative distributions: \[ \begin{align*} P(U \leq u) &= P[h(Y) \leq u] \\ &= P\{h^{-1}[h(Y)] \leq h^{-1}(u)\} \\ &= P[Y \leq h^{-1}(u)] \end{align*} \]

which is equivalent to \[ F_U(u) = F_Y[h^{-1}(u)]. \]

Differentiate wrt \(u\): \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u} = \frac{\text{d}F_Y[h^{-1}(u)]}{\text{d}u} = f_Y(h^{-1}(u))\frac{\text{d}[h^{-1}(u)]}{\text{d}u} \]

If \(h(y)\) is a decreasing function of \(y\), then the result is the same with an extra minus sign. Therefore, in general, \[ f_U(u) = f_Y(h^{-1}(u))\left\vert\frac{\text{d}[h^{-1}(u)]}{\text{d}u}\right\vert. \]
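A concrete example (a sketch): take \(Y \sim \text{Uniform}(0,1)\) and the increasing transformation \(U = h(Y) = Y^2\), so \(h^{-1}(u) = \sqrt{u}\) and the formula gives \(f_U(u) = 1/(2\sqrt{u})\), hence \(F_U(u) = \sqrt{u}\) on \((0, 1)\). The code compares the empirical distribution of transformed samples with this analytic result.

```python
import numpy as np

rng = np.random.default_rng(seed=6)     # arbitrary seed
y = rng.uniform(0.0, 1.0, size=10**6)
u = y**2                                # U = h(Y) with h(y) = y^2 increasing on (0, 1)

grid = np.linspace(0.05, 0.95, 10)
empirical_cdf = np.array([(u <= g).mean() for g in grid])
analytic_cdf = np.sqrt(grid)            # F_U(u) = sqrt(u) from the transformation formula

print(np.max(np.abs(empirical_cdf - analytic_cdf)))   # ~1e-3: Monte Carlo agreement
```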

Sample analogues of expectations and variances

Random sample

Random sample
Sample of size \(n\) from a larger population of size \(N\), with each drawing being equally likely, i.e., each sample has the same probability. Typically we require independence between draws.

Sample mean and variance

Collect finite random sample \(\{ x_i \}, i=1, \dots, n\).

Sample mean
\[\overline x = \frac 1n \sum_{i=1}^n x_i\]
Sample variance
\[s_x^2 = \frac 1n \sum_{i=1}^n (x_i - \overline x)^2\]

\[ \lim_{n\to\infty} \overline x = E[X]\]

\[ \lim_{n\to\infty} s_x^2 = \text{Var}[X]\]
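A short sketch of these estimators in NumPy (note that np.var uses the \(1/n\) convention by default, matching the definition above; ddof=1 gives the \(1/(n-1)\) version):

```python
import numpy as np

rng = np.random.default_rng(seed=7)               # arbitrary seed
x = rng.normal(loc=2.0, scale=3.0, size=10**6)    # E[X] = 2, Var[X] = 9

print(x.mean())        # sample mean, ~ 2
print(x.var())         # sample variance with 1/n, ~ 9
print(x.var(ddof=1))   # 1/(n-1) version; nearly identical for large n
```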

Summary

Summary

  • A random variable maps events or outcomes onto numbers
  • Def. of expected value (population mean) and variance for discrete and continuous probability distributions
  • Linearity of expectations makes combinations easy
  • Conditional expectations
  • Constant shift does not affect the variance
  • Covariance and correlation to relate two/many random variables
  • Sample mean and variance as empirical measures

References

Wackerly, Dennis, William Mendenhall, and Richard L Scheaffer. 2014. Mathematical Statistics with Applications. Cengage Learning.