Computational Statistics & Data Analysis (MVComp2)

Lecture 2: Random variables

Tristan Bereau

Institute for Theoretical Physics, Heidelberg University

Introduction

Literature

Today’s lecture
Based on: Ch. 2-6 in Wackerly, Mendenhall, and Scheaffer (2014)

Recap from last time

  • Many scientific problems are probabilistic
  • Concept/philosophy: frequentist vs. subjective (Bayesian)
  • Mathematical framework: Probability space triplet \((\Omega, \mathcal{F}, \mathbb{P})\): sample space, event space, probability measure
  • Operations on sets: Union, intersection, complement
  • Probability laws: conditional probability, additive law, law of total probability, Bayes’ rule

Random variable, distribution function, and density

Concept of a random variable

Random variable
function \(X: \Omega \to \mathbb{R}, \mathbb{Z}, \mathbb{N}, \dots\) defined on the sample space, i.e., it maps events or outcomes in the sample space onto numbers. Can be discrete or continuous.

Distribution function

Let \(Y\) denote a random variable (RV). The distribution function of \(Y\), denoted by \(F(y)\), is such that \[ F(y) = P(Y \leq y), \quad -\infty < y < \infty. \]

The RV \(Y\) can be either discrete or continuous.

Properties of a distribution function

If \(F(y)\) is a distribution function, then

  1. \(F(-\infty) = \lim_{y \to -\infty} F(y) = 0\)
  2. \(F(\infty) = \lim_{y \to \infty} F(y) = 1\)
  3. \(F(y)\) is a nondecreasing function of \(y\). (For any \(y_1 < y_2\), \(F(y_1) \leq F(y_2)\).)

Distribution function

Discrete case

Distribution functions for discrete RVs are always step functions because the cumulative distribution function increases only at the finite or countable number of points with positive probabilities.

Continuous case

Distribution functions for continuous RVs are continuous functions themselves.

Probability density

Let \(F(y)\) be the distribution function for a continuous RV \(Y\). Then \(f(y)\), given by \[ f(y) = \frac{\text{d}F(y)}{\text{d}y} = F'(y), \] wherever the derivative exists, is called the probability density function.

Properties

If \(f(y)\) is a density function for a continuous RV, then

  1. \(f(y) \geq 0\) for all \(y\), \(-\infty < y < \infty\).
  2. \(\int_{-\infty}^\infty \text{d}y\, f(y) = 1\).

The density function need not be continuous

Suppose that \[ F(y) = \left\{ \begin{align*} 0&,\quad \text{for } y < 0,\\ y&,\quad \text{for } 0 \leq y \leq 1,\\ 1&,\quad \text{for } y > 1. \end{align*} \right. \] Find the probability density function for \(Y\).
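A sketch of the solution: differentiating \(F\) piecewise wherever the derivative exists gives \[ f(y) = F'(y) = \left\{ \begin{align*} 1&,\quad \text{for } 0 < y < 1,\\ 0&,\quad \text{elsewhere,} \end{align*} \right. \] i.e., the uniform density on \((0, 1)\). \(f\) jumps at \(y=0\) and \(y=1\), illustrating that the density need not be continuous even though \(F\) is.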

Multivariate probability distributions

For any RVs \(Y_1\) and \(Y_2\), the joint (bivariate) distribution function \(F(y_1, y_2)\) is \[ F(y_1, y_2) = P(Y_1 \leq y_1, Y_2 \leq y_2), \quad -\infty < y_1 < \infty, \; -\infty < y_2 < \infty. \]

For continuous RVs \(Y_1\) and \(Y_2\) with joint distribution function \(F(y_1, y_2)\), there exists a nonnegative function \(f(y_1, y_2)\) s.t. \[ F(y_1, y_2) = \int_{-\infty}^{y_1} \text{d}t_1 \int_{-\infty}^{y_2}\text{d}t_2 \, f(t_1, t_2) \]

Moments: expectations and variance

Expected value

Let \(X\) be a random variable with discrete probability distribution \(P(X=x_i) = P(x_i)\)

\[ E[X] := \sum_{i=1}^\infty P(X=x_i)x_i \]

Let \(X\) be a random variable with continuous probability density \(f(x)\)

\[ E[X] := \int_{-\infty}^\infty {\rm d}x\, f(x)x \]

Expected value

  • \(E[X]\) is defined provided the sum/integral exists, i.e., converges; this is not the case for all probability distributions!
  • The expected value is often called the population mean, \(\mu := E[X]\)

Example: expected value

\(y\)    \(p(y)\)
0      1/4
1      1/2
2      1/4

To show that \(E[Y]\) is the population mean, draw samples from \(p(Y)\) \(10^6\) times. Expect \(Y=0\) roughly \(2.5 \times 10^5\) times, \(Y=1\) roughly \(5 \times 10^5\) times, and \(Y=2\) roughly \(2.5 \times 10^5\) times.

\[ \begin{align*} \mu \approx \frac 1n \sum_{i=1}^n y_i &= \frac{(2.5 \times 10^5)(0) + (5 \times 10^5)(1) + (2.5 \times 10^5)(2)}{10^6} \\ &= (0)(1/4) + (1)(1/2) + (2)(1/4) \\ &= \sum_{y=0}^2 y\,p(y) = 1. \end{align*} \]
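A minimal numerical version of this argument (a sketch using NumPy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)           # arbitrary seed for reproducibility
values = np.array([0, 1, 2])                  # support of p(y)
probs = np.array([0.25, 0.50, 0.25])          # p(y) from the table above

samples = rng.choice(values, size=10**6, p=probs)   # draw 10^6 samples from p(y)

print(np.bincount(samples))   # counts of 0, 1, 2: roughly 2.5e5, 5e5, 2.5e5
print(samples.mean())         # sample mean, close to E[Y] = 1
```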

Variance

Recall discrete probability distribution \(P(X=x_i) = P(x_i)\) with mean \(\mu\)

\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \sum_{i=1}^\infty P(X=x_i)(x_i-\mu)^2 \]

Recall continuous probability density \(f(x)\) with mean \(\mu\)

\[ \text{Var}[X] := E\left[(X-\mu)^2\right] = \int_{-\infty}^\infty {\rm d}x\, f(x)(x-\mu)^2 \]

Variance

  • \(\text{Var}[X]\) is defined provided the sum/integral exists, i.e., converges!
  • The variance is often called the population variance, \(\sigma^2 := \text{Var}[X]\)
  • \(\sigma = \sqrt{\text{Var}[X]}\) is called the standard deviation

Other expected values

If \(Y\) is a continuous RV, then the \(k^\text{th}\) moment about the origin is given by \[ \mu_k' = E[Y^k], \quad k=1,2,\dots \]

The \(k^\text{th}\) moment about the mean, or \(k^\text{th}\) central moment, is given by \[ \mu_k = E[(Y-\mu)^k], \quad k=1,2, \dots \]

Linearities of expectations

Linearity of expectations

Let \(\{ X_i \}, i=1,\dots,N\) be a set of (in)dependent random variables and \(\{c_i\}\) a set of constant coefficients, then \[ E\left[ \sum_{i=1}^N c_i X_i \right] = \sum_{i=1}^N c_i E[X_i] \]

The same holds in the continuous case, replacing sums by integrals.

Linearity of expectations: proof

Consider 2 random variables \(X\) and \(Y\) \[ \begin{align} E[X+Y] &= \sum_x \sum_y (x+y)P(x, y) \\ &= \sum_x\sum_y \left(xP(x,y) + yP(x,y) \right) \\ &= \sum_x xP(x) \underbrace{\sum_yP(y|x)}_{1} + \sum_y yP(y) \underbrace{\sum_x P(x|y)}_1 \\ &=E[X] + E[Y] \end{align} \]
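A quick numerical illustration of the theorem (a sketch; the joint distribution below is an arbitrary choice, deliberately made dependent to stress that independence is not required):

```python
import numpy as np

rng = np.random.default_rng(seed=1)                 # arbitrary seed
n = 10**6

x = rng.normal(loc=1.0, scale=2.0, size=n)          # X ~ N(1, 2^2), E[X] = 1
y = 0.5 * x + rng.exponential(scale=3.0, size=n)    # Y depends on X; E[Y] = 0.5*1 + 3 = 3.5

# Linearity holds despite the dependence between X and Y:
print((2 * x + 5 * y).mean())        # ~ 2*E[X] + 5*E[Y] = 19.5
print(2 * x.mean() + 5 * y.mean())   # same value, up to Monte Carlo noise
```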

Linearity of expectations: notes

  • Straightforward to extend proof to coefficients \(\{c_i\}\)
  • Can extend to functions of random variables, \(g_i(X_i)\), s.t. \[ E\left[ \sum_{i=1}^N g_i(X_i) \right] = \sum_{i=1}^N E[g_i(X_i)] \]
  • Straightforward to extend to multivariate functions, \(g(X_1, \dots, X_k)\) of random variables
  • Likewise for continuous distributions, replacing sums by integrals and probabilities by densities

Linearity for independent variables

For two independent random variables \(X_1\) and \(X_2\) \[ E\left[ g_1(X_1) g_2(X_2) \right] = E[g_1(X_1)] E[g_2(X_2)] \]

Proof
exploit that \(p(x_1, x_2) = p_1(x_1) p_2(x_2)\)

Expectations of discontinuous functions

Example

A petroleum retailer sells a random amount \(Y\) each day. Suppose \(Y\), measured in thousands of gallons, has the probability density function

\[ f(y) = \left\{ \begin{align*} \frac 38 y^2, \quad 0 \leq y \leq 2, \\ 0, \quad \text{elsewhere} \end{align*} \right. \]

The retailer’s profit turns out to be $100 for each 1,000 gallons sold if \(Y \leq 1\) and $40 extra per 1,000 gallons if \(Y>1\). Find the retailer’s expected profit for any given day.

Solution

Let \(g(Y)\) denote the retailer’s daily profit. Then \[ g(Y) = \left\{ \begin{align*} 100Y, \quad 0 \leq y \leq 1, \\ 140Y, \quad 1 < y \leq 2. \end{align*} \right. \]

The expected profit is the expectation of this function of the random variable \[ \begin{align*} E[g(Y)] &= \int_{-\infty}^\infty \text{d}y\, g(y) f(y) \\ &= \int_0^1 \text{d}y\, 100y \left[\frac 38 y^2\right] + \int_1^2 \text{d}y\, 140y \left[\frac 38 y^2\right] \\ &= 206.25 \end{align*} \]
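A numerical cross-check of this integral (a sketch using scipy.integrate.quad, splitting the integral at the discontinuity):

```python
from scipy.integrate import quad

f = lambda y: 3 / 8 * y**2                      # density of Y on [0, 2]
g = lambda y: 100 * y if y <= 1 else 140 * y    # daily profit in dollars

part1, _ = quad(lambda y: g(y) * f(y), 0, 1)
part2, _ = quad(lambda y: g(y) * f(y), 1, 2)
print(part1 + part2)                            # 206.25
```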

Conditional expectations

Conditional expectations

  • Discrete random variable:

\[ E[X | Y] := \sum_x p(x|y)x \]

  • Continuous random variable:

\[ E[X | Y] := \int_{-\infty}^\infty {\rm d}x\, f(x|y)x \]

Conditional expectations

Relation to unconditional expectation (law of total expectation)

\[ E[X] = E_Y\left[E_X[X|Y] \right] \]
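A minimal Monte Carlo check of this identity (a sketch; the two-component hierarchical model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=2)    # arbitrary seed
n = 10**6

# Y ~ Bernoulli(0.3); X | Y=0 ~ N(1, 1), X | Y=1 ~ N(5, 1)
y = rng.binomial(1, 0.3, size=n)
cond_mean = np.where(y == 1, 5.0, 1.0)           # E[X|Y] as a function of Y
x = rng.normal(loc=cond_mean, scale=1.0)

print(cond_mean.mean())   # E_Y[E[X|Y]] ~ 0.7*1 + 0.3*5 = 2.2
print(x.mean())           # unconditional E[X], also ~ 2.2
```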

Properties of variances

\[ E[(X-\mu)^2] = E[X^2] - \mu^2 \]

Proof:

\[ \begin{align} E[(X-\mu)^2] &= E[(X^2 - 2\mu X + \mu^2)] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - \mu^2 \\ &= E[X^2] - E[X]^2 \end{align} \]

Properties of variances

Constant shifts do not affect the variance, while a scale factor enters quadratically: \[ \text{Var}[aX+b] = a^2 \text{Var}[X] \]

Furthermore, the law of total variance:

\[ \text{Var}[X] = E_Y\left[ \text{Var}[X|Y] \right] + \text{Var}\left[ E_X[X|Y] \right] \]
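A quick Monte Carlo check of the second identity (a sketch; the hierarchical model is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=3)    # arbitrary seed
n = 10**6

# Y ~ Bernoulli(0.5); X | Y ~ N(mu_Y, sigma_Y^2) with (mu, sigma) = (0, 1) or (4, 2)
y = rng.binomial(1, 0.5, size=n)
mu = np.where(y == 1, 4.0, 0.0)
sigma = np.where(y == 1, 2.0, 1.0)
x = rng.normal(loc=mu, scale=sigma)

lhs = x.var()                          # Var[X]
rhs = (sigma**2).mean() + mu.var()     # E_Y[Var[X|Y]] + Var[E[X|Y]]
print(lhs, rhs)                        # both ~ 0.5*(1 + 4) + 4 = 6.5
```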

Covariance and correlation

Covariance

Dependence of two variables
Intuitively: two RVs \(Y_1\) and \(Y_2\) are dependent if one of them, say \(Y_1\), tends to increase or decrease as \(Y_2\) changes.

Covariance

  • Discrete random variables

\[ \begin{align} \text{Cov}[X, Y] &:= E[(X-\mu_x)(Y-\mu_y)] \\ &= \sum_x \sum_y(x-\mu_x)(y-\mu_y)p(x, y) \end{align} \]

  • Continuous random variables

\[ \text{Cov}[X, Y] := \int_{-\infty}^\infty \int_{-\infty}^\infty {\rm d}x {\rm d}y \, (x-\mu_x)(y-\mu_y)f(x, y) \]

Correlation coefficient

Covariance
\[ \text{Cov}[X, Y] := E[(X-\mu_x)(Y-\mu_y)] \]
Correlation coefficient
\[\rho(X, Y) := \frac{\text{Cov}[X,Y]}{\sigma_X \sigma_Y} \in [-1,1]\]
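A sketch of how these quantities are estimated from samples, using np.cov and np.corrcoef (the simulated data are an arbitrary choice, constructed to have correlation 0.8):

```python
import numpy as np

rng = np.random.default_rng(seed=4)      # arbitrary seed
n = 10**5

x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # Var[Y] = 1, Cov[X, Y] = 0.8

print(np.cov(x, y)[0, 1])                # sample covariance, ~ 0.8
print(np.corrcoef(x, y)[0, 1])           # sample correlation coefficient, ~ 0.8
```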

Properties of covariance

  • \(\text{Cov}[X,Y] = E[XY] - E[X]E[Y]\) (analogous to the variance identity)
  • \(\text{Cov}[X, Y] = 0\) for independent random variables
  • \(\text{Var}[X+Y] = \text{Var}[X] + \text{Var}[Y] + 2\text{Cov}[X,Y]\)
    • Independent RVs: \(\text{Var}[\sum_i X_i] = \sum_i \text{Var}[X_i]\)
  • \(\text{Cov}[aX+b, cY+d] = ac\text{Cov}[X, Y]\)
  • \(\text{Cov}[X+Z, Y] = \text{Cov}[X,Y] + \text{Cov}[Z, Y]\)
  • \(\text{Cov}[X, Y] = \text{Cov}[Y, X]\)
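A quick numerical check of two of these properties (a sketch with arbitrarily chosen correlated variables):

```python
import numpy as np

rng = np.random.default_rng(seed=8)   # arbitrary seed
n = 10**6

x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)      # correlated with X by construction

cov = np.cov(x, y, ddof=0)[0, 1]      # 1/n normalization, to match .var() below

print(cov, (x * y).mean() - x.mean() * y.mean())    # Cov[X,Y] = E[XY] - E[X]E[Y]
print((x + y).var(), x.var() + y.var() + 2 * cov)   # Var[X+Y] identity
```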

Independence versus zero covariance

Example

Are \(Y_1\) and \(Y_2\) independent?

Check whether \(p(y_1, y_2) = p_1(y_1)\,p_2(y_2)\). For instance, focus on \((0, 0)\).

Marginalize \(p\) into \(p_1\) and \(p_2\), e.g., \[ p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \]

\[ p_1(0) = p_2(0) = \frac 6{16} \]

But \(p(0, 0) = 0\). So \(Y_1\) and \(Y_2\) are dependent.

Independence versus zero covariance

Example

What is the covariance of \(Y_1\) and \(Y_2\)?

First, note that \(E[Y_1] = E[Y_2] = 0\).

\[ \begin{align*} E[Y_1 Y_2] =& \sum_{\text{all }y_1} \sum_{\text{all }y_2} y_1 y_2 p(y_1, y_2) \\ =& (-1)(-1)(1/16) \\ &+ (-1)(0)(3/16) + \dots \\ =& 0 \end{align*} \] If the covariance of two RVs is zero, the variables need not be independent.
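The same conclusion can be checked directly (a sketch; the joint table below is a reconstruction consistent with the probabilities quoted above: \(1/16\) on the corners, \(3/16\) on the edge midpoints, \(0\) in the centre, over the values \(-1, 0, 1\)):

```python
import numpy as np

vals = np.array([-1, 0, 1])
# Assumed joint probabilities p(y1, y2); rows indexed by y1, columns by y2
p = np.array([[1, 3, 1],
              [3, 0, 3],
              [1, 3, 1]]) / 16

p1 = p.sum(axis=1)                         # marginal of Y1: [5/16, 6/16, 5/16]
p2 = p.sum(axis=0)                         # marginal of Y2

cov = (np.outer(vals, vals) * p).sum()     # E[Y1 Y2], since E[Y1] = E[Y2] = 0
print(cov)                                 # 0.0: zero covariance
print(np.allclose(p, np.outer(p1, p2)))    # False: Y1 and Y2 are dependent
```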

Covariance matrix

For multivariate distributions, we frequently encounter the covariance matrix

Covariance matrix
collects all pairwise covariances in a matrix \(\Sigma_{ij} = \text{Cov}[X_i, X_j]\) between random variables \(X_i\) and \(X_j\). Diagonal elements: \(\Sigma_{ii} = \text{Cov}[X_i, X_i] = \text{Var}[X_i]\).

Covariance matrices are real and symmetric, thus only have real eigenvalues, have non-negative values on the diagonal, and are positive semidefinite.
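A brief numerical illustration (a sketch; np.cov estimates the covariance matrix from arbitrarily chosen correlated samples, and the stated properties can be read off directly):

```python
import numpy as np

rng = np.random.default_rng(seed=5)    # arbitrary seed
z = rng.normal(size=(3, 10**5))        # three independent standard-normal sources

# Three correlated variables; rows = variables, columns = observations
data = np.vstack([z[0],
                  z[0] + 0.5 * z[1],
                  z[1] + z[2]])

sigma = np.cov(data)                   # 3x3 sample covariance matrix
print(np.allclose(sigma, sigma.T))     # True: real and symmetric
print(np.diag(sigma))                  # diagonal = sample variances (non-negative)
print(np.linalg.eigvalsh(sigma))       # real, non-negative eigenvalues: positive semidefinite
```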

Distribution functions

Distribution functions

Definition

If \(Y\) has probability density function \(f(y)\) and if \(U\) is some function of \(Y\), then we can find the cumulative distribution function \[ F_U(u) = P(U \leq u) \] by integrating \(f(y)\) over the region for which \(U \leq u\).

The probability density function is found by differentiation \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u} \]

The method of transformations

The method of transformations

Consider an RV \(Y\) with density function \(f_Y\), and let \(h\) be a function that is either strictly increasing or strictly decreasing. We can compute the density function of \(U=h(Y)\) by a change of variable:

\[ h^{-1}(u) = y \]

Using the cumulative distributions: \[ \begin{align*} P(U \leq u) &= P[h(Y) \leq u] \\ &= P\{h^{-1}[h(Y)] \leq h^{-1}(u)\} \\ &= P[Y \leq h^{-1}(u)] \end{align*} \]

which is equivalent to \[ F_U(u) = F_Y[h^{-1}(u)]. \]

Differentiate wrt \(u\): \[ f_U(u) = \frac{\text{d}F_U(u)}{\text{d}u} = \frac{\text{d}F_Y[h^{-1}(u)]}{\text{d}u} = f_Y(h^{-1}(u))\frac{\text{d}[h^{-1}(u)]}{\text{d}u} \]

If \(h(y)\) is a decreasing function of \(y\), then the result is the same with an extra minus sign. Therefore, in general, \[ f_U(u) = f_Y(h^{-1}(u))\left\vert\frac{\text{d}[h^{-1}(u)]}{\text{d}u}\right\vert. \]
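A concrete example (a sketch): take \(Y \sim \text{Uniform}(0,1)\) and the increasing transformation \(U = h(Y) = Y^2\), so \(h^{-1}(u) = \sqrt{u}\) and the formula gives \(f_U(u) = 1/(2\sqrt{u})\), hence \(F_U(u) = \sqrt{u}\) on \((0, 1)\). The code compares the empirical distribution of transformed samples with this analytic result.

```python
import numpy as np

rng = np.random.default_rng(seed=6)     # arbitrary seed
y = rng.uniform(0.0, 1.0, size=10**6)
u = y**2                                # U = h(Y) with h(y) = y^2 increasing on (0, 1)

grid = np.linspace(0.05, 0.95, 10)
empirical_cdf = np.array([(u <= g).mean() for g in grid])
analytic_cdf = np.sqrt(grid)            # F_U(u) = sqrt(u) from the transformation formula

print(np.max(np.abs(empirical_cdf - analytic_cdf)))   # ~1e-3: Monte Carlo agreement
```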

Sample analogues of expectations and variances

Random sample

Random sample
Sample of size \(n\) from a larger population of size \(N\), with each drawing being equally likely, i.e., each sample has the same probability. Typically we require independence between draws.

Sample mean and variance

Collect finite random sample \(\{ x_i \}, i=1, \dots, n\).

Sample mean
\[\overline x = \frac 1n \sum_{i=1}^n x_i\]
Sample variance
\[s_x^2 = \frac 1n \sum_{i=1}^n (x_i - \overline x)^2\]

\[ \lim_{n\to\infty} \overline x = E[X]\]

\[ \lim_{n\to\infty} s_x^2 = \text{Var}[X]\]
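A short sketch of these estimators in NumPy (note that np.var uses the \(1/n\) convention by default, matching the definition above; ddof=1 gives the \(1/(n-1)\) version):

```python
import numpy as np

rng = np.random.default_rng(seed=7)               # arbitrary seed
x = rng.normal(loc=2.0, scale=3.0, size=10**6)    # E[X] = 2, Var[X] = 9

print(x.mean())        # sample mean, ~ 2
print(x.var())         # sample variance with 1/n, ~ 9
print(x.var(ddof=1))   # 1/(n-1) version; nearly identical for large n
```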

Summary

Summary

  • A random variable maps events or outcomes onto numbers
  • Def. of expected value (population mean) and variance for discrete and continuous probability distributions
  • Linearity of expectations makes combinations easy
  • Conditional expectations
  • Constant shift does not affect the variance
  • Covariance and correlation to relate two/many random variables
  • Sample mean and variance as empirical measures

References

Wackerly, Dennis, William Mendenhall, and Richard L Scheaffer. 2014. Mathematical Statistics with Applications. Cengage Learning.