Computational Statistics & Data Analysis (MVComp2)

Lecture 6: Parameter estimation and hypothesis tests

Tristan Bereau

Institute for Theoretical Physics, Heidelberg University



  • Chapters 5 and 6 in Amendola (2021)

Recap from last time

Statistical inference
Obtain estimates of unknown distribution parameters
Bias, variance, mean-squared error
Statistical parameter estimation
Least-squares error, maximum likelihood estimation, Bayesian inference
Sampling the posterior
Fisher matrix, Monte Carlo methods, Gradient descent and Newton–Raphson

Parameter estimation

Frequentist approach

We momentarily focus on the frequentist approach to explore some results. Motivations:

  1. Many studies (still) adopt a frequentist methodology, so it’s useful to understand it
  2. In some cases it may be difficult to choose (or agree on) a prior

Distribution of the sample mean

We have \(N\) data points \(x_i\), assumed to be iid Gaussian, \(x_i \sim \mathcal{N}\left(\mu, \sigma\right)\). The sample mean statistic \[ \hat x = \frac 1N \sum_i x_i \] is then also a Gaussian variable, \(\hat x \sim \mathcal{N}\left(\mu, \frac{\sigma}{\sqrt N}\right)\). This holds exactly for Gaussian data; for non-Gaussian data with finite variance it holds approximately at large \(N\) (from the CLT).
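The \(\sigma/\sqrt N\) scaling can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `mu`, `sigma`, `N` and `n_repeats` are illustrative assumptions, not taken from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

mu, sigma = 2.0, 3.0   # assumed "true" parameters (illustrative values)
N = 25                 # size of each dataset
n_repeats = 20_000     # number of simulated datasets

# Draw many datasets of size N and record the sample mean of each one
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(N))
    for _ in range(n_repeats)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (expected {mu})")
print(f"std of sample means:  {std_of_means:.3f} (expected {sigma / N**0.5:.3f})")
```

The spread of the sample means should come out close to \(\sigma/\sqrt N\), up to Monte Carlo noise.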

Toward a distribution of the variance

Chi-squared distribution

Consider \(N\) independent Gaussian variables \(x_i\) with means \(\mu_i\) and standard deviations \(\sigma_i\). We define \[ z = \sum_{i=1}^N \frac{(x_i - \mu_i)^2}{\sigma_i^2}. \] To find the PDF of \(z\) we do a variable transformation to spherical coordinates in \(N\) dimensions: a radius \(r\) with \(r^2 = z\) and \(N-1\) angles \(\theta_i\). Integration over the radial and angular components leads to a \(\chi^2\) distribution of the form \[ f(z; N) = \frac 1{2^{N/2}\Gamma(N/2)}z^{N/2-1}\text{e}^{-z/2} = \chi_N^2(z). \] Importantly, the first two moments are given by \[ \begin{align*} E[Z] &= N \\ \text{Var}[Z] &= 2N \end{align*} \]
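As a sanity check on the normalisation and the two moments, the density can be integrated numerically. This is a sketch under the assumption that the density is the standard \(\chi^2\) form with exponent \(N/2 - 1\) on \(z\); the helper names `chi2_pdf` and `integrate` are hypothetical, introduced only for illustration.

```python
import math

def chi2_pdf(z, n_dof):
    """Chi-squared density with n_dof degrees of freedom, valid for z > 0."""
    half = n_dof / 2
    log_pdf = (half - 1) * math.log(z) - z / 2 - half * math.log(2) - math.lgamma(half)
    return math.exp(log_pdf)

def integrate(func, a, b, n_steps=50_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n_steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(n_steps))

N = 5
upper = 100.0  # the tail beyond this point is negligible for N = 5

norm = integrate(lambda z: chi2_pdf(z, N), 0.0, upper)
mean = integrate(lambda z: z * chi2_pdf(z, N), 0.0, upper)
second_moment = integrate(lambda z: z**2 * chi2_pdf(z, N), 0.0, upper)
var = second_moment - mean**2

print(f"normalisation: {norm:.4f}, mean: {mean:.4f}, variance: {var:.4f}")
```

For \(N = 5\) the three numbers should be close to 1, \(N\) and \(2N\), respectively.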

Distribution of the sample variance

What is the distribution of the sample variance \(S^2 = \frac 1{N-1}\sum_{i=1}^N (x_i - \hat x)^2\)?

We have \[ \begin{align*} (N-1)S^2 &= \sum_{i=1}^N \left[ (x_i - \mu) - (\hat x - \mu) \right]^2 \\ &= \sum_{i=1}^N (x_i - \mu)^2 - N(\hat x - \mu)^2 \\ (N-1)\frac{S^2}{\sigma^2} &= \underbrace{\sum_{i=1}^N \frac{(x_i - \mu)^2}{\sigma^2}}_{\chi^2_N} - \underbrace{\frac{(\hat x - \mu)^2}{\sigma^2 / N}}_{\chi^2_1} \end{align*} \] such that, because \(\hat x\) and \(S^2\) are independent for Gaussian data, \[ (N-1) \frac{S^2}{\sigma^2} \sim \chi^2_N - \chi^2_1 \sim \chi^2_{N-1} \]
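A quick simulation can make the loss of one degree of freedom visible. The sketch below (standard library only, with illustrative values for `mu`, `sigma` and `N`) checks that \((N-1)S^2/\sigma^2\) has mean \(N-1\) and variance \(2(N-1)\), as expected for a \(\chi^2_{N-1}\) variable.

```python
import random

random.seed(2)

mu, sigma = 1.0, 2.0   # assumed "true" parameters (illustrative values)
N = 8
n_repeats = 40_000

scaled_variances = []
for _ in range(n_repeats):
    data = [random.gauss(mu, sigma) for _ in range(N)]
    x_hat = sum(data) / N
    s2 = sum((x - x_hat) ** 2 for x in data) / (N - 1)  # sample variance S^2
    scaled_variances.append((N - 1) * s2 / sigma**2)

mean_scaled = sum(scaled_variances) / n_repeats
var_scaled = sum((v - mean_scaled) ** 2 for v in scaled_variances) / (n_repeats - 1)

print(f"mean:     {mean_scaled:.2f} (chi2 with N-1 dof predicts {N - 1})")
print(f"variance: {var_scaled:.2f} (chi2 with N-1 dof predicts {2 * (N - 1)})")
```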

Distribution of normalized variable (\(t\)-Student distr.)

Design an RV of the form \(\text{mean}/\sqrt{\text{variance}}\): \[ T = \frac Z{\sqrt{X/\nu}} \] where \(Z \sim \mathcal{N}(0,1)\) and \(X \sim \chi^2_\nu\) are independent.

Write out the joint probability density function \[ f_{Z, X}(z, x) = f_Z(z) f_X(x) \] then introduce a change of variables, \(Z \to T = Z / \sqrt{X/\nu}\) while keeping \(X\) intact. Compute the Jacobian. Compute \(f_{T, X}(t,x)\). Integrate out \(X\). Obtain the pdf of the \(t\)-distribution \[ f_T(t) = \frac{\Gamma(\frac{\nu+1}2)}{\sqrt{\nu\pi}\Gamma(\frac{\nu}2)} \left( 1 + \frac{t^2}\nu\right)^{-\frac{\nu+1}2} \] with \(-\infty<t<\infty\) and \(\nu > 0\).

Moments of the \(t\)-distribution (the mean exists for \(\nu > 1\), the variance for \(\nu > 2\)) \[ \begin{align*} E[T] &= 0 \\ \text{Var}[T] &= \frac \nu{\nu-2} \end{align*} \]

From sum of iid RVs to \(t\)-distribution

If we have \(N\) iid RVs \(x_i \sim \mathcal{N}(\mu, \sigma)\), we construct their combination \[ Z = \frac{\sum_{i=1}^N x_i - N\mu}{\sigma \sqrt{N}} = \frac{\hat x - \mu}{\sigma / \sqrt N} \sim \mathcal{N}(0,1) \] Remember also from the distribution of the sample variance \[ X = (N-1) \frac{S^2}{\sigma^2} \sim \chi^2_{N-1} \]

It follows that \[ T = \frac{Z}{\sqrt{X / (N-1)}} = \frac{\hat x - \mu}{S / \sqrt N} \sim t\text{-Student} (\nu=N-1) \]
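To see the heavier-than-Gaussian tails in practice, one can simulate \(T = (\hat x - \mu)/(S/\sqrt N)\) and compare its variance with \(\nu/(\nu-2)\). This is a sketch with assumed illustrative parameters (`mu`, `sigma`, `N = 11`, hence \(\nu = 10\)).

```python
import math
import random

random.seed(4)

mu, sigma = 0.5, 1.5   # assumed "true" parameters (illustrative values)
N = 11
nu = N - 1             # degrees of freedom of the t-distribution
n_repeats = 50_000

t_values = []
for _ in range(n_repeats):
    data = [random.gauss(mu, sigma) for _ in range(N)]
    x_hat = sum(data) / N
    s2 = sum((x - x_hat) ** 2 for x in data) / (N - 1)  # sample variance S^2
    t_values.append((x_hat - mu) / math.sqrt(s2 / N))

t_mean = sum(t_values) / n_repeats
t_var = sum((t - t_mean) ** 2 for t in t_values) / (n_repeats - 1)

print(f"mean of T:     {t_mean:.3f} (expected 0)")
print(f"variance of T: {t_var:.3f} (expected nu/(nu-2) = {nu / (nu - 2):.3f})")
```

The variance should come out larger than 1, the value for a standard normal variable, because \(\sigma\) is replaced by its estimate \(S\).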

If we have two datasets, we can form a variable that is approximately \(t\)-distributed: \[ \begin{align} T &= \frac{\hat x_1 - \hat x_2 - (\mu_1 - \mu_2)}{S_D} \\ S_D &= \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \end{align} \]

\(F\)-Distribution of the ratio of two variances

Given two independent \(\chi^2\) RVs, \(X\) and \(Y\), with dofs \(\nu_1, \nu_2\), then \[ F = \frac{X / \nu_1}{Y / \nu_2} \] is distributed as \[ P(F; \nu_1, \nu_2) = \frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}2} \frac{F^{\frac{\nu_1-2}2}}{(1+F\nu_1/\nu_2)^{(\nu_1 + \nu_2)/2}} \]

PDF of some statistics

Statistic | PDF
mean \(\hat x\) | \(\mathcal{N}\left(\mu, \frac{\sigma}{\sqrt N}\right)\)
variance, as \((N-1)S^2/\sigma^2\) | \(\chi_{N-1}^2\)
\(\text{mean}/\sqrt{\text{variance}}\) | \(t\)-Student
\(\text{variance}_1 / \text{variance}_2\) | \(F\)-distribution

The following slides illustrate how some of these expected distributions can be used.

Confidence regions

How likely is it to obtain the unknown parameters (e.g., \(\mu\) and \(\sigma\)) in a given region?

\[ P(\theta_{\alpha/2} < \theta < \theta_{1-\alpha/2}) = 1-\alpha \]

Confidence regions: mean and variance

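  • Mean (known \(\sigma\)) \[ P\left(\hat x - \frac \sigma{\sqrt N} < \mu < \hat x + \frac \sigma{\sqrt N}\right) = 0.68 \]
  • Variance \[ P\left((N-1)\frac {S^2}{\chi^2_{N-1}(1-\frac\alpha 2)} < \sigma^2 < (N-1)\frac {S^2}{\chi^2_{N-1}(\frac\alpha 2)} \right) = 1-\alpha, \] where \(\chi^2_{N-1}(q)\) denotes the \(q\)-quantile of the \(\chi^2_{N-1}\) distribution; \(\alpha = 0.32\) gives the 68% region.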
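The frequentist meaning of the interval for the mean can be illustrated by a coverage simulation: over many repeated datasets, about 68% of the intervals should contain the true \(\mu\). The sketch below assumes \(\sigma\) is known, and all numerical values are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(3)

mu_true, sigma = 10.0, 2.0   # assumed "true" values; sigma is treated as known
N = 16
alpha = 0.32                 # 1 - alpha = 0.68 confidence level
n_repeats = 20_000

# Half-width of the interval: Gaussian quantile times sigma / sqrt(N)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z_crit * sigma / math.sqrt(N)

n_covered = 0
for _ in range(n_repeats):
    x_hat = fmean(random.gauss(mu_true, sigma) for _ in range(N))
    if x_hat - half_width < mu_true < x_hat + half_width:
        n_covered += 1

coverage = n_covered / n_repeats
print(f"z_crit = {z_crit:.3f}, empirical coverage = {coverage:.3f}")
```

Note that it is the interval that fluctuates from dataset to dataset, while \(\mu\) stays fixed.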

Hypothesis testing

Test construction

  • Enunciate a hypothesis \(H_0\) concerning a parameter of the distribution (e.g., the mean is zero)
  • “Testing \(H_0\):” check whether it is consistent with the data
  • \(p\)-value: probability of obtaining test results at least as extreme as the observed data, under the assumption that \(H_0\) is true.
  • Small \(p\)-value: such an extreme event would be unlikely under \(H_0\).
  • If \(p\) is smaller than some threshold, \(\alpha\), reject the hypothesis; otherwise, we cannot rule it out

Example: coin toss

We flip a coin 100 times. Outcome is \(\{H: 60; T: 40\}\). Is the coin fair?

Hypothesis (\(H_0\))
The coin is fair (50/50)
Categorical variables. The number of heads in \(n\) flips follows a binomial distribution. Compare observed with expected counts, here via a \(z\) score.

Use moments of the binomial distribution \[ \begin{align} \mu &= np = 50 \\ \sigma &= \sqrt{np(1-p)} = 5 \end{align} \] and calculate the z scores for heads and tails \[ \begin{align} z &= \frac{k-\mu}\sigma = \frac{60-50}5 = 2 \\ z &= \frac{k-\mu}\sigma = \frac{40-50}5 = -2 \end{align} \]

Look up CDF of Normal distribution to find that \(p = 2\times 0.0228 = 0.0456\). Reject?
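The table lookup can be reproduced in a few lines. This is a minimal sketch using the standard library: `p_normal` follows the normal approximation used above, and the exact binomial sum `p_exact` is an addition for comparison only, not part of the slide.

```python
import math
from statistics import NormalDist

n_flips, n_heads, p_fair = 100, 60, 0.5

# Moments of the binomial distribution under H0
mu = n_flips * p_fair
sigma = math.sqrt(n_flips * p_fair * (1 - p_fair))

# z score and two-sided p-value from the normal approximation
z = (n_heads - mu) / sigma
p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exact two-sided p-value: all outcomes at least as far from mu as the observed one
deviation = abs(n_heads - mu)
p_exact = sum(
    binom_pmf(k, n_flips, p_fair)
    for k in range(n_flips + 1)
    if abs(k - mu) >= deviation
)

print(f"z = {z:.2f}, p (normal approx.) = {p_normal:.4f}, p (exact binomial) = {p_exact:.4f}")
```

The exact sum comes out somewhat larger than the normal approximation, which matters here because both values sit close to the usual 0.05 threshold.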

Example: coin toss (continued)

Same data and hypothesis \(H_0\) as above (60 heads, 40 tails; fair coin). Use \(\chi^2\) to compare observed with expected counts.

\[ \bar \chi^2_1 = \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = 4.0 \]

Look up \(\chi^2\) tables to find that \(P(\chi_1^2 > 4.0) = 0.046\). Reject?
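For one degree of freedom the \(\chi^2\) tail probability has a closed form, \(P(\chi^2_1 > x) = \text{erfc}(\sqrt{x/2})\), because \(\chi^2_1\) is the square of a standard normal variable. A small sketch of the lookup (variable names are illustrative):

```python
import math

observed = {"H": 60, "T": 40}
expected = {"H": 50.0, "T": 50.0}   # fair coin, 100 flips

# Pearson chi-squared statistic: sum over categories of (O - E)^2 / E
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Upper-tail probability for 1 degree of freedom
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
```

Since the statistic is the square of the \(z\) score \(z = 2\), the resulting \(p\)-value agrees with the two-sided normal result.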

Student’s \(t\)-test: Comparing normalized variables

Compare the averages of two groups and determine whether the difference between them is likely to arise from random chance. Consider the variable \[ T = \frac{\hat x - \mu}{ S / \sqrt N}, \] which follows the \(t\)-Student distribution with \(\nu=N-1\) DoFs.

Student’s \(t\)-test: Example application

  • Alice’s class: average score = 75 pts., standard deviation = 10 pts.
  • Bob’s class: average score = 80 pts., standard deviation = 12 pts.
  • 30 students in each class
Hypothesis Testing
Null Hypothesis (\(H_0\)): The average scores in Alice’s and Bob’s classes are the same (the teachers grade equally).

Set up a two-tailed (i.e., two-sided difference) \(t\)-test to test \(H_0\) with significance level 0.05.

Student’s \(t\)-test: Example application

  • Alice’s class: \(\hat{x}_\text{Alice} = 75\), \(\hat\sigma_\text{Alice} = 10\), \(N_\text{Alice} = 30\)
  • Bob’s class: \(\hat{x}_\text{Bob} = 80\), \(\hat\sigma_\text{Bob} = 12\), \(N_\text{Bob} = 30\)

\[ T = \frac{ \hat{x}_\text{Alice} - \hat{x}_\text{Bob} }{ \sqrt{ \frac{\hat\sigma_\text{Alice}^2}{N_\text{Alice}} + \frac{\hat\sigma_\text{Bob}^2}{N_\text{Bob}}} } = -1.75 \]

Use \(t\)-Student distribution with \(\nu = 30+30 - 2 = 58\) DoFs.

Looking up the \(t\)-Student table with 58 DoFs at significance level 0.05 (two-tailed) gives critical values of \(\pm 2.0\). Since \(-1.75\) lies within this interval, we cannot reject the hypothesis.
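Instead of a table, the two-sided \(p\)-value can be obtained by integrating the \(t\)-distribution density numerically. The following is a sketch with hypothetical helper names (`t_pdf`, `t_two_sided_p`) and a plain midpoint rule; it uses \(\nu = 58\) as in the slide.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_norm = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

def t_two_sided_p(t_obs, nu, n_steps=20_000):
    """Two-sided p-value: 1 minus the probability mass inside [-|t_obs|, |t_obs|]."""
    a = abs(t_obs)
    h = 2 * a / n_steps
    central = h * sum(t_pdf(-a + (i + 0.5) * h, nu) for i in range(n_steps))
    return 1.0 - central

# Summary statistics from the example
mean_alice, std_alice, n_alice = 75.0, 10.0, 30
mean_bob, std_bob, n_bob = 80.0, 12.0, 30

t_stat = (mean_alice - mean_bob) / math.sqrt(std_alice**2 / n_alice + std_bob**2 / n_bob)
nu = n_alice + n_bob - 2
p_value = t_two_sided_p(t_stat, nu)

print(f"T = {t_stat:.2f}, nu = {nu}, two-sided p = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "cannot reject H0")
```

Comparing `p_value` with 0.05 is equivalent to comparing \(T\) with the critical values \(\pm 2.0\).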

F-test: Comparing variances

To compare two variances, take two independent \(\chi^2\) variables, \(X\) and \(Y\), with \(\nu_1\) and \(\nu_2\) DoFs. Then the variable \[ F = \frac{X/\nu_1}{Y/\nu_2} \] follows the \(F\)-distribution with \((\nu_1, \nu_2)\) DoFs.

F-test: Example application

  • Effect of fertilizers on plant growth
  • Fertilizer A: 20 plants, variance in height = 9 cm^2.
  • Fertilizer B: 20 plants, variance in height = 4 cm^2.

You decide to use an F-test to determine if there’s a significant difference in the variability of plant height between the two fertilizers.

Null Hypothesis (\(H_0\))
The variances of plant heights are equal (\(\sigma^2_A = \sigma^2_B\)).
Compute the \(F\) statistic, \(F = S_A^2 / S_B^2 = 9/4 = 2.25\); look up the \(F\)-distribution table for \((19, 19)\) DoFs; compare to the significance level.
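The lookup can again be replaced by a numerical integral of the \(F\) density. This sketch uses hypothetical helpers `f_pdf` and `f_sf`; whether a one-sided or two-sided \(p\)-value is appropriate depends on the alternative hypothesis, which the slide leaves open, so both are computed.

```python
import math

def f_pdf(f, nu1, nu2):
    """Density of the F-distribution with (nu1, nu2) degrees of freedom, for f > 0."""
    log_pdf = (
        math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
        + (nu1 / 2) * math.log(nu1 / nu2)
        + (nu1 / 2 - 1) * math.log(f)
        - ((nu1 + nu2) / 2) * math.log1p(f * nu1 / nu2)
    )
    return math.exp(log_pdf)

def f_sf(f_obs, nu1, nu2, n_steps=50_000):
    """Upper-tail probability P(F > f_obs), via a midpoint-rule integral of the density."""
    h = f_obs / n_steps
    cdf = h * sum(f_pdf((i + 0.5) * h, nu1, nu2) for i in range(n_steps))
    return 1.0 - cdf

var_a, var_b = 9.0, 4.0   # sample variances in cm^2
n_a = n_b = 20            # plants per fertilizer

f_obs = var_a / var_b
p_upper = f_sf(f_obs, n_a - 1, n_b - 1)
p_two_sided = 2 * min(p_upper, 1 - p_upper)

print(f"F = {f_obs:.2f}, one-sided p = {p_upper:.3f}, two-sided p = {p_two_sided:.3f}")
```

With these numbers the conclusion at significance level 0.05 depends on the choice of tails, so the alternative hypothesis should be fixed before looking at the data.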



Testing a linear fit
Test hypothesis: the data comes from a population described by the fit parameters (5.7 in Amendola 2021)
Analysis of variance
Data from two distributions with identical variance, \(\sigma^2\). Are the means identical? (5.8 in Amendola 2021)

Non-parametric tests

Pearson \(\chi^2\) for binned data

Consider a sample of \(N\) data \(d_j\), divided into \(k\) mutually exclusive bins with \(n_i\) data in bin \(i\). Suppose we know from some theory that a fraction \(p_i\) of the data should go in bin \(i\).

Does the (binned) data come from a given distribution?
\(H_0\): for every \(j\), the probability that \(d_j\) falls in bin \(i\) is \(p_i\).

Pearson \(\chi^2\) for binned data

Consider the case \(k=2\). The count \(n_1\) follows the binomial distribution \[ P(n_1; N, p_1) = {N \choose n_1} p_1^{n_1}(1-p_1)^{N-n_1} \] which yields the moments \(\langle n_1 \rangle = Np_1\) and \(\sigma^2 = Np_1(1-p_1)\).

Form the standardized variable \[ Y = \frac{n_1 - Np_1}{\sqrt{Np_1(1-p_1)}} \sim \mathcal{N}(0,1) \] (approximately, for large \(N\)) and therefore \(Y^2 \sim \chi_1^2\). Using \(n_2 = N - n_1\) and \(p_2 = 1 - p_1\), one can show that \(Y^2\) can be written as \[ Y^2 = \sum_{i=1}^2 \frac{(n_i - Np_i)^2}{Np_i} \]

Generalization to \(k>2\) \[ Y^2 = \sum_{i=1}^k \frac{(n_i - Np_i)^2}{Np_i} \sim \chi^2_{k-1} \]

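Reject \(H_0\) at significance level \(\alpha\) (i.e., with \(1-\alpha\) confidence level) if the observed \(Y^2\) lies in the upper tail of the \(\chi^2_{k-1}\) distribution that contains a fraction \(\alpha\) of the probability (upper one-tailed test).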
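As an illustration for \(k > 2\), the sketch below applies the Pearson statistic to made-up counts from 120 rolls of a die (hypothetical data, with \(p_i = 1/6\) under \(H_0\)). The helper `chi2_sf` is an assumption of this sketch: it evaluates the upper-tail probability through the standard series expansion of the regularized lower incomplete gamma function.

```python
import math

def chi2_sf(x, n_dof):
    """Upper-tail probability P(chi2_{n_dof} > x), via a series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, t = n_dof / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(t) - t - math.lgamma(a))
    return 1.0 - lower

# Hypothetical counts for faces 1..6 of a die rolled N = 120 times
counts = [18, 22, 25, 15, 20, 20]
N = sum(counts)
probs = [1 / 6] * 6   # H0: the die is fair

# Pearson statistic: sum over bins of (n_i - N p_i)^2 / (N p_i)
y2 = sum((n_i - N * p_i) ** 2 / (N * p_i) for n_i, p_i in zip(counts, probs))
n_dof = len(counts) - 1
p_value = chi2_sf(y2, n_dof)

alpha = 0.05
print(f"Y^2 = {y2:.2f}, dof = {n_dof}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

The degrees of freedom are \(k-1\) rather than \(k\) because the counts are constrained to sum to \(N\).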


Bootstrap

Estimate the variability/uncertainty of a statistic (e.g., mean or median) by resampling our data multiple times
  • Bag with 100 marbles. Average weight? Focus on subset \(N=10\).
  • Randomly draw 10 marbles with replacement
  • Compute statistic
  • Repeat many times (e.g., 1,000x or 10,000x)

Squeeze more information out of a limited dataset
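The resampling steps listed above can be sketched as follows. The ten weights are invented for illustration, and the 68% percentile interval is one common choice among several ways of summarising the bootstrap distribution.

```python
import random
import statistics

random.seed(5)

# Hypothetical weights (in grams) of the N = 10 marbles taken from the bag
weights = [4.8, 5.1, 5.3, 4.9, 5.0, 5.6, 4.7, 5.2, 5.4, 5.0]
n_resamples = 5_000

boot_means = []
for _ in range(n_resamples):
    resample = random.choices(weights, k=len(weights))  # draw 10 marbles with replacement
    boot_means.append(statistics.fmean(resample))       # compute the statistic

boot_means.sort()
boot_se = statistics.stdev(boot_means)

# Percentile interval containing the central 68% of the bootstrap means
lower = boot_means[int(0.16 * n_resamples)]
upper = boot_means[int(0.84 * n_resamples)]

# For the mean, the textbook standard error S / sqrt(N) is available for comparison
analytic_se = statistics.stdev(weights) / len(weights) ** 0.5

print(f"sample mean  = {statistics.fmean(weights):.3f}")
print(f"bootstrap SE = {boot_se:.3f} (S/sqrt(N) = {analytic_se:.3f})")
print(f"68% percentile interval: [{lower:.3f}, {upper:.3f}]")
```

Replacing `statistics.fmean` with `statistics.median` gives an uncertainty estimate for the median, for which no equally simple closed-form expression exists.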



Summary

Parameter estimation
Distribution of the sample mean, sample variance
PDF of common statistics
Expected distribution for mean, variance, standardized distribution, ratio of variances
Confidence regions
Put bounds on statistic
Hypothesis testing
Try to reject the null hypothesis up to desired confidence level
Non-parametric tests
Pearson \(\chi^2\) for binned data, bootstrap


Amendola, Luca. 2021. “Lecture Notes on Statistical Methods.”
Wackerly, Dennis, William Mendenhall, and Richard L Scheaffer. 2014. Mathematical Statistics with Applications. Cengage Learning.