Lecture 7: Linear regression
Institute for Theoretical Physics, Heidelberg University
Why?
The expected value of the output is assumed to be a linear function of the input, $\mathbb{E}[y \mid \mathbf{x}] = \mathbf{w}^\top \mathbf{x}$.
To estimate the weights, note that the NLL is equal (up to an additive constant and an overall scale) to the residual sum of squares (RSS).
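To make the equivalence explicit, under the standard Gaussian-noise assumption $y_n = \mathbf{w}^\top \mathbf{x}_n + \epsilon_n$ with $\epsilon_n \sim \mathcal{N}(0, \sigma^2)$ (a short derivation added here for completeness):

    \mathrm{NLL}(\mathbf{w}) = -\sum_{n=1}^{N} \log \mathcal{N}(y_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma^2)
                             = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \big(y_n - \mathbf{w}^\top \mathbf{x}_n\big)^2 + \frac{N}{2}\log(2\pi\sigma^2)
                             = \frac{1}{2\sigma^2}\,\mathrm{RSS}(\mathbf{w}) + \text{const},

so minimizing the NLL in $\mathbf{w}$ is equivalent to minimizing the RSS.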
In the following we discuss how to optimize this.
Functions that map vectors to scalars
The solution to the normal equations is the maximum likelihood estimate, $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$.
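A minimal NumPy sketch of this computation (a least-squares solver is numerically preferable to forming the inverse explicitly; the synthetic data is purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 100, 3
    X = rng.normal(size=(N, D))            # design matrix
    w_true = np.array([1.0, -2.0, 0.5])    # illustrative ground-truth weights
    y = X @ w_true + 0.1 * rng.normal(size=N)

    # Solve the normal equations X^T X w = X^T y
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Numerically safer alternative: a least-squares solver
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(w_hat, w_lstsq)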
Figure: Contours of the RSS error surface. The blue cross is the MLE.
Sometimes the variance may depend on the input. In this case we may want to associate a specific weight with each example.
The maximum likelihood estimate then yields the weighted least-squares estimate, $\hat{\mathbf{w}} = (\mathbf{X}^\top \boldsymbol{\Lambda} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Lambda} \mathbf{y}$, where $\boldsymbol{\Lambda} = \mathrm{diag}(1/\sigma_1^2, \dots, 1/\sigma_N^2)$ holds the per-example weights.
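A short NumPy sketch of weighted least squares, assuming the per-example noise variances sigma2 are known (all names and data are illustrative):

    import numpy as np

    def weighted_least_squares(X, y, sigma2):
        """Weighted least squares with known per-example noise variances sigma2."""
        Lam = np.diag(1.0 / sigma2)        # precision (weight) matrix
        A = X.T @ Lam @ X
        b = X.T @ Lam @ y
        return np.linalg.solve(A, b)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 2))
    sigma2 = 0.05 + rng.uniform(size=50)   # heteroscedastic noise levels
    y = X @ np.array([2.0, -1.0]) + np.sqrt(sigma2) * rng.normal(size=50)
    print(weighted_least_squares(X, y, sigma2))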
After estimating the weights, we can compute:
Predictions: $\hat{y}_n = \hat{\mathbf{w}}^\top \mathbf{x}_n$
Residuals: $r_n = y_n - \hat{y}_n$
The residual sum of squares on the dataset (the smaller the better): $\mathrm{RSS} = \sum_n r_n^2$
The RMSE, $\sqrt{\mathrm{RSS}/N}$, is a more common measure to quantify accuracy.
It is a more interpretable measure, since it is on the same scale as the outputs.
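A minimal sketch of these two metrics (function names are illustrative):

    import numpy as np

    def rss(y, y_hat):
        """Residual sum of squares."""
        return np.sum((y - y_hat) ** 2)

    def rmse(y, y_hat):
        """Root-mean-square error: same units as the outputs."""
        return np.sqrt(np.mean((y - y_hat) ** 2))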
Note
What happens when the data contains outliers?
One remedy is to model the noise as a mix of two Gaussian components, a narrow one for regular points and a broad one that absorbs the outliers.
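As one concrete illustration of this idea (a sketch under the two-component assumption above, not necessarily the exact model from the slide; all settings are illustrative), the weights can be fit by minimizing the mixture NLL numerically:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    def robust_nll(w, X, y, pi=0.9, sigma_in=0.1, sigma_out=2.0):
        """NLL under a two-component Gaussian noise mixture (inliers / outliers)."""
        r = y - X @ w
        log_in = np.log(pi) - 0.5 * (r / sigma_in) ** 2 - np.log(sigma_in * np.sqrt(2 * np.pi))
        log_out = np.log(1 - pi) - 0.5 * (r / sigma_out) ** 2 - np.log(sigma_out * np.sqrt(2 * np.pi))
        return -np.sum(logsumexp(np.stack([log_in, log_out]), axis=0))

    # Synthetic data with a few gross outliers (illustrative)
    rng = np.random.default_rng(2)
    X = rng.normal(size=(80, 2))
    y = X @ np.array([1.0, 3.0]) + 0.1 * rng.normal(size=80)
    y[:5] += 10.0                          # corrupt a few targets

    res = minimize(robust_nll, np.zeros(2), args=(X, y))
    print(res.x)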
Recall the multivariate normal likelihood (iid) with known variance.
The figure illustrates how the model “learns” about the parameters as it sees more data.
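A minimal sketch of this sequential updating for Bayesian linear regression with known noise variance (the prior precision alpha and noise variance sigma2 are illustrative choices):

    import numpy as np

    def posterior(X, y, alpha=1.0, sigma2=0.1):
        """Gaussian posterior over weights: prior N(0, alpha^{-1} I), known noise variance."""
        D = X.shape[1]
        S_inv = alpha * np.eye(D) + (X.T @ X) / sigma2   # posterior precision
        S = np.linalg.inv(S_inv)                         # posterior covariance
        m = S @ (X.T @ y) / sigma2                       # posterior mean
        return m, S

    rng = np.random.default_rng(3)
    X_all = rng.uniform(-1, 1, size=(100, 2))
    y_all = X_all @ np.array([0.5, -0.7]) + np.sqrt(0.1) * rng.normal(size=100)

    # The posterior concentrates around the true weights as more data arrives
    for n in (1, 5, 20, 100):
        m, S = posterior(X_all[:n], y_all[:n])
        print(n, m, np.diag(S))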
Note
Generalized linear model (GLM)
The GLM generalizes this linearity property: it is a conditional version of an exponential-family distribution in which the natural parameters are a linear function of the input.
For GLMs, the natural parameters are given by the linear predictor, $\eta = \mathbf{w}^\top \mathbf{x}$, and the mean is obtained through the inverse link function, $\mu = \ell^{-1}(\eta)$.
Linear regression
Linear regression is the special case of a Gaussian output, $p(y \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(y \mid \mu, \sigma^2)$, where the link is the identity, so $\mu = \eta = \mathbf{w}^\top \mathbf{x}$.
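A short check (standard exponential-family algebra, added here for completeness) that the Gaussian with fixed $\sigma^2$ indeed has this form:

    \mathcal{N}(y \mid \mu, \sigma^2) = \exp\!\left( \frac{\mu y - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right),

so the natural parameter is $\mu$ (up to the dispersion $\sigma^2$) and the canonical link is the identity.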
Binomial regression
Recall the binomial distribution for a binary process with success probability $\mu$ and $N$ trials, $\mathrm{Bin}(y \mid N, \mu) = \binom{N}{y}\,\mu^y (1-\mu)^{N-y}$.
We further have the moments $\mathbb{E}[y] = N\mu$ and $\mathrm{Var}[y] = N\mu(1-\mu)$.
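A quick numerical check of these moments (purely illustrative):

    import numpy as np
    from scipy.stats import binom

    N_trials, mu = 20, 0.3
    mean, var = binom.stats(N_trials, mu, moments="mv")
    print(mean, N_trials * mu)                # both 6.0
    print(var, N_trials * mu * (1 - mu))      # both 4.2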
Negative log-likelihood
Consider the negative log-likelihood for the GLM, $\mathrm{NLL}(\mathbf{w}) = -\sum_n \big[\eta_n y_n - A(\eta_n)\big] + \text{const}$ with $\eta_n = \mathbf{w}^\top \mathbf{x}_n$.
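As an illustration, a sketch of this NLL for binomial regression with the canonical logistic (sigmoid) link, minimized with a generic optimizer; the data and settings are illustrative, not part of the lecture material:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit, gammaln

    def binomial_nll(w, X, y, n_trials):
        """Negative log-likelihood of binomial regression with a sigmoid link."""
        mu = np.clip(expit(X @ w), 1e-12, 1 - 1e-12)   # success probability per example
        log_binom = gammaln(n_trials + 1) - gammaln(y + 1) - gammaln(n_trials - y + 1)
        ll = log_binom + y * np.log(mu) + (n_trials - y) * np.log1p(-mu)
        return -np.sum(ll)

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 2))
    w_true = np.array([1.5, -1.0])                      # illustrative ground truth
    n_trials = 10
    y = rng.binomial(n_trials, expit(X @ w_true))

    res = minimize(binomial_nll, np.zeros(2), args=(X, y, n_trials))
    print(res.x)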