Computational Statistics and Data Analysis (MVComp2)

Winter Semester 2023/2024: October 16, 2023 to February 10, 2024

Lectures: Tue 11-13; Exercises: Fri 14-16

Lectures: Philosophenweg 12 / kHS; Exercises: INF 308 / HS 2

Lecturer: Prof. Dr. Tristan Bereau, Institute for Theoretical Physics, Heidelberg University

6 credit points

Course description

This lecture introduces basic methods and approaches in computational statistics and data analysis that are of great importance to empirical problems in the natural sciences. It gives an overview of relevant concepts and theorems in probability theory and statistics, and extends to more modern approaches, including automatic differentiation and machine learning. Lectures will be accompanied by computational exercises in Python. Students will learn to analyze data sets and interpret the results from a solid, theoretically grounded statistical perspective; devise statistical and machine learning models of experimental situations; infer the parameters of these models from empirical observations; and test hypotheses.

Prerequisites

  • Linear (Matrix) Algebra
  • Basic calculus (derivatives & integrals)
  • Basic programming skills in Python

Tentative course outline

  1. Basic concepts in probability theory
  2. Random variables; expectations, variances, covariances, and their properties
  3. Discrete & continuous probability distributions
  4. Moment-generating functions, central limit theorem, and multivariate distributions
  5. Statistical models & inference: parameter estimation
  6. Hypothesis tests: tests, confidence intervals, bootstrap method
  7. Linear regression: least squares, generalized linear model
  8. Regularization: Ridge & LASSO regression, MAP estimation
  9. Nonlinear regression: basis expansions, neural networks
  10. Classification: k-nearest neighbors, logistic regression, linear discriminant analysis
  11. Kernel methods: Mercer kernels, Gaussian processes, support vector machines
  12. Model selection: Jeffreys scale, BIC, bias-variance tradeoff
  13. Dimensionality reduction: principal component analysis, factor analysis
  14. Information theory

Main references

Attendance

Lectures
Lectures will cover the conceptual and theoretical aspects relevant to the course. Slides used in the lectures will be uploaded to this page after each lecture.
Exercises
These sessions provide an opportunity to discuss the last lecture and previous exercises, and to work on hands-on problems.

Assessment

Competence and proficiency will be assessed through:

Exercises

  • To be handed in on a weekly basis.
  • Please submit in groups of two or three. Make sure that your submitted document clearly lists the names of everyone in your group. Your group is responsible for producing one original exercise submission.
  • Exercises will typically include one coding problem. Though I recommend Python, you are welcome to use any language you see fit. The code you write will not be graded; only the results will be (e.g., calculations or plots).
  • Preferred formats for your submission: LaTeX, Markdown, Jupyter notebook, Quarto, or equivalent. Please always submit a single PDF file.
  • Exercises are always due the day before the next exercise session, i.e., on Thursdays by 23:59. Upload them to Physik Übungsgruppen. No extensions will be granted.
  • Exercises and solutions are available on this page.

Exam

One on-site, written exam at the end of the semester.

Date: Friday, February 9, 2024, 14:00-16:00.

Place: INF 308 / HS 2

Weights

  • Exam: 70%
  • Exercises: 30%
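
As a rough illustration of how these weights could combine, here is a minimal Python sketch. It assumes that both components are expressed as percentages between 0 and 100 and are combined linearly. This page does not state the exact grading formula, so the function name `final_score` and the percentage scale are hypothetical.

```python
def final_score(exam_pct: float, exercises_pct: float) -> float:
    """Combine exam and exercise scores with the 70% / 30% weights.

    Assumption (not stated on this page): both inputs are percentages
    in [0, 100] and the combination is a plain weighted average.
    """
    for name, value in (("exam_pct", exam_pct), ("exercises_pct", exercises_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Integer weights keep the arithmetic exact for typical inputs.
    return (70 * exam_pct + 30 * exercises_pct) / 100


if __name__ == "__main__":
    # Hypothetical scores: 80% on the exam, 90% on the exercises.
    print(final_score(80.0, 90.0))  # → 83.0
```

Under these assumptions, the exam contributes 56 percentage points and the exercises 27 in the example above.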

Rocket.Chat

We will be using Rocket.Chat as a public, virtual forum for questions and discussion. The system allows for fast responses from the instructor and classmates, and lets students see previous questions and answers. Rather than emailing questions to the instructor, please post them on Rocket.Chat. If you have not already been automatically enrolled, please sign up via https://uebungen.physik.uni-heidelberg.de/chat/group/WS23-1758.

On large language models

Section written by OpenAI’s ChatGPT.

Supplement, Not Substitute
While LLMs can be great supplementary tools, they should not replace your primary learning resources, such as textbooks, lectures, and discussions. Always prioritize understanding the fundamental concepts from your course materials.
Exercises & Assignments
We encourage students to use LLMs for understanding topics, brainstorming ideas, or verifying concepts. However, simply copying answers or relying on the LLM to solve your exercises defeats the purpose of your academic journey. It’s vital to ensure you understand the material and can apply it without external assistance.
Ethical Considerations
Remember, using external sources without proper citation or presenting another’s work as your own is plagiarism. Always cite any assistance or information obtained from LLMs.
Limitations
While LLMs are advanced and can provide a wealth of information, they are not infallible. Always cross-check crucial information from multiple trusted sources.