Given by Sven Laur
Brief summary: Principles of experiment design. Machine learning as minimisation of future costs. Overview of standard loss functions. Stochastic estimation of future costs by random sampling (Monte-Carlo integration). Theoretical limitations. Standard validation methods: holdout, randomised holdout, cross-validation, leave-one-out, bootstrapping. Advantages and drawbacks of standard validation methods.
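The core idea of Monte-Carlo estimation of future costs can be sketched in a few lines: fit a model on a training sample, then approximate the expected loss by averaging the loss over a large fresh sample from the data distribution. The data source below (a noisy line) and all names are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data source: y = 2x + 1 with Gaussian noise (std 0.3).
def sample(n):
    x = rng.uniform(-1, 1, n)
    y = 2 * x + 1 + rng.normal(0, 0.3, n)
    return x, y

# Fit a line on a small training sample.
x_tr, y_tr = sample(50)
a, b = np.polyfit(x_tr, y_tr, 1)

# Monte-Carlo estimate of the expected squared loss:
# draw many fresh samples and average the loss over them.
x_te, y_te = sample(10_000)
mc_loss = np.mean((a * x_te + b - y_te) ** 2)
print(mc_loss)  # close to the noise variance 0.09 plus estimation error
```

In practice the data distribution is unknown, which is exactly why holdout, cross-validation and bootstrapping are needed: they replace the fresh Monte-Carlo sample with reuses of the data at hand.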
Davison and Hinkley: Bootstrap Methods and Their Application
Molinaro, Simon and Pfeiffer: Prediction Error Estimation: A Comparison of Resampling Methods
Arlot and Celisse: A survey of cross-validation procedures for model selection
Efron: Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation
Efron and Tibshirani: Improvements on Cross-Validation: The .632+ Bootstrap Method
Wolfgang Härdle: Applied Nonparametric Regression: Choosing the smoothing parameter (Chapter 5)
Yang: Can the Strengths of AIC and BIC Be Shared?
van Erven, Grünwald and de Rooij: Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma
Generate data from a simple linear or polynomial regression model, apply various validation methods, and report the results:
Did the training method choose the correct model?
Are there any differences when the correct model is not feasible?
Estimate bias and variance of a training method
Did the validation method correctly estimate the expected losses?
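One way to set up the regression exercise is sketched below: generate data from a known quadratic model, fit polynomials of several degrees, and let k-fold cross-validation pick a degree. The data-generating model, sample size and fold count are assumptions chosen for illustration; the same skeleton works for holdout or leave-one-out by changing the splits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground truth: a quadratic with Gaussian noise (std 0.2).
def sample(n):
    x = rng.uniform(-1, 1, n)
    y = 1 - x + 2 * x**2 + rng.normal(0, 0.2, n)
    return x, y

def cv_loss(x, y, degree, k=5):
    # k-fold cross-validation estimate of the expected squared loss.
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    losses = []
    for fold in folds:
        tr = np.setdiff1d(idx, fold)
        coef = np.polyfit(x[tr], y[tr], degree)
        pred = np.polyval(coef, x[fold])
        losses.append(np.mean((pred - y[fold]) ** 2))
    return np.mean(losses)

x, y = sample(60)
est = {d: cv_loss(x, y, d) for d in range(1, 5)}
best = min(est, key=est.get)
print("chosen degree:", best)
```

Repeating the whole experiment over many freshly generated datasets gives the bias and variance of the training method, and comparing the validation estimates against a large independent test sample shows whether the expected losses were estimated correctly.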
Try various classification and linear regression methods together with various validation methods and report the results.
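For the classification part, the sketch below compares a holdout estimate with a leave-one-out estimate of the misclassification rate. The classifier (nearest class centroid) and the two-Gaussian data are illustrative assumptions; any classifier with a fit/predict step can be substituted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-class data: Gaussian clouds around (-1,-1) and (1,1).
n = 100
X = np.vstack([rng.normal(-1, 1, (n // 2, 2)),
               rng.normal(1, 1, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

def nearest_centroid_error(Xtr, ytr, Xte, yte):
    # Classify by the nearest class centroid; return the error rate.
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return np.mean(pred != yte)

# Holdout: a single random 70/30 split.
idx = rng.permutation(n)
tr, te = idx[:70], idx[70:]
holdout = nearest_centroid_error(X[tr], y[tr], X[te], y[te])

# Leave-one-out: n splits of size n-1 versus 1.
loo = np.mean([
    nearest_centroid_error(np.delete(X, i, axis=0), np.delete(y, i),
                           X[i:i + 1], y[i:i + 1])
    for i in range(n)
])
print("holdout:", holdout, "leave-one-out:", loo)
```

Rerunning the holdout split many times shows its variance; leave-one-out is nearly unbiased but expensive, which is the kind of trade-off the exercise asks you to report.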