
    Machine Learning - Performance evaluation measures

    Video production: Mirjam Paales, 26.02.2013, 4784 views, Computer Science


    Given by Sven Laur

    Brief summary: Principles of experiment design. Machine learning as minimisation of future costs. Overview of standard loss functions. Stochastic estimation of future costs by random sampling (Monte-Carlo integration). Theoretical limitations. Standard validation methods: holdout, randomised holdout, cross-validation, leave-one-out, bootstrapping. Advantages and drawbacks of standard validation methods.
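
    The idea of estimating future costs by random sampling can be written down compactly. The notation below is an assumption made for illustration and is not taken from the lecture slides: f is a fixed predictor, L a loss function, D the data distribution, and sigma_L the standard deviation of the loss on a single sample.

```latex
% Expected future cost (risk) of a fixed predictor f under loss L
R(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[L(y, f(x))\bigr]

% Monte-Carlo estimate from N independent samples that were not used for training
\hat{R}_N(f) = \frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr),
\qquad
\mathbb{E}\bigl[\hat{R}_N(f)\bigr] = R(f),
\qquad
\operatorname{sd}\bigl(\hat{R}_N(f)\bigr) = \frac{\sigma_L}{\sqrt{N}}
```

    The estimate is unbiased only when the N samples are independent of the data used to fit f, which is the reason the validation methods listed above separate training points from test points.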

    Slides: PDF slides Handwritten slides

    Literature:

    Davison and Hinkley: Bootstrap Methods and Their Application
    Molinaro, Simon and Pfeiffer: Prediction Error Estimation: A Comparison of Resampling Methods
    Arlot and Celisse: A survey of cross-validation procedures for model selection
    Efron: Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation
    Efron and Tibshirani: Improvements on Cross-Validation: The .632+ Bootstrap Method
    Wolfgang Härdle: Applied Nonparametric Regression: Choosing the smoothing parameter (Chapter 5)
    Yang: Can the Strengths of AIC and BIC Be Shared?
    van Erven, Grünwald and de Rooij: Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma

    Complementary exercises:

    Generate data from a simple linear or polynomial regression model, apply various validation methods, and report the results:

    Did the training method choose the correct model?
    Are there any differences when the correct model is not feasible?
    Estimate the bias and variance of the training method.
    Did the validation method correctly estimate the expected losses?
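
    One possible starting point for this exercise is sketched below, using only the Python standard library. Everything in it is an assumption made for illustration rather than the setup used in the lecture: the true model (y = 1 + 2x with Gaussian noise), the sample size of 40, squared loss, the helper names, and the out-of-bag variant of the bootstrap. The reference value is a Monte-Carlo estimate of the expected loss computed on a large fresh sample, which is possible here only because the data generator is known.

```python
import random
import statistics


def make_data(n, rng, intercept=1.0, slope=2.0, noise_sd=0.5):
    """Draw n points from y = intercept + slope * x + Gaussian noise (assumed model)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, intercept + slope * x + rng.gauss(0.0, noise_sd)) for x in xs]


def fit_line(data):
    """Least-squares fit with one predictor; assumes at least two distinct x values."""
    mx = statistics.fmean(x for x, _ in data)
    my = statistics.fmean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(model, data):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in data)


def holdout(data, rng, test_fraction=0.3):
    """Single random split: fit on one part, measure the loss on the other."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return mse(fit_line(shuffled[n_test:]), shuffled[:n_test])


def cross_validation(data, rng, k):
    """k-fold cross-validation; k = len(data) gives leave-one-out."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    losses = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        losses.append(mse(fit_line(train), test))
    return statistics.fmean(losses)


def bootstrap_oob(data, rng, rounds=200):
    """Fit on a resample drawn with replacement, test on the points left out."""
    n = len(data)
    losses = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if not oob or len(chosen) < 2:
            continue  # skip degenerate resamples
        losses.append(mse(fit_line([data[i] for i in idx]), oob))
    return statistics.fmean(losses)


rng = random.Random(0)
sample = make_data(40, rng)

# Reference: loss of the model trained on the full sample, on a large fresh sample.
reference = mse(fit_line(sample), make_data(50_000, rng))

estimates = {
    "holdout": holdout(sample, rng),
    "randomised holdout": statistics.fmean(holdout(sample, rng) for _ in range(50)),
    "10-fold cross-validation": cross_validation(sample, rng, k=10),
    "leave-one-out": cross_validation(sample, rng, k=len(sample)),
    "bootstrap (out-of-bag)": bootstrap_oob(sample, rng),
}

print(f"reference expected loss: {reference:.3f}")
for name, value in estimates.items():
    print(f"{name:>26}: {value:.3f}")
```

    Rerunning the script with different seeds and sample sizes gives a rough picture of how far each estimate scatters around the reference value. Replacing fit_line with a model that cannot represent the true relationship is one way to approach the question about infeasible models.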

    Try various classification and linear regression methods together with various validation methods and report the results:

    Iris dataset
    Computer Hardware Data Set
    Housing Data Set
    Datasets for testing linear regression models

     

    Free implementations:

    The boot package in R
    Some methods in the rminer package in R