Here are lecture slides from a 2-day short course on Statistical Machine Learning.
- Day 1 Slides–Session-1-Stat Learning
- Day 2 Slides–Session-2-Stat Learning
Here is a set of slides to accompany the Statistical Learning Notes.
Chapter 1-Introduction, Generalities, and Some Background Material
Section 1-Overview/Context
101-Introduction–Notation and Terminology, What is New Here, Representing What is Known
102-Optimal Predictors–Optimal (Unrealizable/Theoretical) Predictors
103-Nearest Neighbors–Nearest Neighbor Rules
104-Error Decompositions–General and SEL Decompositions of Expected Prediction Loss
105-Cross-Validation–Cross-Validation
106-Predictor Choice and CV–Choice of Predictor Complexity and Cross-Validation
107-Penalization and Complexity–Penalized Training Error Fitting and Predictor Complexity
108-Optimal Features for Classification–Classification Models and Optimal Features
109-Quantitative Features for Classification–Quantitative Representation of Qualitative Features for Classification
110-Functions as Features and Kernels–Abstract Feature Spaces (of Functions) and “Kernels”
111-Kernel Mechanics–Making Kernels
112-Feature Engineering (etc.) Perspective–Feature Selection/Engineering and Data “Pre-processing”: Some Perspective and Prediction of Predictor Efficacy
113-More Optimal 2 Class Classification–More on the Form of an Optimal 0-1 Loss Classifier for K=2
114-Other 2 Class Losses–Other Prediction Problems in 2-Class Classification Models
115-Voting Functions for 2 Class Classification–Voting Functions, Losses for them, and Expected 0-1 Loss
116-Density Estimation and Classification–Density Estimation and Approximately Optimal and Naive Bayes Classification
117-Document Features–Document Features
Section 2-Some Linear Theory, Linear Algebra, and Principal Components
201-Inner Product Spaces–Inner Product Spaces
202-Gram Schmidt and QR–The (General) Gram-Schmidt Process and the QR Decomposition of a rank=p Matrix X
203-SVD of X–The Singular Value Decomposition of X
204-SVD and Inner Product Spaces–The Singular Value Decomposition and General Inner Product Spaces
205-Ordinary PCs–“Ordinary” Principal Components
206-Kernel PCs–“Kernel” Principal Components
207-Graphical Spectral Features–“Graphical Spectral” Features
Chapter 2-Supervised Learning 1: Basic Prediction Methodology
Section 3-(Non-OLS) SEL Linear Predictors
301-Ridge Regression–Ridge Regression
302-LASSO etc–The Lasso, Etc.
303-PCR–Principal Components Regression
304-PLS–Partial Least Squares
Section 4-SEL Linear Predictors Using Basis Functions
401-p=1 Wavelet Bases–p=1 Wavelet Bases
402-p=1 Regression Splines–p=1 Piecewise Polynomials and Regression Splines
403-Tensor Product Bases and Prediction–Basis Functions and p-Dimensional Inputs (Tensor Product Bases and MARS)
Section 5-Smoothing Splines and SEL Prediction
501-p=1 Smoothing Splines–p=1 Smoothing Splines
502-Multi-Dimensional Smoothing Splines–Multi-Dimensional Smoothing Splines
503-Penalized Fitting in N-space–An Abstraction of Smoothing Splines and Penalized Fitting to N Responses
504-Graph-Based Penalized Smoothing and Semi-supervised Learning–Graph-Based Penalized Fitting/Smoothing (and Semi-Supervised Learning)
Section 6-Kernel and Local Regression Smoothing Methods and SEL Prediction
601-1D Kernel and Local Regression Smoothers–One-Dimensional Kernel and Local Regression Smoothers
602-Local Regression Smoothing in p Dimensions–Local Regression Smoothing in p Dimensions
Section 7-High-Dimensional Use of Low-Dimensional Smoothers and SEL Prediction
701-Structured Regression Functions–Additive Models and Other Structured Regression Functions
702-Projection Pursuit Regression–Projection Pursuit Regression
Section 8-Highly Non-Linear Parametric Regression Methods
801-Neural Network Regression–Neural Network Regression
802-Neural Network Classification–Neural Network Classification
803-Neural Network Fitting–The Back-Propagation Algorithm
804-Regularization of Neural Network Fitting–Formal Regularization of Neural Network Fitting
805-Convolutional Neural Networks–Convolutional Neural Networks
806-Recurrent Neural Networks–Recurrent Neural Networks
807-Radial Basis Function Networks–Radial Basis Function Networks
Section 9-Prediction Methods Based on Rectangles: Trees and PRIM
901-CART and PRIM–Prediction Based on Rectangles
902-Regression Trees–Regression Trees
903-Classification Trees–Classification Trees
904-Optimal Subtrees–Optimal Subtrees
905-Variable Importance for Tree Predictors–Measuring the Importance of Inputs for Trees
906-PRIM–PRIM
Section 10-Predictors Built on Bootstrap Samples
1001-Bagging Generalities–Bagging in General
1002-Random Forests–Random Forests: Special Bagging of Tree Predictors
1003-Measuring the Importance of Inputs for Bagged Predictors–Measuring the Importance of Inputs for Bagged Predictors
1004-Boruta–The Boruta Wrapper/Heuristic for Input Variable Selection
1005-Bumping and Active Set Selection–Bumping and “Active Set Selection”
Section 11-“Ensembles” of Predictors
1101-Bayes Model Averaging–Bayesian Model Averaging for Prediction
1102-Stacking for SEL and 0-1 Loss–Stacking: SEL … and 0-1 Loss
1103-Generalized Stacking and Deep Structures–“Generalized Stacking” and “Deep” Structures for Prediction
1104-Boosting-Successive Approximation in SML–Boosting: Successive Approximation in Prediction
1105-Boosting-AdaBoost.M1–AdaBoost.M1
1106-Quinlan’s Cubist and Divide and Conquer Strategies–Quinlan’s Cubist
Chapter 3-Supervised Learning II: More on Classification (Mostly Linear Methods)
Section 12-Basic Linear Methods
1201-Linear and Quadratic Discriminant Analysis–Linear (and a Bit on Quadratic) Discriminant Analysis
1202-Dimension Reduction in LDA–Dimension Reduction in LDA
1203-Logistic Regression and Classification–Logistic Regression
Section 13-Support Vector Machines
1301-SVMs 1 Maximum Margin Classifiers–The Linearly Separable Case: Maximum Margin Classifiers
1302-SVMs 2 Support Vector Classifiers–The Linearly Non-Separable Case: Support Vector Classifiers
1303-SVMs 3A Support Vector Machines Heuristics–SV Classifiers and Kernels: Heuristics
1304-SVMs 3B Support Vector Machines Penalized Fitting–SV Classifiers and Kernels: A Penalized Fitting Function Space Argument
1305-SVMs 3C Support Vector Machines Geometry–SV Classifiers and Kernels: A Function Space Geometry Argument
1306-SVMs 3D Support Vector Machines Perspective–SVMs: Some Perspective
1307-SVMs 4 Support Vector Machines Other Related Issues–Other SV Stuff
Section 14-Prototype and (More on) Nearest Neighbor Methods of Classification
1401-Prototype and Nearest Neighbor Classification–Prototype and Nearest Neighbor Methods
Chapter 4-More Theory Regarding Supervised Learning
Section 15-Reproducing Kernel Hilbert Spaces: Penalized/Regularized Fitting and Bayes Prediction
1501-RKHSs and Smoothing Splines–RKHSs and p=1 Cubic Smoothing Splines
1502-Development of Kernels from Linear Functionals and Differential Operators–What is Possible Beginning from Linear Functionals and Linear Operators for p=1
1503-Prediction Theory Beginning from a Kernel–What is Common Beginning Directly from a Kernel
1504-Gaussian Spatial Processes Kernels and Predictors–Gaussian Process “Priors,” Bayes Predictors, and RKHSs
Chapter 5-Unsupervised Learning Methods
Section 17-Some Methods of Unsupervised Learning
1701-Association Rules-Market Basket Analysis–Association Rules/Market Basket Analysis
1702-The Apriori Algorithm and its Uses–The “Apriori” Algorithm
1704-Clustering Generalities–Clustering
1705-Partitioning Methods of Clustering–Clustering: Partitioning Methods (“Centroid”-Based Methods)
1706-Hierarchical Clustering Methods–Clustering: Hierarchical Methods
1707-Model-Based Clustering–Clustering: (Mixture) Model-Based Methods
1708-Biclustering–Clustering: Biclustering
1709-Self-Organizing Maps–Clustering: Self-Organizing Maps
1710-Multi-Dimensional Scaling–Multi-dimensional Scaling
1711-More Principal Components Ideas–Sparse Principal Components, Non-Negative Matrix Factorization, Archetypal Analysis, Independent Component Analysis
–Principal Curves and Surfaces
1712-Original PageRanks–Original Google™ PageRanks
Chapter 6-Miscellanea
The materials on this site may be used without charge for non-commercial personal educational purposes. Any copies made of the materials must bear printed acknowledgment of their Analytics Iowa LLC source.