Supervised Learning Review
Regression Models
Application: Predicting Material Bandgaps
Session | Date | Content |
---|---|---|
Day 0 | 06/16/2023 (2:30-3:30 PM) | Introduction, Setting up your Python Notebook |
Day 1 | 06/19/2023 (2:30-3:30 PM) | Python Data Types |
Day 2 | 06/20/2023 (2:30-3:30 PM) | Python Functions and Classes |
Day 3 | 06/21/2023 (2:30-3:30 PM) | Scientific Computing with Numpy and Scipy |
Day 4 | 06/22/2023 (2:30-3:30 PM) | Data Manipulation and Visualization |
Day 5 | 06/23/2023 (2:30-3:30 PM) | Materials Science Packages |
Day 6 | 06/26/2023 (2:30-3:30 PM) | Introduction to ML, Supervised Learning |
Day 7 | 06/27/2023 (2:30-3:30 PM) | Regression Models |
Day 8 | 06/28/2023 (2:30-3:30 PM) | Unsupervised Learning |
Day 9 | 06/29/2023 (2:30-3:30 PM) | Neural Networks |
Day 10 | 06/30/2023 (2:30-3:30 PM) | Advanced Applications in Materials Science |
Machine learning problems can be divided into three general categories: supervised learning, unsupervised learning, and reinforcement learning.
In supervised learning, the goal is to learn a model that makes accurate predictions $\hat{y}$ of a target $y$ based on a vector of features $\mathbf{x}$.
We can think of a model as a function $f : \mathcal{X} \rightarrow \mathcal{Y}$ that maps feature vectors to predictions: $\hat{y} = f(\mathbf{x})$.
The training set is used to fit the model's parameters.
The validation set is used to compare the accuracy of different models, or of instances of the same model with different hyperparameters.
The test set is used to provide a final, unbiased estimate of the error of the best model selected using the validation set.
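A minimal sketch of such a three-way split using scikit-learn's `train_test_split`; the array shapes and split fractions below are illustrative choices, not prescribed values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: 200 samples with 5 features each (shapes are arbitrary)
X = np.random.rand(200, 5)
y = np.random.rand(200)

# First hold out 20% of the data as the test set
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder into training (75%) and validation (25%) sets
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)  # (120, 5) (40, 5) (40, 5)
```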
Examples of error metrics $\mathcal{E}(f)$:
Mean Square Error (MSE):
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N (f(\mathbf{x}_n) - y_n)^2$$
Mean Absolute Error (MAE):
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N |f(\mathbf{x}_n) - y_n|$$
Classification Accuracy:
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N \delta(\hat{y}_n - y_n) = \frac{\text{\# Correct}}{\text{Total}}$$
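Each of these metrics is a one-line reduction over the predictions. A minimal sketch with NumPy (the arrays below are invented purely for illustration):

```python
import numpy as np

y_true = np.array([1.2, 0.5, 2.0, 1.7])   # illustrative regression targets
y_pred = np.array([1.0, 0.7, 1.8, 1.9])   # illustrative model predictions

mse = np.mean((y_pred - y_true) ** 2)      # Mean Square Error
mae = np.mean(np.abs(y_pred - y_true))     # Mean Absolute Error

# Classification accuracy: fraction of exactly matching labels
labels_true = np.array([0, 1, 1, 0])
labels_pred = np.array([0, 1, 0, 0])
accuracy = np.mean(labels_pred == labels_true)

print(mse, mae, accuracy)
```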
Advanced Regression Models
We can re-write the linear regression model in vector form:

$$\hat{y} = f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$
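A minimal sketch of fitting this vector-form model with scikit-learn's `LinearRegression`; the synthetic data and coefficient values below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic illustration: y depends linearly on three features plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.3 + 0.05 * rng.normal(size=100)

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)  # approximately recovers w and b
```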
Often, the trends of $y$ with respect to $\mathbf{x}$ are non-linear, so multivariate linear regression may fail to give good results.
One way of handling this is by embedding the data in a higher-dimensional space using many different non-linear functions:

$$\boldsymbol{\phi}(\mathbf{x}) = \left( \phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_{D_{emb}}(\mathbf{x}) \right), \qquad \hat{y} = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + b$$

(The $\phi_j$ are nonlinear functions, and $D_{emb}$ is the embedding dimension.)
High-dimensional embeddings are powerful because they give a model enough degrees of freedom to conform to non-linearities in the data.
The more degrees of freedom a model has, the more prone it is to "memorizing" the data instead of "learning" from it.
Polynomial Regression Example:
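A minimal sketch of polynomial regression built from a `PolynomialFeatures` embedding followed by ordinary linear regression; the synthetic data, degrees, and split below are illustrative choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic 1D data with a non-linear trend (values chosen only for illustration)
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 60).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 3 - x[:, 0] + 0.3 * rng.normal(size=60)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=0)

# A modest degree can capture the trend; a very high degree has enough
# degrees of freedom to start memorizing the noise in the training set
for degree in [1, 3, 15]:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = np.mean((model.predict(x_train) - y_train) ** 2)
    val_mse = np.mean((model.predict(x_val) - y_val) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")
```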
One way of reducing overfitting is by gathering more data.
Another way to reduce overfitting is to apply regularization.
Regularization refers to the use of some mechanism that deliberately reduces the flexibility of a model in order to reduce the validation-set error.
A common form of regularization is penalizing the model for having large weights.
For most models, a penalty term is added to the overall model loss function.
$$\text{Penalty Term} = \lambda \sum_{j} w_j^2 = \lambda\, \mathbf{w}^T\mathbf{w}$$
The parameter $\lambda$ is called the regularization parameter; larger values of $\lambda$ penalize large weights more strongly.
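For linear models, this squared-weight penalty corresponds to ridge regression. Below is a minimal sketch using scikit-learn's `Ridge`, where the `alpha` argument plays the role of $\lambda$; the synthetic data, polynomial degree, and `alpha` values are illustrative:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Synthetic 1D data with a non-linear trend (illustrative only)
rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 30).reshape(-1, 1)
y = np.sin(2 * x[:, 0]) + 0.2 * rng.normal(size=30)

# Larger lambda (alpha in scikit-learn) shrinks the weight vector w
for alpha in [1e-4, 1e-1, 10.0]:
    model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=alpha))
    model.fit(x, y)
    w = model[-1].coef_
    print(f"alpha={alpha:g}  sum(w^2)={np.sum(w**2):.3f}")
```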
Instead of embedding data directly, kernel machines compute only the inner products of pairs of data points in the embedding space.
This inner product is computed by a _kernel function_ $K(\mathbf{x}, \mathbf{x}')$.
Kernel machines even allow us to perform linear regression in infinite-dimensional spaces!
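A minimal sketch of a kernel machine using scikit-learn's `KernelRidge` with an RBF kernel, whose implicit feature space is infinite-dimensional; the synthetic data and hyperparameters below are illustrative:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic 1D data with a non-linear trend (illustrative only)
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sinc(x[:, 0]) + 0.05 * rng.normal(size=60)

# RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2); alpha is the regularization strength
model = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-2)
model.fit(x, y)

y_hat = model.predict(x)
print(np.mean((y_hat - y) ** 2))  # training MSE
```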
If possible, try to do the exercises. Bring your questions to our next meeting tomorrow.