Regression Models Review
Unsupervised Learning
Application: Classifying Superconductors
Session | Date | Content |
---|---|---|
Day 0 | 06/16/2023 (2:30-3:30 PM) | Introduction, Setting up your Python Notebook |
Day 1 | 06/19/2023 (2:30-3:30 PM) | Python Data Types |
Day 2 | 06/20/2023 (2:30-3:30 PM) | Python Functions and Classes |
Day 3 | 06/21/2023 (2:30-3:30 PM) | Scientific Computing with Numpy and Scipy |
Day 4 | 06/22/2023 (2:30-3:30 PM) | Data Manipulation and Visualization |
Day 5 | 06/23/2023 (2:30-3:30 PM) | Materials Science Packages |
Day 6 | 06/26/2023 (2:30-3:30 PM) | Introduction to ML, Supervised Learning |
Day 7 | 06/27/2023 (2:30-3:30 PM) | Regression Models |
Day 8 | 06/28/2023 (2:30-3:30 PM) | Unsupervised Learning |
Day 9 | 06/29/2023 (2:30-3:30 PM) | Neural Networks |
Day 10 | 06/30/2023 (2:30-3:30 PM) | Advanced Applications in Materials Science |
We can rewrite the linear regression model in vector form (absorbing the intercept as the weight of a constant feature $x_0 = 1$):
$$\hat{y} = \sum_{j} w_j x_j = \mathbf{w}^T\mathbf{x}$$
Often, the trends of $y$ with respect to $\mathbf{x}$ are non-linear, so multivariate linear regression may fail to give good results.
One way of handling this is by embedding the data in a higher-dimensional space using many different non-linear functions:
$$\hat{y} = \sum_{j=1}^{D_{emb}} w_j\,\phi_j(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x})$$
(The $\phi_j$ are nonlinear functions, and $D_{emb}$ is the embedding dimension)
Polynomial Regression Example:
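A minimal sketch, assuming NumPy and scikit-learn (both covered earlier in the workshop) and synthetic 1D data invented here for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic 1D data with a nonlinear (cubic) trend
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x[:, 0]**3 - x[:, 0] + rng.normal(scale=1.0, size=50)

# Embed x in a higher-dimensional space: phi(x) = [1, x, x^2, x^3]
poly = PolynomialFeatures(degree=3)
X_emb = poly.fit_transform(x)

# Ordinary multivariate linear regression on the embedded features
model = LinearRegression().fit(X_emb, y)
print("Learned weights:", model.coef_)
print("R^2 on training data:", model.score(X_emb, y))
```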
To discourage overfitting, a penalty term is usually added to the overall model loss function:
$$\text{ Penalty Term } = \lambda \sum_{j} w_j^2 = \lambda(\mathbf{w}^T\mathbf{w})$$
The parameter $\lambda$ is called the regularization parameter; larger values of $\lambda$ penalize large weights more strongly.
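For instance (a sketch assuming scikit-learn, whose `Ridge` model names the regularization parameter `alpha` rather than $\lambda$, reusing the synthetic data from the polynomial example above):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x[:, 0]**3 - x[:, 0] + rng.normal(scale=1.0, size=50)
X_emb = PolynomialFeatures(degree=9).fit_transform(x)

# Larger lambda (alpha) shrinks the weights more strongly
for lam in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=lam).fit(X_emb, y)
    print(f"lambda = {lam:6.2f} -> w^T w = {np.sum(ridge.coef_**2):.4f}")
```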
Unsupervised Learning
Sometimes we work with high-dimensional data that is very sparse
Reducing the dimensionality of the data might be necessary
Example: The "Swiss Roll" manifold
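As a quick sketch, scikit-learn provides a generator for this dataset (the sample size and noise level here are arbitrary choices):

```python
from sklearn.datasets import make_swiss_roll

# 3D points that actually lie on a rolled-up 2D sheet
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
print(X.shape)  # (1000, 3): points embedded in 3 dimensions
print(t.shape)  # (1000,): the underlying 1D coordinate along the roll
```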
The eigenvectors of the correlation matrix are called principal components.
The associated eigenvalues describe the proportion of the data variance in the direction of each principal component.
We can project our (normalized) data onto the first $n$ principal components to reduce the dimensionality of the data, while still keeping most of the variance:
$$\mathbf{z} \mapsto \mathbf{u} = \begin{bmatrix} \mathbf{z}^T\mathbf{p}^{(1)} \\ \mathbf{z}^T\mathbf{p}^{(2)} \\ \vdots \\ \mathbf{z}^T\mathbf{p}^{(n)} \\ \end{bmatrix}$$
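A minimal sketch of this projection with scikit-learn (synthetic data assumed; `StandardScaler` performs the normalization and `PCA` finds the principal components):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 200 samples, 10 features (synthetic)

Z = StandardScaler().fit_transform(X)   # normalize: zero mean, unit variance per feature

pca = PCA(n_components=3)               # keep the first n = 3 principal components
U = pca.fit_transform(Z)                # project each z onto p^(1), ..., p^(3)

print(U.shape)                          # (200, 3)
print(pca.explained_variance_ratio_)    # fraction of variance along each component
```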
Clustering methods allow us to identify dense groupings of data.
Distribution Estimation allows us to estimate the probability distribution of the data.
$k$-means is a popular clustering algorithm that identifies the center points of a specified number of clusters $k$.
These center points are called centroids
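A sketch using scikit-learn's `KMeans` on synthetic "blob" data (invented here for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic 2D data with 3 dense groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # the k = 3 centroids
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
```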
Kernel Density Estimation (KDE) estimates the probability distribution of an entire dataset
It estimates the distribution as a sum of multivariate normal "bumps", one centered at the position of each datapoint
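For example (a sketch using SciPy's `gaussian_kde`; the 1D synthetic data is only for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)   # synthetic 1D dataset

kde = gaussian_kde(data)            # a Gaussian "bump" centered at each datapoint
grid = np.linspace(-4, 4, 9)
print(kde(grid))                    # estimated probability density on the grid
```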
A Gaussian Mixture Model (GMM) performs both clustering and distribution estimation simultaneously.
It works by fitting a mixture of multivariate normal distributions to the data
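A sketch with scikit-learn's `GaussianMixture` (reusing synthetic blob data, as in the $k$-means example):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gmm.means_)                   # center (mean) of each Gaussian component
print(gmm.weights_)                 # mixture weights
labels = gmm.predict(X)             # clustering: most likely component per point
log_dens = gmm.score_samples(X)     # distribution estimation: log-density per point
```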
(Note: some sections are still in progress ☹️)
If possible, try to do the exercises. Bring your questions to our next meeting tomorrow.