| Session | Date | Content |
|---|---|---|
| Day 0 | 06/16/2023 (2:30-3:30 PM) | Introduction, Setting up your Python Notebook |
| Day 1 | 06/19/2023 (2:30-3:30 PM) | Python Data Types |
| Day 2 | 06/20/2023 (2:30-3:30 PM) | Python Functions and Classes |
| Day 3 | 06/21/2023 (2:30-3:30 PM) | Scientific Computing with Numpy and Scipy |
| Day 4 | 06/22/2023 (2:30-3:30 PM) | Data Manipulation and Visualization |
| Day 5 | 06/23/2023 (2:30-3:30 PM) | Materials Science Packages |
| Day 6 | 06/26/2023 (2:30-3:30 PM) | Introduction to ML, Supervised Learning |
| Day 7 | 06/27/2023 (2:30-3:30 PM) | Regression Models |
| Day 8 | 06/28/2023 (2:30-3:30 PM) | Unsupervised Learning |
| Day 9 | 06/29/2023 (2:30-3:30 PM) | Neural Networks |
| Day 10 | 06/30/2023 (2:30-3:30 PM) | Advanced Applications in Materials Science |
What is Machine Learning?
Machine Learning (ML) is a subfield of Artificial Intelligence (AI) that is concerned with:



Machine Learning problems can be divided into three general categories: supervised learning, unsupervised learning, and reinforcement learning.
When can supervised learning be applied?
Problems where the available data contains many different labeled examples
Problems that involve finding a model that maps a set of features (inputs) to labels (outputs).
A supervised learning dataset consists of $(\mathbf{x}, y)$ pairs:
$y$ values can be continuous scalars, vectors, or discrete classes.
Here, we will assume $\mathbf{x}$ is a real vector and $y$ is a continuous real scalar (unless otherwise specified).
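As a concrete illustration, here is a minimal sketch of such a dataset built with NumPy. The data is synthetic (a noisy linear rule, chosen only for illustration); each row of `X` is one feature vector $\mathbf{x}$, and each entry of `y` is its continuous scalar label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature matrix: N = 100 samples, each a real vector of 3 features
X = rng.normal(size=(100, 3))

# Continuous scalar labels from a noisy linear rule (arbitrary toy choice)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=100)

print(X.shape, y.shape)  # (100, 3) (100,)
```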
What is the goal of supervised learning?
The goal is to learn a model that makes accurate predictions (denoted $\hat{y}$) of $y$ based on a vector of features $\mathbf{x}$.
We can think of a model as a function $f : \mathcal{X} \rightarrow \mathcal{Y}$
The type of a supervised learning problem depends on the type of value $y$ we are attempting to predict:
If $y$ is a continuous value, it is a regression problem
If $y$ can be a finite number of values, it is a classification problem
If $y$ is a continuous probability (between $0$ and $1$), it is a logistic regression problem*
*In some textbooks, logistic regression also refers to a specific kind of model that is used for predicting probabilities.
Model validity is a subjective property, because we may not know what the correct label $y$ is for every single value $\mathbf{x}$ in $\mathcal{X}$.
Often, we only know the $(\mathbf{x},y)$ pairs in our dataset.
If there is noise or bias in our data, even those $(\mathbf{x},y)$ pairs may be unreliable.

Which model is the more valid model?

Here's how we can solve the problem of estimating model validity:
Purposely leave out a random subset of the data that the model is fit to.
This subset that we leave out is called the validation set.
The subset we fit the model to is called the training set.
The validation set is used for comparing the accuracy of different models or instances of the same model with different parameters.
The test set is used to provide a final, unbiased estimate of the best model selected using the validation set.
Evaluating the final model accuracy on the test set eliminates selection bias associated with the accuracies on the validation set.
The more models that are compared using the validation set, the greater the need for the test set.
This is especially true if you are reporting the statistical significance of your model's accuracy being better than another model.
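The splitting procedure above can be sketched with plain NumPy. The 60/20/20 proportions here are a common convention, not a requirement; shuffling before splitting ensures each subset is a random sample of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Shuffle indices, then carve out 60% train / 20% validation / 20% test
idx = rng.permutation(len(X))
n_train, n_val = 60, 20
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The validation set is then used for model selection, and the test set is touched only once, to report the final accuracy.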
Across feature vectors $\mathbf{x}$, some features vary much more than others.
To avoid making our model overly sensitive to high-variance features, we normalize each feature so that it lies roughly on the interval $[-2, 2]$.
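One common way to achieve this is standardization: subtract each feature's mean and divide by its standard deviation, so most values land roughly in $[-2, 2]$. A minimal sketch (the two feature scales are arbitrary toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features on very different scales
X_train = np.column_stack([rng.normal(1000.0, 200.0, size=100),
                           rng.normal(0.0, 0.01, size=100)])

# Standardize: subtract the mean and divide by the standard deviation,
# using statistics computed on the TRAINING set only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_norm = (X_train - mu) / sigma

# After standardization each feature has mean ~0 and std ~1,
# so most values lie roughly in [-2, 2]
print(X_train_norm.mean(axis=0), X_train_norm.std(axis=0))
```

Note that `mu` and `sigma` should be computed on the training set only and then reused to transform the validation and test sets, so no information leaks across the split.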
To evaluate the accuracy of a model on a dataset, we use a loss function.
A loss function is a function of a prediction $\hat{y}$ and a true label $y$ that increases as the prediction deviates from the true label.
Examples:
Mean Square Error (MSE):
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N (f(\mathbf{x}_n) - y_n)^2$$
Mean Absolute Error (MAE):
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N |f(\mathbf{x}_n) - y_n|$$
Classification Accuracy:
$$\mathcal{E}(f) = \frac{1}{N} \sum_{n=1}^N \delta(f(\mathbf{x}_n) - y_n) = \left[ \frac{\text{# Correct}}{\text{Total}} \right]$$
where $\delta(z) = 1$ if $z = 0$ and $0$ otherwise. Note that, unlike MSE and MAE, accuracy increases with correctness, so it is a score to maximize rather than a loss to minimize.
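All three metrics above reduce to one-line NumPy expressions. The toy predictions below are arbitrary, chosen only to exercise the formulas:

```python
import numpy as np

# Regression: continuous predictions vs. true values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 4.0])

mse = np.mean((y_pred - y_true) ** 2)   # mean square error
mae = np.mean(np.abs(y_pred - y_true))  # mean absolute error
print(mse, mae)  # 0.3125 0.375

# Classification: discrete labels, accuracy = fraction correct
labels_true = np.array([0, 1, 1, 2])
labels_pred = np.array([0, 1, 2, 2])
accuracy = np.mean(labels_pred == labels_true)
print(accuracy)  # 0.75
```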
Most models have weights that must be adjusted to fit the training dataset:
Example (1D polynomial regression):
$$f(x) = \sum_{d=0}^{D} w_dx^d$$
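For a fixed weight vector, this polynomial model is straightforward to evaluate. The sketch below uses hypothetical weights for a degree-2 polynomial (not fitted to any data) just to show the model as a function:

```python
import numpy as np

# Hypothetical weights for f(x) = w0 + w1*x + w2*x^2
w = np.array([1.0, -2.0, 0.5])  # [w0, w1, w2]

def f(x, w):
    # sum over d of w_d * x^d, vectorized over an array of inputs x
    powers = np.arange(len(w))
    return np.sum(w * np.power.outer(x, powers), axis=-1)

x = np.array([0.0, 1.0, 2.0])
print(f(x, w))  # [ 1.  -0.5 -1. ]
```

NumPy's `np.polyval` computes the same thing, but expects coefficients in highest-degree-first order: `np.polyval(w[::-1], x)`.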
There are many different methods that can be used to find the optimal weights $w_d$.
The most common method for fitting the data is through gradient descent.
Some models (such as linear regression) have optimal weights that can be solved for in closed form.
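Both approaches can be compared on a small linear-regression example. The sketch below fits $f(x) = w_0 + w_1 x$ by gradient descent on the MSE loss and checks it against the closed-form least-squares solution; the learning rate and iteration count are hand-picked for this toy problem, not general-purpose values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 3.0 * x + 1.0 + 0.05 * rng.normal(size=50)  # toy data: slope 3, intercept 1

# Design matrix for f(x) = w0 + w1*x
A = np.column_stack([np.ones_like(x), x])

# Gradient descent on the MSE loss E(w) = (1/N) * ||A w - y||^2
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = (2.0 / len(y)) * A.T @ (A @ w - y)  # gradient of MSE w.r.t. w
    w -= lr * grad

# Closed-form least-squares solution for comparison
w_closed, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, w_closed)  # both near [1, 3]
```

For linear models the two agree; gradient descent becomes essential for models (such as neural networks, covered on Day 9) that have no closed-form solution.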
The gradient of a function $g: \mathbb{R}^n \rightarrow \mathbb{R}$ is the vector-valued function:
$$\nabla g(\mathbf{w}) = \begin{bmatrix} \frac{\partial g}{\partial w_0}(\mathbf{w}) & \frac{\partial g}{\partial w_1}(\mathbf{w}) & \dots & \frac{\partial g}{\partial w_n}(\mathbf{w}) \end{bmatrix}^T$$

If possible, try to do the exercises. Bring your questions to our next meeting (next Monday).