Materials+ML Workshop Day 4¶


Day 4 Agenda:¶

  • Questions about Day 3 Material
  • Review of Day 3

Content for today:

  • Data Manipulation:
    • The Pandas Package
    • Working with DataFrames
  • Visualizing Data
    • The Matplotlib package
    • Visualizing 1D data
    • Visualizing 2D and 3D data

Tentative Workshop Schedule:¶

Session | Date | Time | Content
Day 0 | 06/16/2023 | 2:30-3:30 PM | Introduction, Setting up your Python Notebook
Day 1 | 06/19/2023 | 2:30-3:30 PM | Python Data Types
Day 2 | 06/20/2023 | 2:30-3:30 PM | Python Functions and Classes
Day 3 | 06/21/2023 | 2:30-3:30 PM | Scientific Computing with Numpy and Scipy
Day 4 | 06/22/2023 | 2:30-3:30 PM | Data Manipulation and Visualization
Day 5 | 06/23/2023 | 2:30-3:30 PM | Materials Science Packages
Day 6 | 06/26/2023 | 2:30-3:30 PM | Introduction to ML, Supervised Learning
Day 7 | 06/27/2023 | 2:30-3:30 PM | Regression Models
Day 8 | 06/28/2023 | 2:30-3:30 PM | Unsupervised Learning
Day 9 | 06/29/2023 | 2:30-3:30 PM | Neural Networks
Day 10 | 06/30/2023 | 2:30-3:30 PM | Advanced Applications in Materials Science

Questions¶

Material covered yesterday:

  • Installing Python packages
  • Numpy
  • Scipy

Review: Numpy¶

  • Numpy supplies mathematical functions (such as sin(x), exp(x), etc.)
  • Numpy arrays are multi-dimensional data structures
  • Numpy arrays can represent vectors, matrices, tensors, etc.
  • Creating Numpy arrays:
In [1]:
import numpy as np

# create a 1D array:
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x)

# create a 2D array (matrix):
X = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])
print(X)
[1. 2. 3. 4.]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
  • Every array has an attribute shape, which is a tuple of integers
  • The length of the tuple is the number of dimensions (axes) of the array
  • Each entry in the tuple is the size of the array along the corresponding axis
In [2]:
# x is a 1D array of length 4:
print(x.shape)

# X is a 3x3 matrix:
print(X.shape)

# create an array of zeros with a 3x2x2 shape:
S = np.zeros((3,2,2))
print(S.shape)
(4,)
(3, 3)
(3, 2, 2)
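The link between shape and the number of dimensions can be checked directly: the ndim attribute equals len(shape), and size is the product of the entries of shape. A short sketch using the same S array as above; reshape rearranges the same entries along new axes as long as the total count matches.

```python
import numpy as np

S = np.zeros((3, 2, 2))

# ndim is the number of axes, i.e. the length of the shape tuple
print(S.ndim)                  # 3
print(len(S.shape) == S.ndim)  # True

# size is the total number of entries (3 * 2 * 2)
print(S.size)                  # 12

# reshape keeps the same 12 entries but arranges them along different axes
T = S.reshape((4, 3))
print(T.shape)                 # (4, 3)
```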
  • Numpy arrays can be indexed like Python lists, but with some added features:
In [3]:
X = np.array(range(1,10)).reshape((3,3))
print(X)

# access row 0:
print('Accessing X[0]:')
print(X[0])

# access row 0, column 2:
print('Accessing X[0,2]:')
print(X[0,2])

# access column 0:
print('Accessing X[:,0]:')
print(X[:,0])
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Accessing X[0]:
[1 2 3]
Accessing X[0,2]:
3
Accessing X[:,0]:
[1 4 7]
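Two of the "added features" worth knowing are multi-axis slicing and boolean masks. The sketch below reuses the same 3x3 matrix X; a comparison such as X > 4 produces an array of True/False values, and indexing with it returns only the matching entries.

```python
import numpy as np

X = np.array(range(1, 10)).reshape((3, 3))

# slicing: rows 0-1, columns 1-2 (the stop index is excluded, as with lists)
print(X[0:2, 1:3])
# [[2 3]
#  [5 6]]

# boolean mask: True wherever the condition holds
mask = X > 4
print(mask)

# indexing with the mask returns the matching entries as a 1D array
print(X[mask])   # [5 6 7 8 9]

# negative indices count from the end: the last row
print(X[-1])     # [7 8 9]
```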
  • Arithmetic operators (+, -, *, /, **) act on arrays elementwise
  • Numpy supports matrix multiplication with the @ operator
In [4]:
A = np.array(range(1,5)).reshape(2,2)
D = np.diag([1,2])

print('A:\n', A)
print('D:\n', D)

# elementwise addition:
print(A + D)

# matrix multiplication:
print(A @ D)
A:
 [[1 2]
 [3 4]]
D:
 [[1 0]
 [0 2]]
[[2 2]
 [3 6]]
[[1 4]
 [3 8]]
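A common pitfall is using * when a matrix product is intended. With the same A and D as above, this sketch contrasts the elementwise product, the matrix product, and elementwise powers.

```python
import numpy as np

A = np.array(range(1, 5)).reshape(2, 2)
D = np.diag([1, 2])

# * multiplies matching entries (elementwise), NOT a matrix product
print(A * D)
# [[1 0]
#  [0 8]]

# @ is the matrix product; np.matmul(A, D) gives the same result
print(A @ D)
# [[1 4]
#  [3 8]]

# ** is also elementwise: each entry is squared, which differs from A @ A
print(A ** 2)
# [[ 1  4]
#  [ 9 16]]
```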
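  • One Numpy function we will use often today is np.linspace, which returns evenly spaced points between a start and an end value (both endpoints are included by default):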
In [5]:
start = 0.0
end = 10.0
n_pts = 11

# create a 1D array of uniform points:
x_pts = np.linspace(start, end, n_pts)
print(x_pts)
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
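Because both endpoints are included, n_pts points split the interval into n_pts - 1 gaps, so the spacing is (end - start) / (n_pts - 1). This sketch checks that with the retstep option and shows how endpoint=False changes the grid.

```python
import numpy as np

start, end, n_pts = 0.0, 10.0, 11

# retstep=True also returns the spacing between consecutive points
x_pts, step = np.linspace(start, end, n_pts, retstep=True)

# both endpoints are included, so there are n_pts - 1 intervals
print(step)                                  # 1.0
print(step == (end - start) / (n_pts - 1))   # True

# endpoint=False drops the final point; 10 points now span [0, 10) with step 1
y_pts = np.linspace(start, end, 10, endpoint=False)
print(y_pts)   # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
```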

The Scipy Package¶

  • Scipy provides many useful subpackages for scientific computing
  • Subpackages you may find useful include:
    • scipy.constants: physical constants, unit conversions
    • scipy.optimize: functions for optimization and root finding
    • scipy.integrate: functions for numerical integration
    • scipy.stats: statistical analysis functions
    • scipy.special: special functions (e.g. Bessel functions)
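As a quick reminder of how these subpackages are called, here is one small example from each. The specific functions chosen (brentq, quad, norm.cdf, j0) are just illustrative picks from each subpackage, not the only options.

```python
import numpy as np
from scipy import constants, optimize, integrate, stats, special

# scipy.constants: physical constants in SI units
print(constants.k)          # Boltzmann constant, J/K
print(constants.eV)         # 1 electron volt expressed in joules

# scipy.optimize: root of cos(x) - x, bracketed on [0, 1]
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)
print(root)                 # approximately 0.739

# scipy.integrate: integral of sin(x) from 0 to pi (exact value is 2)
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(value, abs_err)

# scipy.stats: cumulative distribution of the standard normal at 0
print(stats.norm.cdf(0.0))  # 0.5

# scipy.special: Bessel function of the first kind, order 0
print(special.j0(0.0))      # 1.0
```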

New Content:¶

  • More Python packages:
    • Pandas ("Panel Datasets")
    • Matplotlib ("MATLAB-like plotting library")

Installing Pandas and Matplotlib:¶

In [6]:
%pip install pandas
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in /home/colin/.local/lib/python3.10/site-packages (1.4.4)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/colin/.local/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.21.0 in /usr/lib/python3/dist-packages (from pandas) (1.21.5)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
In [7]:
%pip install matplotlib
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: matplotlib in /usr/lib/python3/dist-packages (3.5.1)
Note: you may need to restart the kernel to use updated packages.

Pandas¶

  • Pandas is an open-source Python package for data manipulation and analysis.
  • It can read and write data in several different formats, including:
    • CSV (comma-separated values)
    • Excel spreadsheets
    • SQL databases
  • We can import pandas as follows:
In [8]:
import pandas as pd

DataFrames¶

  • We can create DataFrames from Python dictionaries; each key becomes a column name and each list holds that column's values:
In [9]:
# Data on the first four elements of the periodic table:
elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [ 1, 2, 3, 4 ],
    'Mass' : [ 1.008, 4.002, 6.940, 9.012],
    # Pauling electronegativity; He has no defined value, so 0.0 is a placeholder
    'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]
}

# construct dataframe from data dictionary:
df = pd.DataFrame(elements_data)

(You can copy this code from the online book's Data Handling section)
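After building a DataFrame, it usually helps to inspect it before doing anything else. A minimal sketch using the same element data; shape, columns, dtypes and describe() are standard DataFrame attributes and methods.

```python
import pandas as pd

elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [1, 2, 3, 4],
    'Mass' : [1.008, 4.002, 6.940, 9.012],
    'Electronegativity' : [2.20, 0.0, 0.98, 1.57]
}
df = pd.DataFrame(elements_data)

# each dictionary key became a column; rows get a default integer index 0..3
print(df)

# (number of rows, number of columns)
print(df.shape)           # (4, 4)

# column names, in the order they appeared in the dictionary
print(list(df.columns))   # ['Element', 'Atomic Number', 'Mass', 'Electronegativity']

# the data type pandas inferred for each column
print(df.dtypes)

# summary statistics (count, mean, std, min, quartiles, max) of numeric columns
print(df.describe())
```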

Tutorial: Working with Dataframes¶

  • Accessing Dataframe columns
  • Filtering Dataframes
  • Transforming Data
  • Importing and exporting data
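The four tutorial topics can be previewed on the element table. This is a sketch, not the tutorial's exact code: the derived column names ('Mass per Z', 'Symbol Length') and the file name elements.csv are hypothetical choices, and the last step writes that file to the current directory.

```python
import pandas as pd

df = pd.DataFrame({
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [1, 2, 3, 4],
    'Mass' : [1.008, 4.002, 6.940, 9.012],
    'Electronegativity' : [2.20, 0.0, 0.98, 1.57]
})

# --- Accessing columns ---
masses = df['Mass']                      # one column name -> Series
subset = df[['Element', 'Mass']]         # list of names -> DataFrame
print(masses.mean(), subset.shape)

# --- Filtering rows with a boolean condition ---
heavy = df[df['Mass'] > 5.0]             # rows where Mass exceeds 5
print(heavy['Element'].tolist())         # ['Li', 'Be']

# conditions combine with & (and) and | (or); each needs its own parentheses
either = df[(df['Electronegativity'] > 1.0) | (df['Atomic Number'] == 2)]
print(either['Element'].tolist())        # ['H', 'He', 'Be']

# --- Transforming data ---
# a new column computed elementwise from existing columns (hypothetical name)
df['Mass per Z'] = df['Mass'] / df['Atomic Number']
# apply a Python function to every entry of a column (hypothetical name)
df['Symbol Length'] = df['Element'].apply(len)

# --- Exporting and importing (CSV) ---
df.to_csv('elements.csv', index=False)   # index=False omits the row index
df2 = pd.read_csv('elements.csv')
print(df2.head())
```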

Matplotlib¶

  • Matplotlib is a MATLAB-like plotting utility for creating publication-quality plots
  • In matplotlib, we typically import the pyplot module with the alias plt:
In [10]:
import matplotlib.pyplot as plt

Tutorial: Basic Plots with Matplotlib¶

  • Plotting 1D data
  • Styling plots
  • Adding axes labels, titles, legends
  • Typesetting
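A minimal sketch touching each basic-plot topic: plotting 1D data, styling lines, adding labels, a title and a legend, and typesetting math with matplotlib's built-in mathtext ($...$ strings). The matplotlib.use('Agg') line is an assumption for running outside a notebook without a display; in Jupyter, leave it out so the figure renders inline.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')          # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# 1D data: two functions sampled on a uniform grid
x = np.linspace(0.0, 2 * np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

fig, ax = plt.subplots()

# styling: color, line style, line width, markers
ax.plot(x, y1, color='tab:blue', linestyle='-', linewidth=2, label=r'$\sin(x)$')
ax.plot(x, y2, color='tab:red', linestyle='--', marker='o', markevery=10,
        label=r'$\cos(x)$')

# axis labels and title; text between $...$ is typeset as math
ax.set_xlabel(r'$x$ (radians)')
ax.set_ylabel(r'$f(x)$')
ax.set_title('Trigonometric functions')

# the legend collects the label= strings passed to plot()
ax.legend()

# in a plain script with an interactive backend, call plt.show() to open a window
```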

Tutorial: 2D and 3D plots in Matplotlib¶

  • Colormapping
  • Plotting in 3D
  • Saving figures
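A sketch of the 2D and 3D topics using a Gaussian bump f(x, y) = exp(-(x^2 + y^2)) as example data (an arbitrary choice for illustration). It shows a colormapped pcolormesh with a colorbar, a 3D surface via projection='3d', and savefig, which infers the file format from the extension. The output file names are hypothetical, and the Agg line should again be dropped in a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')          # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# 2D data on a grid: meshgrid gives X[i, j] = x[j] and Y[i, j] = y[i]
x = np.linspace(-2.0, 2.0, 50)
y = np.linspace(-2.0, 2.0, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))

# --- Colormapping: color encodes the value of Z ---
fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, Z, cmap='viridis', shading='auto')
fig.colorbar(mesh, ax=ax, label=r'$f(x, y)$')
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.savefig('colormap.png', dpi=300)     # PNG at 300 dots per inch

# --- Plotting in 3D: projection='3d' creates a 3D axes ---
fig3d = plt.figure()
ax3d = fig3d.add_subplot(projection='3d')
ax3d.plot_surface(X, Y, Z, cmap='viridis')
ax3d.set_xlabel('x')
ax3d.set_ylabel('y')
ax3d.set_zlabel('f(x, y)')

# --- Saving figures: vector PDF, with margins trimmed to the content ---
fig3d.savefig('surface.pdf', bbox_inches='tight')
```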

Recommended Reading:¶

  • Materials Science Python Packages

If possible, try to do the exercises. Bring your questions to our next meeting (Day 5, 06/23/2023).