Materials+ML Workshop Day 3¶

logo

Day 3 Agenda:¶

  • Questions about Day 2 Material
  • Review of Day 2

Content for today:

  • Scientific Python Packages
    • Installing packages
  • The NumPy package:
    • Numpy Arrays
    • Array Operations
    • Matrix and Vector Operations
  • The SciPy package:
    • Scientific Constants
    • Integration
    • Optimization

Background Survey¶

background survey

https://forms.gle/FEWqPavJJYC9VzfH7¶

The Workshop Online Book:¶

https://cburdine.github.io/materials-ml-workshop/¶

Tentative Workshop Schedule:¶

Session Date Content
Day 0 06/16/2023 (2:30-3:30 PM) Introduction, Setting up your Python Notebook
Day 1 06/19/2023 (2:30-3:30 PM) Python Data Types
Day 2 06/20/2023 (2:30-3:30 PM) Python Functions and Classes
Day 3 06/21/2023 (2:30-3:30 PM) Scientific Computing with Numpy and Scipy
Day 4 06/22/2023 (2:30-3:30 PM) Data Manipulation and Visualization
Day 5 06/23/2023 (2:30-3:30 PM) Materials Science Packages
Day 6 06/26/2023 (2:30-3:30 PM) Introduction to ML, Supervised Learning
Day 7 06/27/2023 (2:30-3:30 PM) Regression Models
Day 8 06/28/2023 (2:30-3:30 PM) Unsupervised Learning
Day 9 06/29/2023 (2:30-3:30 PM) Neural Networks
Day 10 06/30/2023 (2:30-3:30 PM) Advanced Applications in Materials Science

Questions¶

Material covered yesterday:

  • Python Functions
  • Python Classes

Review: Functions¶

  • Python functions are reusable blocks of code that we can execute when it is called.

  • Similar to mathematical functions, Python functions can have inputs, outputs, or even modify variables

In [1]:
# create a function to add two numbers:
def add_numbers(a,b):
    total = a + b
    return total # <-- output of function

# call `add_numbers` and store the output:
result = add_numbers(3,5)

print(result)
8
  • Functions can also have default values:
In [2]:
def greet(name, message="Hello"):
    """ Prints a greeting with a name and a message """
    print(message + ', ' + name + '!')

# call greet with the default message:
greet('Albert')

# call greet with a non-default message:
greet('Albert', 'Greetings')
Hello, Albert!
Greetings, Albert!

Python Classes¶

  • Python classes serve as a kind of "blueprint" for a data type
  • Instances of classes we create are called objects
In [10]:
class Dog:
    """ This class represents a pet dog """
    
    def __init__(self, dog_name):
        """ Constructs a Dog instance with given name """
        self.name = dog_name
    
    def bark(self):
        """ Causes this dog to bark """
        print(self.name + ' says: "Woof!"')
        

Methods¶

  • Methods are functions defined within the class body
  • They have self as the first parameter, which allows for the method to be called using object.function(...) syntax
  • Classes have a special __init__ method called the constructor:
    • It is used for creating instances of a class
    • It is called by using the name of the class
In [11]:
# construct a Dog object:
my_dog = Dog('Snoopy')

# change the `name` instance variable:
my_dog.name = 'Fido'

# call the bark() method:
my_dog.bark()
Fido says: "Woof!"

New Content:¶

  • Installing and Importing Python Packages
  • Scientific Python packages
    • NumPy ("Numerical Python")
    • SciPy ("Scientific Python")

Checking if Packages are installed¶

  • The quickest way to check if a package is installed on your system is to import it:
In [17]:
import numpy # import the NumPy base package
In [18]:
import scipy # import the Scipy base package
  • If either of these import statements results in an error, you will need to install the corresponding package.

Installing Python Packages¶

If a package is not installed on your system, you can install it using pip (the package installer for Python) on your system:

In [15]:
pip install numpy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.21.5)
Note: you may need to restart the kernel to use updated packages.
In [16]:
pip install scipy
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: scipy in /usr/lib/python3/dist-packages (1.8.0)
Note: you may need to restart the kernel to use updated packages.
  • On some systems, you may need to invoke your command shell:
In [13]:
!pip install numpy scipy
Requirement already satisfied: numpy in /media/colin/Shared/colin/git/materials-ml-workshop-notebooks/env/lib/python3.10/site-packages (1.24.3)
Requirement already satisfied: scipy in /media/colin/Shared/colin/git/materials-ml-workshop-notebooks/env/lib/python3.10/site-packages (1.10.1)

Working with Packages¶

  • When working with a package for the first time, it is helpful to read the package's online documentation
    • Numpy package: https://numpy.org/doc/stable/
    • Scipy package: https://docs.scipy.org/doc/scipy/

numpy documentation

The NumPy Package¶

  • Numpy is a numerical computing package in Python
  • Numpy provides an interface to several mathematical functions
  • Numpy supports multi-dimensional arrays for fast numerical computing
    • These arrays are instances of the numpy.ndarray class
    • ndarrays can be used to represent vectors, matrices, tensors, etc.
  • Common practice is to import numpy with the alias np at the beginning of a program:
In [19]:
import numpy as np

Mathematical functions:¶

  • Numpy provides some basic mathematical functions and constants:
In [27]:
# square root function:
print(np.sqrt(4))

# natural log and exponential functions:
print(np.log(10))
print(np.exp(-20))

# trigonometric functions (and pi constant)
print(np.sin(np.pi / 2))
2.0
2.302585092994046
2.061153622438558e-09
1.0

Numpy arrays:¶

  • Numpy arrays store an rectangular array of values of the same type (usually float, int, or complex)

  • We create them from nested Python lists as follows:

In [40]:
# create a 1D array:
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x)

# create a 2D array (matrix):
X = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])
print(X)
[1. 2. 3. 4.]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
  • Every array has an instance variable shape
  • The length of the tuple is the dimension of the array
  • The entries in the tuple represent the size of the array along each axis (i.e. dimension)
In [42]:
# x is a 1D array of length 4:
print(x.shape)

# X is a 3x3 matrix:
print(X.shape)

# create an array of zeros with a 3x2x2 shape:
S = np.zeros((3,2,2))
print(S.shape)
(4,)
(3, 3)
(3, 2, 2)

Reshaping Arrays:¶

In [45]:
# print original shape of X:
print('X shape:', X.shape)

# reshape X to a 1D array:
X2 = X.reshape((9,))
print('\nX reshaped to (9,):')
print(X2)
print(X2.shape)

# reshape X to a 1x9 array:
X3 = X.reshape((1,9))
print('\nX reshaped to (1,9):')
print(X3)
print(X3.shape)
X shape: (3, 3)

X reshaped to (9,):
[1 2 3 4 5 6 7 8 9]
(9,)

X reshaped to (1,9):
[[1 2 3 4 5 6 7 8 9]]
(1, 9)

Functions for Building Arrays:¶

In [49]:
# construct a 3x3 identity (i.e. I or "eye") matrix:
print(np.eye(3))

# construct an array of ones:
print(np.ones((2,3)))

# construct an array of 11 equally spaced points from 0 to 1:
x = np.linspace(0.0, 1.0, 11)
print(x)

# construct an array of ones/zeros with the same shape as x:
x_ones = np.ones_like(x)
print(x_ones)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]]
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Indexing Arrays¶

  • Numpy arrays can be indexed like Python lists, but with some added features:
    • multiple indices can be selected using comma-separated indexes (e.g. array[1,2,3])
    • The entire subarray along an index can be selected using : instead of an index
In [50]:
X = np.array(range(1,10)).reshape((3,3))
print(X)

# access row 0:
print('\nAccessing X[0]:')
print(X[0])

# access row 0, column 2:
print('\nAccessing X[0,2]:')
print(X[0,2])

# access column 0:
print('\nAccessing X[:,0]:')
print(X[:,0])
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Accessing X[0]:
[1 2 3]

Accessing X[0,2]:
3

Accessing X[:,0]:
[1 4 7]

Operations on Numpy Arrays:¶

  • All math operations on arrays are performed elementwise:
In [53]:
x1 = np.array([1,2,3])
x2 = np.array([4,5,6])

# all math operations are elementwise:
print(x1 + x2)
print(x1 - x2)
print(x1 * x2)
print(np.sqrt(x1))

# scalar operations are also elementwise:
print('\nScalar operations:')
print(x1 - 1)
print(x1*2)
print(-x1)
[5 7 9]
[-3 -3 -3]
[ 4 10 18]
[1.         1.41421356 1.73205081]

Scalar operations:
[0 1 2]
[2 4 6]
[-1 -2 -3]

Matrix Operations¶

  • 1D array can be used to represent a vector,

  • 2D arrays can be used to represent a matrix

  • Numpy supports many different matrix and vector operations from the linalg package:

In [55]:
# generate a matrix:
A = np.array(range(1,10)).reshape(3,3)
print(A)

# generate a diagonal matrix:
X = np.diag([1,10,100])
print(X)

# matrix transpose:
print(A.T)

# matrix inverse:
print(np.linalg.inv(X))
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[  1   0   0]
 [  0  10   0]
 [  0   0 100]]
[[1 4 7]
 [2 5 8]
 [3 6 9]]
[[1.   0.   0.  ]
 [0.   0.1  0.  ]
 [0.   0.   0.01]]

Linear Algebra:¶

  • Matrix-Matrix and Matrix-vector product can be computed using the reserved @ operator:
In [60]:
# compute a matrix-matrix product:
print('\nA @ X:')
print(A @ X)

# compute a matrix-vector product:
b = np.array([1,10,100])
print('\nA @ b:')
print(A @ b)

# compute a vector dot product (inner product):
print('\nb • b:')
print(np.dot(b,b))

# compute vector norm:
print('\n|b|:')
print(np.linalg.norm(b))
A @ X:
[[  1  20 300]
 [  4  50 600]
 [  7  80 900]]

A @ b:
[321 654 987]

b • b:
10101

|b|:
100.50373127401788

The Scipy Package¶

  • Scipy provides many useful subpackages for scientific computing
  • Subpackages you may find useful include:
    • scipy.constants: physical constants, unit conversions
    • scipy.optimize: functions for optimization and root finding
    • scipy.integrate: functions numerical integration
    • scipy.stats: statistical analysis functions
    • scipy.special: special functions (e.g. Bessel functions, Ganna function, etc.)

Scientific Constants¶

In [62]:
from scipy import constants
In [63]:
print(constants.pi)             # 3.141592653589793 
print(constants.speed_of_light) # 299792458.0 [m/s]
print(constants.Avogadro)       # 6.02214076e+23 [mol^(-1)]
print(constants.G)              # Gravitational constant [m^3 / kg s^2 ] 
print(constants.Boltzmann)      # Boltzmann constant [J/K]
print(constants.m_e)            # Electron mass [kg]
3.141592653589793
299792458.0
6.02214076e+23
6.6743e-11
1.380649e-23
9.1093837015e-31

Integration¶

  • We can integrate functions over a specific domain using scipy.integrate.quad.

    • Example: integrating the Gaussian function:

    $$\int_{-\infty}^\infty e^{-x^2}\ dx = \sqrt{\pi}$$

In [67]:
import numpy as np
from scipy.integrate import quad
In [76]:
# Gaussian function:
def gauss(x):
    return np.exp(-x**2)

# integrate from -10^3 to +10^3 (close enough to infinity):
integral, error = quad(gauss, -1e3, 1e3)

# compare numerical result with actual result:
print(integral)
print(np.sqrt(np.pi))
print(error)
1.7724538509055159
1.7724538509055159
7.768296244985151e-09

Optimization¶

  • We can use the scipy.optimize subpackage to attempt to find the global or local minima of a Python function
  • This package is also useful for fitting curves to data
    • See scipy.optimize.curve_fit

Example Fitting linear data to a function:¶

  • To test scipy's curve_fit function, let's create some linear data with a bit of random noise:
In [73]:
import numpy as np
from scipy.optimize import curve_fit

# generate N datapoints that fit y = 10x + 3:
N = 10
x_data = np.linspace(0,10,N)
y_noise = np.random.normal(size=N) 
y_data = 10*x_data + 3 + y_noise
In [74]:
# linear model of the form y = ax + b:
def linear_model(x, a, b):
    return a*x + b

# fit the linear model to data:
initial_guess = (0,0)
p_opt, p_cov = curve_fit(linear_model,x_data, y_data, initial_guess)

# print the estimated a and b coefficients:
print('Estimated a:', p_opt[0], '(should be close to 10.0)')
print('Estimated b:', p_opt[1], '(should be close to 3.0)')
Estimated a: 9.969127086826829 (should be close to 10.0)
Estimated b: 2.9155794406611073 (should be close to 3.0)

Visualize fit with Matplotlib¶

In [75]:
import matplotlib.pyplot as plt

# evaluate best fit line at many points:
x_fit = np.linspace(0,10,1000)
y_fit = linear_model(x_fit, p_opt[0], p_opt[1])

# plot data and fitted line:
plt.figure()
plt.scatter(x_data,y_data)
plt.plot(x_fit, y_fit, 'r:')
plt.show()

Special Functions:¶

  • Special mathematical functions can be found in the scipy.special subpackage:

  • Airy functions: scipy.special.airy

  • Bessel functions: scipy.special.jv

  • Spherical Bessel functions: scipy.special.yn

  • Gamma function: scipy.special.gamma

  • Riemann zeta function: scipy.special.zeta

Recommended Reading:¶

  • Data Handling and Visualization

If possible, try to do the exercises. Bring your questions to our next meeting on Monday.