Materials+ML Workshop Day 5¶

Day 5 Agenda:¶

  • Questions about Day 4 Material
  • Review of Day 4

Content for today:

  • The Atomic Simulation Environment
    • Building atomic structures
    • Visualizing atomic structures
  • Python Materials Genomics (pymatgen with ase)
    • Running and interpreting calculations
    • Visualizing properties (band structure)
  • Using the Materials Project Database
    • Querying material properties
    • Getting crystal structure data

The Workshop Online Book:¶

https://cburdine.github.io/materials-ml-workshop/¶

Tentative Workshop Schedule:¶

| Session | Date | Content |
|---|---|---|
| Day 0 | 06/16/2023 (2:30-3:30 PM) | Introduction, Setting up your Python Notebook |
| Day 1 | 06/19/2023 (2:30-3:30 PM) | Python Data Types |
| Day 2 | 06/20/2023 (2:30-3:30 PM) | Python Functions and Classes |
| Day 3 | 06/21/2023 (2:30-3:30 PM) | Scientific Computing with Numpy and Scipy |
| Day 4 | 06/22/2023 (2:30-3:30 PM) | Data Manipulation and Visualization |
| Day 5 | 06/23/2023 (2:30-3:30 PM) | Materials Science Packages |
| Day 6 | 06/26/2023 (2:30-3:30 PM) | Introduction to ML, Supervised Learning |
| Day 7 | 06/27/2023 (2:30-3:30 PM) | Regression Models |
| Day 8 | 06/28/2023 (2:30-3:30 PM) | Unsupervised Learning |
| Day 9 | 06/29/2023 (2:30-3:30 PM) | Neural Networks |
| Day 10 | 06/30/2023 (2:30-3:30 PM) | Advanced Applications in Materials Science |

Questions¶

Material covered yesterday:

  • Pandas
  • Matplotlib

Pandas¶

  • Pandas is an open-source Python package for data manipulation and analysis.
  • It can be used for reading and writing data in several different formats, including:
    • CSV (comma-separated values)
    • Excel spreadsheets
    • SQL databases
  • We can import pandas as follows:
In [1]:
import pandas as pd
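As a minimal sketch of reading and writing data, the round trip below writes a small DataFrame to CSV text and reads it back. An in-memory `io.StringIO` buffer stands in for a file on disk, and the column names and values are illustrative only.

```python
import io

import pandas as pd

# a small illustrative table (values chosen only for this example):
original = pd.DataFrame({
    'Element': ['H', 'He'],
    'Atomic Number': [1, 2],
})

# write the table as CSV text into an in-memory buffer:
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# rewind the buffer and read the CSV text back into a new DataFrame:
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing a file path such as `'elements.csv'` in place of the buffer writes to (or reads from) disk. `pd.read_excel` and `pd.read_sql` play the same role for the other formats listed above.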

DataFrames¶

  • We can create DataFrames from Python dictionaries as follows:
In [2]:
# Data on the first four elements of the periodic table:
elements_data = {
    'Element' : ['H', 'He', 'Li', 'Be'],
    'Atomic Number' : [ 1, 2, 3, 4 ],
    'Mass' : [ 1.008, 4.002, 6.940, 9.012],
    'Electronegativity' : [ 2.20, 0.0, 0.98, 1.57 ]
}

# construct dataframe from data dictionary:
df = pd.DataFrame(elements_data)
In [3]:
import numpy as np

# get the 'Mass' column and convert it to a numpy array:
mass_series = df['Mass']
mass_array = np.array(mass_series)

print(mass_array)
[1.008 4.002 6.94  9.012]
In [4]:
# add an "Estimated Mass" column to the dataframe:
df['Estimated Mass'] = \
    df['Atomic Number'].apply(lambda n : 2*n)
display(df)
  Element  Atomic Number   Mass  Electronegativity  Estimated Mass
0       H              1  1.008               2.20               2
1      He              2  4.002               0.00               4
2      Li              3  6.940               0.98               6
3      Be              4  9.012               1.57               8
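Because DataFrame columns behave like arrays, the same estimate can be computed without `apply`, and columns can be combined to filter rows. The sketch below rebuilds the table so that it runs on its own. The `Relative Error` column and the 5% threshold are illustrative choices, not part of the example above.

```python
import pandas as pd

# rebuild the elements table so this cell is self-contained:
df = pd.DataFrame({
    'Element': ['H', 'He', 'Li', 'Be'],
    'Atomic Number': [1, 2, 3, 4],
    'Mass': [1.008, 4.002, 6.940, 9.012],
})

# vectorized equivalent of .apply(lambda n : 2*n):
df['Estimated Mass'] = 2 * df['Atomic Number']

# relative error of the estimate (illustrative extra column):
df['Relative Error'] = (df['Estimated Mass'] - df['Mass']).abs() / df['Mass']

# keep only the rows where the estimate is within 5% of the true mass:
accurate = df[df['Relative Error'] < 0.05]
print(accurate[['Element', 'Relative Error']])
```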

Matplotlib¶

  • Matplotlib is a MATLAB-like plotting utility for creating publication-quality plots
  • In matplotlib, we typically import the pyplot subpackage with the alias plt:
In [5]:
import matplotlib.pyplot as plt
In [16]:
# generate some data:
data_x = np.linspace(0,8,10)
data_y = np.sin(data_x)

# create a new figure and plot data:
plt.figure(figsize=(7,2))
plt.plot(data_x, data_y, 'ro--')

# add a title and show plot in notebook:
plt.title('Example of a Line plot')
plt.show()
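A variation on the plot above, sketched with Matplotlib's object-oriented interface, adds axis labels and a legend and then saves the figure. The labels and the in-memory PNG buffer are illustrative assumptions; a file name such as `'sine.png'` would work in place of the buffer.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

# generate the same data as in the example above:
data_x = np.linspace(0, 8, 10)
data_y = np.sin(data_x)

# create a figure and an axes object, then plot on the axes:
fig, ax = plt.subplots(figsize=(7, 2))
ax.plot(data_x, data_y, 'ro--', label='sin(x)')

# label the axes and add a title and legend:
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Example of a Line plot')
ax.legend()

# save the figure as PNG data into an in-memory buffer:
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format='png', bbox_inches='tight')
print(len(png_buffer.getvalue()), 'bytes written')
```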

New Content:¶

  • Materials Science Python Packages:
    • ASE (Atomic Simulation Environment)
    • Pymatgen (Python Materials Genomics)
    • Materials Project API

Installing packages:¶

  • Install ASE:
pip install ase
  • Install Pymatgen:
pip install pymatgen
  • Install Materials Project API:
pip install mp-api
In [25]:
pip install ase pymatgen mp-api
Note: you may need to restart the kernel to use updated packages.
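After restarting the kernel, one way to confirm that the three packages are visible to Python is to query their installed versions. This is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=('ase', 'pymatgen', 'mp-api')):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# report the status of each package:
for name, ver in installed_versions().items():
    print(f'{name}: {ver if ver else "not installed"}')
```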

The Atomic Simulation Environment¶

  • ASE is a Python package for building, manipulating, and performing calculations on atomic structures
  • ASE provides interfaces to several different simulation platforms, such as:
    • VASP
    • Quantum ESPRESSO
    • Q-Chem
    • Gaussian

ASE Basics:¶

  • The fundamental data type in ASE is the Atoms object:

    • An Atoms object represents a collection of atoms in a molecular or crystalline structure.
    • Material properties (such as the results of calculations) can be attached to Atoms instances
  • ASE has functionality for loading, exporting and viewing Atoms objects in different formats.
  • Example: Acetic Acid Molecule
In [31]:
from ase.build import molecule
from ase.visualize import view

# build and view a common molecule:
acetic_acid = molecule('CH3COOH')
view(acetic_acid, viewer='x3d')
Out[31]:
ASE atomic visualization
In [35]:
for atom in acetic_acid:
    print(atom.symbol)
    print(atom.position, '(Ang.)')
C
[0.      0.15456 0.     ] (Ang.)
O
[0.166384 1.360084 0.      ] (Ang.)
O
[-1.236449 -0.415036  0.      ] (Ang.)
H
[-1.867646  0.333582  0.      ] (Ang.)
C
[ 1.073776 -0.892748  0.      ] (Ang.)
H
[ 2.048189 -0.408135  0.      ] (Ang.)
H
[ 0.968661 -1.528353  0.881747] (Ang.)
H
[ 0.968661 -1.528353 -0.881747] (Ang.)

Building Inorganic Structures:¶

In [78]:
from ase.build import bulk

# build diamond-cubic silicon (FCC lattice with a two-atom basis):
# (lattice constant of 5.431 Å)
silicon = bulk('Si', a=5.431)
In [79]:
view(silicon, viewer='x3d', repeat=(3,3,3))
Out[79]:
ASE atomic visualization

Tutorial: Building MXenes¶

  • Building a monolayer Ti2C crystal

Tutorial: Carbon Allotropes¶

  • Building Carbon nanotubes

The Materials Project¶

  • Register for an account at https://next-gen.materialsproject.org/
  • Once you have registered, find your API key under My Dashboard:

mp api key

Tutorial: Working with the Materials Project¶

  • We will examine the material

YBa$_2$Cu$_3$O$_{7-\delta}$ (YBCO)

  • YBCO is a high-temperature superconductor
  • We will:
    • Visualize the crystal structure
    • Plot the Band Structure
    • Visualize the Density of States
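The three steps above might be sketched as follows with the `mp-api` client and pymatgen's plotting classes. This is a hedged outline, not the tutorial's own code: the helper names `fetch_material_data` and `plot_electronic_structure` are hypothetical, the API key is assumed to be stored in the `MP_API_KEY` environment variable, and the material ID must be looked up on the Materials Project website.

```python
import os


def fetch_material_data(material_id, api_key=None):
    """Download structure, band structure, and DOS for one material ID."""
    # assumption: the key is passed in or stored in the MP_API_KEY variable
    api_key = api_key or os.environ.get('MP_API_KEY')
    if not api_key:
        raise RuntimeError('No Materials Project API key was provided.')

    # imported here so the key check above works without mp-api installed
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        structure = mpr.get_structure_by_material_id(material_id)
        band_structure = mpr.get_bandstructure_by_material_id(material_id)
        dos = mpr.get_dos_by_material_id(material_id)
    return structure, band_structure, dos


def plot_electronic_structure(band_structure, dos):
    """Draw the band structure and the total density of states."""
    from pymatgen.electronic_structure.plotter import BSPlotter, DosPlotter

    BSPlotter(band_structure).get_plot()

    dos_plotter = DosPlotter()
    dos_plotter.add_dos('Total DOS', dos)
    dos_plotter.get_plot()


# Example usage (requires a valid key and network access). The ID below is
# assumed to correspond to YBa2Cu3O7; confirm it on the website first:
# structure, band_structure, dos = fetch_material_data('mp-20674')
# plot_electronic_structure(band_structure, dos)
```

The returned `structure` is a pymatgen `Structure`, so it can be converted with `AseAtomsAdaptor.get_atoms` and displayed with ASE's `view` for the crystal-structure step.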

Questions¶

Recommended Reading:¶

  • Introduction to Machine Learning
  • Supervised Learning

If possible, try to do the exercises. Bring your questions to our next meeting (next Monday).
