Pymatgen and the Materials Project API


Now that we have learned how to use ASE to build atomic structures, let’s learn how to use Pymatgen and the Materials Project API to retrieve known atomic structures from databases and visualize their electronic properties.

Pymatgen is a powerful open-source Python library for materials analysis, designed to interface with electronic structure codes such as VASP and ABINIT. While its functionality is extensive (supporting everything from symmetry analysis to phase diagrams), this tutorial uses Pymatgen primarily as a tool to obtain crystal structures from the Materials Project and convert them to ase.Atoms objects. This interoperability makes it a useful bridge between databases and simulation frameworks.


The Materials Project API lets users query the Materials Project database programmatically.

The Materials Project provides a Python client for its API in the mp-api package, which offers a simple interface for querying the database and retrieving data over the internet. To use it, you’ll need an API key, which you can get from your account dashboard after logging in to materialsproject.org.

Important

An API key is required to use MPRester.

  • You must be logged in on materialsproject.org to obtain your API key.

  • If you do not have a Materials Project account, you can create one for free with a valid email address.

  • You can obtain your personal API key from your Materials Project Dashboard, or you can get it from the documentation page.

  • Your API key is a long alphanumeric string (about 30 characters) that you must use every time you wish to query the Materials Project programmatically via the API.

  • Never share your API key, and never upload code in which your API key is visible.

Once set up, you can retrieve structured data programmatically, from atomic structures to band gaps and densities of states, with just a few lines of Python.

Installation

To use the pymatgen and mp-api packages, we first install them with pip, the Python package manager:

pip install pymatgen mp-api

Once installed, you’ll be able to access Pymatgen’s wide suite of materials tools and connect to the Materials Project Database.
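To confirm the installation, you can ask Python which versions it sees. This is a minimal sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name, and the distribution names are the ones from the pip command above, plus ase, which this tutorial also uses.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this tutorial relies on
for pkg in ("pymatgen", "mp-api", "ase"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```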

Using the MPRester Object

Let’s get started by connecting to the Materials Project database. First, copy your API key from your Materials Project Dashboard.

To use your API key in our Python code, we store its value in a string, like this:

# Save your Materials Project API key as a string
MP_API_KEY = '---your api key here ----'
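Pasting the key directly into a notebook makes it easy to leak. A safer pattern, sketched below under the assumption that you have exported an environment variable named MP_API_KEY in your shell, is to read the key at runtime with os.environ. The helper name get_mp_api_key is hypothetical.

```python
import os


def get_mp_api_key(var_name: str = "MP_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a readable message rather than sending an empty key
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it in your shell before starting Python."
        )
    return key
```

You would then write MP_API_KEY = get_mp_api_key() instead of pasting the key. Recent versions of the mp-api client also appear to read an MP_API_KEY environment variable when MPRester() is called without a key; check the documentation for your installed version before relying on that.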

After storing your API key in a variable (e.g., MP_API_KEY), you can use it to open a connection to the Materials Project with MPRester. This is typically done with a with statement such as with MPRester(MP_API_KEY) as mpr:, which opens a session. Inside this block, you can call methods on mpr, such as those that retrieve structures or other material data, using commands like mpr.some_method(). Let’s look at some examples to see how this works in practice.
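To make the with pattern concrete, here is a toy sketch of what a context-manager client does when a block starts and ends. HypotheticalSession is an invented stand-in, not MPRester, and it only models the general idea: __enter__ runs when the block starts and returns the object bound to the name after as, and __exit__ runs when the block ends, even if an error occurred inside it.

```python
class HypotheticalSession:
    """Toy stand-in for a client such as MPRester; NOT the real class."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.is_open = False

    def __enter__(self):
        self.is_open = True   # a real client might open an HTTP session here
        return self           # this object is bound to the name after `as`

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False  # runs when the block ends, even after an exception
        return False          # do not suppress exceptions raised in the block

    def query(self, material_id: str) -> str:
        if not self.is_open:
            raise RuntimeError("session is closed")
        return f"data for {material_id}"


with HypotheticalSession("dummy-key") as session:
    was_open = session.is_open
    result = session.query("mp-149")
    print(result, was_open)

print(session.is_open)  # the session was closed when the block ended
```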

Get the Crystal Structure of a Specific Material

In the Materials Project, each material has a unique identifier, known as its Materials Project ID (MPID). When you want information about a single material, it is often easiest to search for it manually in the Materials Explorer to find its MPID. You can then use the MPID with MPRester to automate queries about that material.

As an example, YBa\(_2\)Cu\(_3\)O\(_7\) has mp-20674 as its MPID. We can obtain its crystal structure directly using this MPID as follows:

from mp_api.client import MPRester
from ase.io import write

MPID = 'mp-20674'  # Materials Project ID number

# The 'with ...' statement defines an MPRester code block. Subsequent
# indented statements belong to the code block, and the object mpr
# may be used within the code block.
with MPRester(MP_API_KEY) as mpr:

    # Get the structure for YBa2Cu3O7
    structure = mpr.get_structure_by_material_id(MPID)

The MPRester method get_structure_by_material_id() queries the Materials Project and returns the crystal structure, which we store in the variable structure. Having accessed the Materials Project, we no longer need the mpr object, so we can exit the MPRester code block by un-indenting the next statement.

Next, we provide code to inspect the structure we downloaded. What is its data type? How can we use it?

# Reset the indentation (exit the 'with' block), and examine the
# structure we obtained.
print('\nWhat is the data type of the structure we obtained?')
print(type(structure))

print('\nWhat is the structure we obtained?')
print(structure)
What is the data type of the structure we obtained?
<class 'pymatgen.core.structure.Structure'>

What is the structure we obtained?
Full Formula (Ba2 Y1 Cu3 O7)
Reduced Formula: Ba2YCu3O7
abc   :   3.844668   3.926152  11.823664
angles:  90.000000  90.000000  90.000000
pbc   :       True       True       True
Sites (13)
  #  SP      a    b         c    magmom
---  ----  ---  ---  --------  --------
  0  Ba    0.5  0.5  0.819391        -0
  1  Ba    0.5  0.5  0.180609        -0
  2  Y     0.5  0.5  0.5              0
  3  Cu    0    0    0.646678        -0
  4  Cu    0    0    0.353322        -0
  5  Cu    0    0    0                0
  6  O     0    0.5  0                0
  7  O     0.5  0    0.620653        -0
  8  O     0.5  0    0.379347        -0
  9  O     0    0.5  0.621651        -0
 10  O     0    0.5  0.378349        -0
 11  O     0    0    0.84082          0
 12  O     0    0    0.15918          0

The printed type shows that structure is a pymatgen Structure object. Pymatgen’s AseAtomsAdaptor class can convert it to an ase.Atoms object, which we can then visualize with ASE.

from pymatgen.io.ase import AseAtomsAdaptor as aaa

# convert the structure to an ase.Atoms object
crystal = aaa.get_atoms(structure)

# Make a static visualization
orientation='90x,75y,-9x'
write('YBa2Cu3O7_structure.png', crystal, show_unit_cell=2,
      rotation=orientation)
[Figure: static rendering of the YBa\(_2\)Cu\(_3\)O\(_7\) unit cell, saved as YBa2Cu3O7_structure.png]

Let’s make an interactive visualization of the crystal:

from ase.visualize import view

# Interactive 3D visualization
view(crystal, viewer='x3d')

Having obtained the crystal structure, we can now use it in a variety of ways:

  • Use it within an atomistic simulation

  • Use ase.io.write() to save the structure in a structure file (*.cif, *.xyz, etc.)

Getting the Band Structure for a Material

Band structure data provides insight into the electronic properties of materials. By querying with an MPID, you can retrieve a Pymatgen BandStructure object and use BSPlotter to visualize it. This is especially useful for identifying semiconductors, insulators, or materials with interesting electronic behavior like Dirac points or band inversions.

We can use the MPRester class to obtain band structures for a material:

mpid = "mp-149"  # this is the MPID for silicon crystal (diamond lattice)

with MPRester(MP_API_KEY) as mpr:
    bs = mpr.get_bandstructure_by_material_id(mpid)

This returns a Pymatgen band structure object, which can be plotted.

from pymatgen.electronic_structure.plotter import BSPlotter
import matplotlib.pyplot as plt

# plot & show the band structure we obtained
plot = BSPlotter(bs).get_plot()
plt.show()
[Figure: band structure of silicon (mp-149) plotted with BSPlotter]
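A band structure lets you distinguish metals from semiconductors and insulators: the band gap is the energy difference between the conduction band minimum (CBM) and the valence band maximum (VBM), and the gap is direct when both occur at the same k-point. The toy function below illustrates that logic on made-up band energies. It is a simplified sketch (single spin channel, no numerical tolerances), not pymatgen’s implementation; for a real BandStructure object such as bs, pymatgen provides bs.get_band_gap().

```python
from typing import List, Tuple


def band_gap(bands: List[List[float]], e_fermi: float) -> Tuple[float, bool]:
    """Return (gap in eV, is_direct) for bands sampled on a shared list of k-points.

    bands[i][k] is the energy (eV) of band i at k-point k.
    """
    # A band that crosses the Fermi level means the material is metallic
    if any(min(band) < e_fermi < max(band) for band in bands):
        return 0.0, False

    # VBM: highest occupied energy; CBM: lowest unoccupied energy (with k-index)
    vbm_e, vbm_k = max((e, k) for band in bands for k, e in enumerate(band) if e <= e_fermi)
    cbm_e, cbm_k = min((e, k) for band in bands for k, e in enumerate(band) if e > e_fermi)

    # Direct gap: VBM and CBM sit at the same k-point
    return cbm_e - vbm_e, vbm_k == cbm_k


# Made-up two-band example: VBM at k=2, CBM at k=1, so the gap is indirect
print(band_gap([[-1.0, -0.5, -0.2], [1.0, 0.6, 0.9]], e_fermi=0.0))
```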

Searching using MPRester

Using the MPRester object, Materials Project data can be queried in two ways:

  • through a specific list of MPID(s), and/or

  • through property filters (e.g. band gap less than 0.5 eV)

Filters can be applied to find materials that meet specific criteria. The search returns structured documents that contain key properties such as chemical formula, symmetry, and electronic structure. You can also choose which properties to return by listing them in the fields argument.

When querying a list of MPIDs, we use the following syntax:

with MPRester(MP_API_KEY) as mpr:
    docs = mpr.materials.summary.search(material_ids=["mp-149", "mp-13", "mp-22526"])

Every material in the Materials Project has summary data, and mpr.materials.summary.search() searches that summary data. Because we queried a list of MPIDs, docs holds a list of “documents” (formally, a list of MPDataDoc objects).

We can now reference an individual document and extract its properties. We’ll use a for loop to list the MPID and chemical formula for each search hit:

print('Our query returned {0} docs.'.format( len(docs) ))

for idx, mat_doc in enumerate(docs):
    print('Item {0}: MPID = {1} (formula: {2})'.format(idx,
                                                       mat_doc.material_id,
                                                       mat_doc.formula_pretty))
Our query returned 3 docs.
Item 0: MPID = mp-149 (formula: Si)
Item 1: MPID = mp-22526 (formula: LiCoO2)
Item 2: MPID = mp-13 (formula: Fe)
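Notice that the documents did not come back in the order we requested (mp-22526 appears before mp-13). If order matters, one approach is to index the results by MPID. The sketch below uses SimpleNamespace stand-ins holding the values printed above instead of live documents; wrapping material_id in str() is a precaution in case it is an ID object rather than a plain string.

```python
from types import SimpleNamespace

# Stand-ins for the docs returned above, in the order the API returned them
docs_sample = [
    SimpleNamespace(material_id="mp-149", formula_pretty="Si"),
    SimpleNamespace(material_id="mp-22526", formula_pretty="LiCoO2"),
    SimpleNamespace(material_id="mp-13", formula_pretty="Fe"),
]
requested = ["mp-149", "mp-13", "mp-22526"]

# Map each MPID to its document, then walk the IDs in the order we asked for
by_id = {str(doc.material_id): doc for doc in docs_sample}
for mpid in requested:
    print(mpid, by_id[mpid].formula_pretty)
```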

What properties ('material_id', 'formula_pretty', etc.) are available for search in the summary data? We can obtain a list of document properties using the following syntax:

with MPRester(MP_API_KEY) as mpr:
    print(mpr.materials.summary.available_fields)
['builder_meta', 'nsites', 'elements', 'nelements', 'composition', 'composition_reduced', 'formula_pretty', 'formula_anonymous', 'chemsys', 'volume', 'density', 'density_atomic', 'symmetry', 'property_name', 'material_id', 'deprecated', 'deprecation_reasons', 'last_updated', 'origins', 'warnings', 'structure', 'task_ids', 'uncorrected_energy_per_atom', 'energy_per_atom', 'formation_energy_per_atom', 'energy_above_hull', 'is_stable', 'equilibrium_reaction_energy_per_atom', 'decomposes_to', 'xas', 'grain_boundaries', 'band_gap', 'cbm', 'vbm', 'efermi', 'is_gap_direct', 'is_metal', 'es_source_calc_id', 'bandstructure', 'dos', 'dos_energy_up', 'dos_energy_down', 'is_magnetic', 'ordering', 'total_magnetization', 'total_magnetization_normalized_vol', 'total_magnetization_normalized_formula_units', 'num_magnetic_sites', 'num_unique_magnetic_sites', 'types_of_magnetic_species', 'bulk_modulus', 'shear_modulus', 'universal_anisotropy', 'homogeneous_poisson', 'e_total', 'e_ionic', 'e_electronic', 'n', 'e_ij_max', 'weighted_surface_energy_EV_PER_ANG2', 'weighted_surface_energy', 'weighted_work_function', 'surface_anisotropy', 'shape_factor', 'has_reconstructed', 'possible_species', 'has_props', 'theoretical', 'database_IDs']
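Because field names must match these strings exactly, a quick check before a query can catch typos such as bandgap for band_gap. unknown_fields is a hypothetical helper, and AVAILABLE_FIELDS below holds only a few names copied from the list above; in practice you could pass mpr.materials.summary.available_fields as the second argument.

```python
from typing import Iterable, List

# A subset of the field names printed above (use the full list in practice)
AVAILABLE_FIELDS = {
    "material_id", "formula_pretty", "band_gap", "elements", "nsites",
    "energy_above_hull", "is_stable", "is_metal", "structure",
}


def unknown_fields(requested: Iterable[str], available=AVAILABLE_FIELDS) -> List[str]:
    """Return the requested field names that are not available, in request order."""
    available = set(available)
    return [name for name in requested if name not in available]


print(unknown_fields(["material_id", "formula_pretty", "bandgap"]))
```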

Next, we query using property filters. We apply the following filters:

  • Materials containing Si and O

  • Materials with a band gap between 0.5 eV and 0.75 eV

  • Instead of all available summary fields, we’ll only ask for a few: "material_id", "formula_pretty", "band_gap".

with MPRester(MP_API_KEY) as mpr:
    docs = mpr.materials.summary.search(
        elements=["Si", "O"],
        band_gap=(0.5, 0.75),
        fields=[
            "material_id", 
            "formula_pretty",
            "band_gap"
        ])


To see what our search turned up, we first use len(docs) to find out how many hits the query returned, and then print only the first N hits, with N = 10.

N = 10
print(f'Our query returned {len(docs)} docs.')
print(f'Printing only the first {N} results:')

# Slicing avoids an IndexError if the query returned fewer than N docs
for idx, mat_doc in enumerate(docs[:N]):
    print(f'Item {idx}: MPID = {mat_doc.material_id} ({mat_doc.formula_pretty}), '
          f'band gap = {mat_doc.band_gap:6.4f} eV')
Our query returned 244 docs.
Printing only the first 10 results:
Item 0: MPID = mp-640917 (SiO2), band gap = 0.6554 eV
Item 1: MPID = mp-554498 (SiO2), band gap = 0.5051 eV
Item 2: MPID = mp-1192103 (FeSiO4), band gap = 0.6887 eV
Item 3: MPID = mp-1199561 (FeSiO3), band gap = 0.7031 eV
Item 4: MPID = mp-725556 (Pu2SiO10), band gap = 0.7104 eV
Item 5: MPID = mp-28195 (SiAg2O3), band gap = 0.6373 eV
Item 6: MPID = mp-1016879 (CdSiO3), band gap = 0.6506 eV
Item 7: MPID = mp-754234 (V2SiO4), band gap = 0.6219 eV
Item 8: MPID = mp-1143241 (CrSiO4), band gap = 0.7340 eV
Item 9: MPID = mp-1205049 (SiAg5O4), band gap = 0.7138 eV
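Once a query has returned, you can narrow or reorder the hits in plain Python without another API call. The sketch below uses SimpleNamespace stand-ins carrying a few MPIDs, formulas, and band gaps copied from the output above; with live results you would loop over docs instead.

```python
from types import SimpleNamespace

# Stand-ins for a few of the docs printed above (values copied from that output)
docs_sample = [
    SimpleNamespace(material_id="mp-640917", formula_pretty="SiO2", band_gap=0.6554),
    SimpleNamespace(material_id="mp-554498", formula_pretty="SiO2", band_gap=0.5051),
    SimpleNamespace(material_id="mp-1192103", formula_pretty="FeSiO4", band_gap=0.6887),
    SimpleNamespace(material_id="mp-1199561", formula_pretty="FeSiO3", band_gap=0.7031),
    SimpleNamespace(material_id="mp-28195", formula_pretty="SiAg2O3", band_gap=0.6373),
]

# Keep only the binary SiO2 entries
sio2 = [d for d in docs_sample if d.formula_pretty == "SiO2"]

# Narrow to a tighter band-gap window (inclusive bounds), then sort largest gap first
window = [d for d in docs_sample if 0.60 <= d.band_gap <= 0.70]
ranked = sorted(window, key=lambda d: d.band_gap, reverse=True)

for d in ranked:
    print(f"{d.material_id:>11}  {d.formula_pretty:<8} {d.band_gap:.4f} eV")
```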

Exercises

Exercise 1: Visualize the DOS of YBa\(_2\)Cu\(_3\)O\(_7\)

Let’s get some hands-on experience with accessing and plotting electronic structure data, a critical step in evaluating materials for electronic applications in real-world technologies.

First, obtain the electronic density of states (DOS) for YBa\(_2\)Cu\(_3\)O\(_7\). Then plot it using the functionality available in pymatgen.


Hints:

  1. To find the code to obtain the DOS, a Google search such as “MPRester DOS example” may help, or perhaps you can try asking an AI chatbot.

  2. Once you have obtained a DOS and saved it as, say, some_DOS, you can plot it using code such as this:

from mp_api.client import MPRester
from pymatgen.electronic_structure.plotter import DosPlotter
import matplotlib.pyplot as plt

with MPRester(MP_API_KEY) as mpr:
    some_DOS = <your code to get the DOS>

# obtain a DosPlotter object
Plotter = DosPlotter()

# add the DOS to the plotter
Plotter.add_dos('DOS', some_DOS)

"""
   Choose appropriate numbers for:

       E_lo and E_hi, the lower and upper limits
           of the domain (energy axis) for your DOS plot.

       MaxDensity, the upper limit for the range
           (density-of-states axis) of your DOS plot.

   This may require some trial and error!
"""
Plotter.get_plot(xlim=(E_lo, E_hi), ylim=(0, MaxDensity))
plt.show()

Solutions

Exercise 1: YBCO DOS

from mp_api.client import MPRester # client for Materials Project
from pymatgen.electronic_structure.plotter import DosPlotter
import matplotlib.pyplot as plt

YBCO = 'mp-20674' # Materials Project ID number

with MPRester(MP_API_KEY) as mpr:
    YBCO_DOS = mpr.get_dos_by_material_id(YBCO)

print(YBCO_DOS)

# plot & show DOS we obtained
Plotter = DosPlotter()

Plotter.add_dos('DOS', YBCO_DOS)

Plotter.get_plot(xlim=(-10, 10), ylim=(0, 30))
plt.show()
Complete DOS for Full Formula (Ba2 Y1 Cu3 O7)
Reduced Formula: Ba2YCu3O7
abc   :   3.844668   3.926152  11.823664
angles:  90.000000  90.000000  90.000000
pbc   :       True       True       True
Sites (13)
  #  SP      a    b         c
---  ----  ---  ---  --------
  0  Ba    0.5  0.5  0.819391
  1  Ba    0.5  0.5  0.180609
  2  Y     0.5  0.5  0.5
  3  Cu    0    0    0.646678
  4  Cu    0    0    0.353322
  5  Cu    0    0    0
  6  O     0    0.5  0
  7  O     0.5  0    0.620653
  8  O     0.5  0    0.379347
  9  O     0    0.5  0.621651
 10  O     0    0.5  0.378349
 11  O     0    0    0.84082
 12  O     0    0    0.15918
[Figure: density of states of YBa\(_2\)Cu\(_3\)O\(_7\) plotted with DosPlotter]