4.2 PubChem 3D

PubChem3D

The PubChem3D Project is an initiative to provide computed 3D structures and conformer models for millions of small molecules in the PubChem database. Using structure generation algorithms and energy minimization tools, PubChem3D generates multiple low-energy conformers for each compound to better reflect its potential shapes in biological systems. These 3D models are valuable for structure-based drug design, molecular docking, and cheminformatics research, offering users standardized, accessible 3D information directly from PubChem’s web pages and its programmatic (PUG REST) interface.

A computed 3D conformer is not necessarily at a global or local energy minimum, and it may not represent the lowest-energy form in vacuum, in solvent, or within a binding site. Instead, each conformer belongs to a conformer model: a set of diverse, low-energy structures that reflects the molecule’s conformational flexibility. The conformers are sampled using average atom pairwise RMSD (root-mean-square distance) thresholds to capture a range of energetically accessible and potentially biologically relevant shapes.
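To make the RMSD idea concrete, here is a minimal sketch of an atom pairwise RMSD between two conformers. It assumes both conformers list the same atoms in the same order and are already superimposed. The function name `pairwise_rmsd` and the toy coordinates are illustrative only, and this is not PubChem3D's actual sampling procedure.

```python
import math

def pairwise_rmsd(coords_a, coords_b):
    """RMSD between two conformers given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and no further alignment step.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("Conformers must have the same, non-zero number of atoms")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Toy two-atom "conformers" (made-up coordinates, in angstroms)
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(pairwise_rmsd(conf_a, conf_b), 3))  # 1.414
```

A larger RMSD means the two shapes differ more. A sampling threshold of this kind keeps a new conformer only if it is sufficiently far from those already kept.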

Up to 500 conformers are generated per molecule. This is too much data to make available through PubChem services, so a maximum of 10 diverse conformers per compound can be downloaded. These 10 conformers are ordered so that they represent the overall diversity of the compound’s conformer model.

Learning Objectives

  • Explore PubChem3D Compound Records:

    • The web interface

    • Obtaining SDF data through PUG REST

  • Explore ipywidgets

    • Create dropdown menus

    • Use HBox to display two outputs side by side

  • Review PubChem’s PUG REST Web Interface

    • Getting data from PubChem through PUG-REST

  • Practice code from previous notebooks

Accessing data through the compound summary page

To access 3D conformer data for a compound on PubChem, navigate to the compound’s summary page and scroll to Section 1.3: Structure. Within this section, you’ll find options to view the 2D and 3D structures of the molecule. If 3D conformers are available, a “3D Conformer” viewer will be embedded, allowing interactive visualization directly in the browser. A dropdown or slider may be present to explore different conformers. Links are provided to download conformers in formats such as SDF, JSON, or XYZ.

Click on the link below to view 3D conformers of Atorvastatin. https://pubchem.ncbi.nlm.nih.gov/compound/60823#section=3D-Conformer

Explore the web interface

In the browser window for the 3D conformer section:

  • Confirm there are 10 publicly available conformers

  • Change the model structure for viewing

  • Display the SDF file for conformer 8 in a new web browser window

  • In the SDF file, identify the <PUBCHEM_CONFORMER_DIVERSEORDER> section. These are the conformer IDs for each molecule. The first 10 are the ones that are publicly available for download.

Do the conformations look different?
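The `<PUBCHEM_CONFORMER_DIVERSEORDER>` field from the exercise above can also be read programmatically once the SDF text is in hand. The sketch below is a small, dependency-free parser for a single SDF data field. The helper name `read_sdf_field` is hypothetical, and the sample record uses placeholder values rather than real PubChem data.

```python
def read_sdf_field(sdf_text, field_name):
    """Return the value lines of one data field from the first SDF record."""
    tag = f"<{field_name}>"
    values = []
    in_field = False
    for line in sdf_text.splitlines():
        if line.startswith(">") and tag in line:
            in_field = True          # header line of the requested field
            continue
        if in_field:
            # A blank line (or the record terminator) ends the field
            if not line.strip() or line.startswith("$$$$"):
                break
            values.append(line.strip())
    return values

# Placeholder data-field section of an SDF record (values are made up)
sample_sdf = """\
> <PUBCHEM_COMPOUND_CID>
60823

> <PUBCHEM_CONFORMER_DIVERSEORDER>
1
11
10

$$$$
"""

order = read_sdf_field(sample_sdf, "PUBCHEM_CONFORMER_DIVERSEORDER")
print(order)
```

With a real conformer SDF downloaded from PubChem, the same call would return the diverse-order entries as a list of strings.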

Another interactive web-based tool is the PubChem 3D Viewer. Users can load conformers directly by CID (Compound ID) or conformer ID and choose among the available conformers for a compound to compare different low-energy 3D geometries.

Click on the PubChem 3D Viewer link above.

Explore the PubChem 3D Viewer
  • In the browser window for the PubChem 3D Viewer:

    • Add CID 60823 (Atorvastatin) to the CID list. Click View at the bottom of the page.

    • Confirm there are 10 conformers to view

  • Click here to view the conformation superposition tool

    • In the Pairs by: box, choose Conformer ID

    • Enter the following two conformers of atorvastatin: 0000ED9700000002 0000ED9700000014

    • Click View

    • On the left side of the viewer confirm you have Reference of LID 2 and Fit of LID 20 (LID = Local Conformer Identifier).

    • We won’t go into detail until later this semester, but make note of the Shape and Feature similarity percentages. They give us a way of assessing how well the 3D shapes and pharmacophoric elements of two conformers align.

    • Go back to the conformation superposition tool and enter this conformer pair: 0000ED9700000002 0000ED970000000E

    • Make note of the Shape and Feature similarity percentages for LID 2 and LID 14.

    • Download the superposition of these two molecules as SDF.

Can you conclude from this assessment that the conformers are different? The online 3D viewer only provides a visualization that rocks between set views. Wouldn’t it be nice to view this in an interactive 3D viewer? (Foreshadowing the homework.)
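The conformer IDs used in the exercise above appear to follow a simple pattern. The first eight hexadecimal digits match the CID (0000ED97 is 60823) and the last eight match the LID shown in the viewer (00000014 is 20). The sketch below decodes an ID under that assumption, which is inferred from the examples in this section. The function name `decode_conformer_id` is hypothetical.

```python
def decode_conformer_id(conformer_id):
    """Split a 16-hex-digit conformer ID into (CID, LID).

    Assumes the layout inferred above: 8 hex digits of CID, then 8 of LID.
    """
    if len(conformer_id) != 16:
        raise ValueError("Expected a 16-character hexadecimal conformer ID")
    cid = int(conformer_id[:8], 16)   # compound identifier
    lid = int(conformer_id[8:], 16)   # local conformer identifier
    return cid, lid

for conf_id in ("0000ED9700000002", "0000ED9700000014", "0000ED970000000E"):
    cid, lid = decode_conformer_id(conf_id)
    print(f"{conf_id} -> CID {cid}, LID {lid}")
```

This reproduces the Reference and Fit labels seen in the superposition tool: LID 2 paired with LID 20, and LID 2 paired with LID 14.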

PubChem provides programmatic access to conformer data through its PUG REST interface, enabling users to retrieve 3D structural information in an automated and reproducible manner. The following code shows how to download the 10 publicly accessible conformers of atorvastatin and display them through py3Dmol.

# New code for getting the 10 publicly available conformers of any CID from PubChem

import pandas as pd
import requests
import py3Dmol
import ipywidgets as widgets
from IPython.display import display, clear_output
pd.set_option('display.max_colwidth', None)  # Ensure full URL is displayed

# Define input data 
cid = 60823 # Atorvastatin's PubChem Compound ID

# Fetch conformers from PubChem 
pugrest = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
pugin   = "compound/cid/"+str(cid)
pugoper = "conformers"
pugout  = "TXT"
url     = "/".join( [pugrest, pugin, pugoper, pugout] )

res = requests.get(url)
if res.status_code != 200:
    raise Exception(f"Failed to fetch conformers: {res.status_code} {res.reason}")  

# Parse the response text to get conformer IDs

conformers = res.text.splitlines()
conformers = [line.strip() for line in conformers if line.strip()]  

# Helper function to generate the SDF download URL for a given conformer ID
def make_conformer_url(conformer_id):
    return f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/{conformer_id}/SDF?response_type=display"

# Build DataFrame with conformer IDs and URLs
data = {
    "Conformer_ID": conformers,
    "Conformer_URL": [make_conformer_url(conf_id) for conf_id in conformers]
}

df = pd.DataFrame(data)

# Display dataframe
display(df)

# Interactive viewer setup with two dropdown widgets side by side to compare conformers

# Dropdowns 
dropdown1 = widgets.Dropdown(
    options=[(f"Conformer {row.Conformer_ID}", i) for i, row in df.iterrows()],
    description='Left:',
    layout=widgets.Layout(width='45%')
)

dropdown2 = widgets.Dropdown(
    options=[(f"Conformer {row.Conformer_ID}", i) for i, row in df.iterrows()],
    description='Right:',
    layout=widgets.Layout(width='45%')
)

# Output containers 
output1 = widgets.Output()
output2 = widgets.Output()

def render_conformer(index, output):
    conf_id = df.loc[index, "Conformer_ID"]
    url = df.loc[index, "Conformer_URL"]
    try:
        response = requests.get(url)
        response.raise_for_status()  # send HTTP errors to the except block below
        sdf_data = response.text
        view = py3Dmol.view(width=400, height=400)
        view.addModel(sdf_data, "sdf")
        view.setStyle({'stick': {}})
        view.zoomTo()
        with output:
            clear_output(wait=True)
            print(f"Conformer {conf_id} (Index {index})")
            display(view)
    except Exception as e:
        with output:
            clear_output(wait=True)
            print(f"Failed to load conformer {conf_id}: {e}")


#  Event listeners for dropdowns 
# names='value' restricts the callback to changes of the selected value
dropdown1.observe(lambda change: render_conformer(change['new'], output1), names='value')
dropdown2.observe(lambda change: render_conformer(change['new'], output2), names='value')

# Display interface 
controls = widgets.HBox([dropdown1, dropdown2])
outputs = widgets.HBox([output1, output2])
display(controls, outputs)

# Initial render
render_conformer(0, output1)
render_conformer(1, output2)

Homework

  1. The code above generates URLs to download an SDF file for each conformer and displays the conformers in py3Dmol viewers. Explain why conformer data needs to be stored as SDF and not as InChI or SMILES.

Write your explanation for question 1 here.

  2. Choose a small molecule from PubChem (e.g., an over-the-counter drug or a natural product). Identify the CID of your chosen molecule using the PubChem website. Write a short Python script or Jupyter notebook that does the following:
  • Use PUG REST to retrieve the list of conformer IDs for that compound.

  • Download all publicly available conformers in SDF format.

  • Visualize the conformers side by side using py3Dmol, either as individual viewers or using a dropdown or slider to switch between them.

# write your code here
  3. Previously you downloaded the superposition of two conformers of atorvastatin as an SDF file. It probably had the filename CID_60823_60823_algn.sdf. Load this SDF file into a py3Dmol viewer to visualize it.
# write your code here

Acknowledgments