1.3 Python Intermediate#

Introduction to Python#

This notebook borrows heavily from Chapter 2: Intermediate Python from Charlie Weiss’s book Scientific Computing for Chemists with Python.

## Learning Objectives
  • Explore Python data structures:

    • lists

    • tuples

    • sets

    • dictionaries

  • Explore Python Error Types

    • NameError

    • SyntaxError

    • KeyError and IndexError

    • TypeError

    • IndentationError

  • Explore More Python Modules and Functions:

    • os

    • random

    • enumerate()

    • requests

    • time

    • io

  • Practice code from previous notebook

Python Data Structures#

Python uses several built-in data structures to organize and store information. You were introduced to the list data structure in the last notebook. The four main types are:

  • Lists

  • Tuples

  • Dictionaries

  • Sets

Lists#

A sequence is an object that stores multiple data items in a contiguous manner. Two types of sequences are strings and lists. Lists are special variables that store multiple values. Each value stored in a list is called an element or item. The following are examples of lists:

digits = [0,1,2,3,4,5,6,7,8,9]
compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]
elementalData = ['hydrogen', 1, 1.008, 'helium', 2, 4.00, 'lithium', 3, 6.94]

Key characteristics of lists:

  • Ordered: Items are kept in the order you add them.

  • Mutable: You can change, add or remove items after the list is created.

  • Heterogeneous: It can contain different types of data.

The location of an element or item in a list is its index number. Index numbers begin at 0. A list having 5 elements will have index values from 0 to 4. The syntax for accessing the elements of a list is the bracket operator. The expression inside the brackets specifies the index.

print(compoundClass[3],type(compoundClass[3]))
print()
print(digits[1],type(digits[1]))
print()
for i in elementalData:
    print(i, type(i))
ketones <class 'str'>

1 <class 'int'>

hydrogen <class 'str'>
1 <class 'int'>
1.008 <class 'float'>
helium <class 'str'>
2 <class 'int'>
4.0 <class 'float'>
lithium <class 'str'>
3 <class 'int'>
6.94 <class 'float'>
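
As a small sketch of the indexing rules described above, the cell below compares the last valid index (`len(...) - 1`) with the negative index `-1`, which counts back from the end of the list. The variable names `last_index`, `last_item` and `also_last` are illustrative only.

```python
# Same list as in the example above
compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]

last_index = len(compoundClass) - 1   # 5 items, so the last valid index is 4
last_item = compoundClass[last_index]
also_last = compoundClass[-1]         # -1 counts back from the end of the list

print(compoundClass[0])               # first item
print(last_item)
print(also_last)
print(compoundClass[-2])              # second-to-last item
```

Negative indexes are convenient when you want the most recently appended item without first computing the length of the list.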

In the last notebook, you used the append() method to add to your lists:

print(len(compoundClass))
compoundClass.append("ethrs")
print(len(compoundClass))
print(compoundClass[-1])
5
6
ethrs

Notice that we misspelled ethers as ethrs in the previous example. Let’s remove it.

Both pop() and remove() are used to delete elements from a list, but behave differently.

| Method | What it does | Returns a value? | Deletes by… | When to use |
| --- | --- | --- | --- | --- |
| pop(index) | Removes and returns an item | Yes | Index | When you know the index and want the value back |
| remove(item) | Removes the first matching value | No | Value | When you know the value but not the index |

print(compoundClass[-1])
compoundClass.pop(-1)
ethrs
'ethrs'
# example of removing an item from a list using pop()
solvents = ["methanol", "ethanol", "acetone", "dichloromethane", "ethanol"]
removed_solvent = solvents.pop(2)
print("Removed solvent:", removed_solvent)
print("Updated list:", solvents)

# example of removing an item from a list using remove()
solvents.remove("ethanol") 
print("List after removing ethanol:", solvents)
Removed solvent: acetone
Updated list: ['methanol', 'ethanol', 'dichloromethane', 'ethanol']
List after removing ethanol: ['methanol', 'dichloromethane', 'ethanol']

There are many other list functions. Let’s use a few more:

originalList = ["carbon", "oxygen", "hydrogen", "nitrogen", "oxygen", "sulfur", "hydrogen"]
print(originalList)

# create a copy of the original list
elements = originalList.copy()  

# index(value) method to find the first occurrence of a value in a list
pos = elements.index("oxygen")
print("First occurrence of 'oxygen' is at index:", pos) 

# Change the value at a specific index
elements[pos]= "fluorine"
print("Elements after changing 'oxygen' to 'fluorine':", elements)  

# using the count() method to count occurrences of a value in a list
count_hydrogen = elements.count("hydrogen")
print("Number of occurrences of 'hydrogen':", count_hydrogen)

# using the sort() method to sort a list in ascending order
elements.sort()
print("Sorted elements:", elements)

# using the reverse() method to reverse the order of a list
elements.reverse()
print("Reversed elements:", elements)

# using the clear() method to remove all items from a list
elements.clear()
print("Elements after clear:", elements)
print("Original list remains unchanged:", originalList)
['carbon', 'oxygen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
First occurrence of 'oxygen' is at index: 1
Elements after changing 'oxygen' to 'fluorine': ['carbon', 'fluorine', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
Number of occurrences of 'hydrogen': 2
Sorted elements: ['carbon', 'fluorine', 'hydrogen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur']
Reversed elements: ['sulfur', 'oxygen', 'nitrogen', 'hydrogen', 'hydrogen', 'fluorine', 'carbon']
Elements after clear: []
Original list remains unchanged: ['carbon', 'oxygen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
Check your understanding

The following code cell has a list of drug names. One of the drugs is misspelled.

  • Correct the misspelling using a list function.

  • Sort the list alphabetically

  • Remove the duplicates

  • print the updated list

drugs = ["aspirin", "ibuprophen", "acetaminophen", "naproxen", "diclofenac", "acetaminophen"]
# write your code here
Solution

pos = drugs.index("ibuprophen")
drugs[pos] = "ibuprofen"
drugs.sort()
drugs.remove("acetaminophen")
print(drugs)

Tuples#

A tuple is an ordered, immutable collection of items. Tuples are used when you want to group related values that should not change. Think of a tuple as a locked list: it holds reference data that you don’t want to modify accidentally.

Key characteristics of tuples:

  • Ordered: Items are kept in the order you add them.

  • Immutable: Values cannot be changed.

  • Indexed: You can reference specific values.

  • Heterogeneous: It can contain different types of data.

To create an empty tuple variable: myTuple = ()

molecules = [
    ("benzene", 78.11),
    ("toluene", 92.14),
    ("phenol", 94.11)
]

# Print molecular weights
for name, mw in molecules:
    print(f"{name}: {mw} g/mol")
    
    if mw > 90:
        print(f"{name} has a molecular weight greater than 90 g/mol")
    else:
        print(f"{name} has a molecular weight less than or equal to 90 g/mol")
benzene: 78.11 g/mol
benzene has a molecular weight less than or equal to 90 g/mol
toluene: 92.14 g/mol
toluene has a molecular weight greater than 90 g/mol
phenol: 94.11 g/mol
phenol has a molecular weight greater than 90 g/mol
Check your understanding

Why might a tuple be a better choice than a list for storing (name, weight) pairs in the above case?

Solution

Tuples are ideal here because:

The molecule name and molecular weight are tightly linked.

You don’t want these values to change accidentally (immutability protects them).
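
To make the immutability point concrete, here is a minimal sketch that tries to change a value inside a tuple. The `try`/`except` block is used only so the cell keeps running after the error; the names `benzene` and `changed` are illustrative.

```python
# A single (name, molecular weight) record stored as a tuple
benzene = ("benzene", 78.11)

# Indexing works the same way as it does for lists
print(benzene[0], benzene[1])

# Unpacking assigns each item to its own variable
name, mw = benzene
print(name, mw)

# Trying to change an item raises a TypeError because tuples are immutable
try:
    benzene[1] = 80.0
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)
```

The same assignment on a list would succeed silently, which is exactly the kind of accidental change a tuple protects against.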

Sets#

A set is a multi-element data structure, similar to a list, with one key difference: each element can appear only once. Sets are particularly useful when the goal is to keep track of which items are present, rather than how many. For instance, when taking inventory of a chemical stockroom, you might use a set to record the names of all compounds available for experiments. Even if multiple bottles of the same compound exist, the set will store that compound’s name only once (because the focus is on presence, not quantity). In Python, a set looks similar to a list, but it uses curly braces {} instead of square brackets [].

molecules = ["acetone", "ethanol", "acetone", "methanol", "ethanol"]
unique_molecules = set(molecules)
print(unique_molecules)
{'acetone', 'methanol', 'ethanol'}

Some common set methods include:

| Method | Description |
| --- | --- |
| .add(item) | Adds a new element to the set |
| .remove(item) | Removes a specific element from the set |
| .discard(item) | Removes an item if it exists (no error if missing) |
| .union(set2) | Combines the elements of both sets |
| .intersection(set2) | Gets only the items that are in both sets |
| .difference(set2) | Gets the elements in set 1 that are not in set 2 |

Instead of using set methods, you can also use operators such as | (union), & (intersection) and - (difference). The example below uses the methods.

# Two different screenings of drugs
results1 = {"aspirin", "ibuprofen", "naproxen"}  # drug screening 1 results as a set
results2 = {"naproxen", "acetaminophen"}  # drug screening 2 results as a set

# Find common drugs in both groups
common_drugs = results1.intersection(results2)
print("Common drugs in both groups:", common_drugs)
# Find unique drugs in group1
unique_group1 = results1.difference(results2)
print("Unique drugs in group1:", unique_group1)
# Find unique drugs in group2
unique_group2 = results2.difference(results1)
print("Unique drugs in group2:", unique_group2)
set3 = results1.union(results2)
print("All unique drugs from both groups:", set3)
set3.add("diclofenac")
print("All unique drugs after adding diclofenac:", set3)
moleculesList= list(set3)
print("Molecules as a List:", moleculesList)
moleculesList.sort()
print("Sorted Molecules List:", moleculesList)
Common drugs in both groups: {'naproxen'}
Unique drugs in group1: {'aspirin', 'ibuprofen'}
Unique drugs in group2: {'acetaminophen'}
All unique drugs from both groups: {'naproxen', 'acetaminophen', 'ibuprofen', 'aspirin'}
All unique drugs after adding diclofenac: {'naproxen', 'diclofenac', 'acetaminophen', 'ibuprofen', 'aspirin'}
Molecules as a List: ['naproxen', 'diclofenac', 'acetaminophen', 'ibuprofen', 'aspirin']
Sorted Molecules List: ['acetaminophen', 'aspirin', 'diclofenac', 'ibuprofen', 'naproxen']
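
The set operators mentioned earlier give the same results as the methods used in the cell above. This short sketch repeats the comparison with `&`, `-` and `|`; `sorted()` is used only so the printed order is predictable.

```python
results1 = {"aspirin", "ibuprofen", "naproxen"}
results2 = {"naproxen", "acetaminophen"}

common = results1 & results2        # same as results1.intersection(results2)
only_in_1 = results1 - results2     # same as results1.difference(results2)
all_drugs = results1 | results2     # same as results1.union(results2)

print("Common:", sorted(common))
print("Only in screening 1:", sorted(only_in_1))
print("All drugs:", sorted(all_drugs))
```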

Dictionaries#

Python dictionaries are a type of multi-element object that store data as key–value pairs, much like how a real dictionary links a word (the key) to its definition (the value). Also known as associative arrays, dictionaries allow users to access values directly using keys without needing to know or rely on the order of the items. One way to think of a dictionary is as a container of named variables, each assigned a specific value.

For example, if you wanted to write a script to calculate the molecular weight of a compound based on its molecular formula, you would need to look up the atomic mass of each element using its chemical symbol. In this case, the element symbol would serve as the key, and the atomic mass as the value.

Similar to a set, a dictionary is written using curly braces {}; however, each entry is a key:value pair separated by a colon.

Dictionaries do not allow duplicate keys; if a key is repeated, the last assigned value will overwrite the previous one. This makes dictionaries especially useful for storing and quickly retrieving related data in a chemistry context.

# Create a dictionary of common compounds and their molecular weights
molecular_weights = {
    "water": 18.02,
    "ethanol": 46.07,
    "acetone": 58.08,
    "ethanol": 50.00  # Oops! ethanol is repeated
}

# Print the dictionary
print("Molecular Weights:")
print(molecular_weights)
Molecular Weights:
{'water': 18.02, 'ethanol': 50.0, 'acetone': 58.08}
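
The molecular weight calculation described at the start of this section can be sketched with two dictionaries. The atomic masses are rounded values, and the `ethanol` composition dictionary (element symbol mapped to number of atoms) is a hypothetical input format chosen for illustration; `.items()` returns each key together with its value.

```python
# Rounded atomic masses in g/mol (illustrative values)
atomic_masses = {"H": 1.008, "C": 12.011, "O": 15.999}

# Ethanol, C2H6O, written as element symbol -> number of atoms
# (hypothetical input format, not something parsed from a formula string)
ethanol = {"C": 2, "H": 6, "O": 1}

mol_weight = 0.0
for symbol, count in ethanol.items():
    # look up the atomic mass by its key, the element symbol
    mol_weight += atomic_masses[symbol] * count

print(f"Ethanol: {mol_weight:.2f} g/mol")
```

The element symbol serves as the key in both dictionaries, so the loop never needs to know the position of an element, only its name.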

Python Error Types#

In Python, errors are raised when your code breaks the rules of the language. Understanding common error types can help you debug faster and write better code. Below are some of the most frequent errors students encounter.

Note that each of the following will show you the error and stop the notebook from running subsequent code in the cell.

NameError: Using a variable that hasn’t been defined#

A NameError occurs when the code tries to use a variable or function name that hasn’t been defined. This is often caused by a typo in the name, but it can also happen if you’re working in a Jupyter notebook and try to run code before running the cells that define necessary variables.

If you have saved your notebook and reopened it to resume where you left off, it is a good idea to use Run → Run All Cells from the top menu to make sure all required code has been executed.

# Example: misspelled variable name
mol_weight = 46.07
print(mol_wieght)  # Typo!
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 3
      1 # Example: misspelled variable name
      2 mol_weight = 46.07
----> 3 print(mol_wieght)  # Typo!

NameError: name 'mol_wieght' is not defined

SyntaxError: Your code breaks Python’s grammar rules#

A programming language’s syntax is its set of rules for how code must be written—including proper formatting, use of symbols, valid variable names, and more. A SyntaxError occurs when your code violates one of these rules. To help you debug, Python provides an error message that shows the specific line of code with the problem and often includes a pointer (^) to indicate where the issue is likely occurring. This error usually signals a small but important mistake, such as a missing colon, unmatched parentheses, or incorrect indentation.

compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]
for item in compoundClass
    print(item)
  Cell In[12], line 2
    for item in compoundClass
                             ^
SyntaxError: expected ':'

KeyError: Accessing a missing key in a dictionary#

You will get a KeyError if you try to access a key that doesn’t exist in a dictionary.

molecular_weights = {
    "water": 18.02,
    "ethanol": 46.07,
    "acetone": 58.08,
    "methanol": 32.04 
}

print(molecular_weights["dichloromethane"])  # KeyError: 'dichloromethane' not in dictionary
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[13], line 8
      1 molecular_weights = {
      2     "water": 18.02,
      3     "ethanol": 46.07,
      4     "acetone": 58.08,
      5     "methanol": 32.04 
      6 }
----> 8 print(molecular_weights["dichloromethane"])  # KeyError: 'dichloromethane' not in dictionary

KeyError: 'dichloromethane'

Tip: Use .get() with a default message, or an if/else check, to avoid crashing.

molecular_weights = {
    "water": 18.02,
    "ethanol": 46.07,
    "acetone": 58.08,
    "methanol": 32.04 
}

print(molecular_weights.get("dichloromethane", "Key Not found"))

solvents = ["methanol", "ethanol", "water", "dichloromethane", "acetone"]
for solvent in solvents:
    if solvent in molecular_weights:
        print(f"{solvent} has a molecular weight of {molecular_weights[solvent]} g/mol")
    else:
        print(f"{solvent} is not in the molecular weights dictionary")
Key Not found
methanol has a molecular weight of 32.04 g/mol
ethanol has a molecular weight of 46.07 g/mol
water has a molecular weight of 18.02 g/mol
dichloromethane is not in the molecular weights dictionary
acetone has a molecular weight of 58.08 g/mol

Notice that the code above also shows that you access dictionary values by key, and that the order in which you access them does not matter.

IndexError: Accessing a list index that doesn’t exist#

If you try to access an index that is past the end of the list (greater than or equal to the number of items), you will get an IndexError.

elements = ["H", "C", "O"]
print(elements[5])  
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 2
      1 elements = ["H", "C", "O"]
----> 2 print(elements[5])  

IndexError: list index out of range

TypeError: Performing an invalid operation for the data type#

A TypeError occurs when an operation or function is applied to an object of the wrong type, for example trying to do arithmetic on a string. You may see this when you import data from PubChem or other sources, because that data is often brought in as a text string.

mass1 = 18.02
mass2 = "18.02"
print(mass1 + mass2) 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 3
      1 mass1 = 18.02
      2 mass2 = "18.02"
----> 3 print(mass1 + mass2) 

TypeError: unsupported operand type(s) for +: 'float' and 'str'

IndentationError: Incorrect indentation (spacing) in the code#

Python uses indentation to define code blocks (for example, in for loops or if statements). If the indentation is inconsistent or missing, an error is raised.

solvents = ["methanol", "ethanol", "water", "dichloromethane", "acetone"]

for solvent in solvents:
print(solvent, type(solvent))
  Cell In[17], line 4
    print(solvent, type(solvent))
    ^
IndentationError: expected an indented block after 'for' statement on line 3
Check your understanding

Fix each of the code cells above so that they don’t throw an error.

More Python Modules#

Python includes many powerful built-in modules that make it easier to perform everyday tasks. Let’s explore some commonly used tools with examples from a chemistry or computational workflow.

os: Interacting with the Operating System#

The os module allows you to work with the file system—listing files, navigating folders, and managing paths. You might use os to scan or write to folders for chemical data files, experimental logs, or spectral images.

A useful function from the os module is listdir(), which lists all the files and directories in a folder.

To open a file that is not in the directory of your Jupyter notebook, you will need to change the directory Python is currently looking in, known as the current working directory, using the chdir() function. It takes a single string argument: the path to the folder containing the files of interest. The exact format of the path depends on whether you are using macOS, Windows, or Linux.

If you are not sure which directory is the current working directory, you can use the getcwd() function. It does not require any arguments.

If you are creating files and want to save them into a specific folder given by a relative path, you can check whether the folder exists using os.path.exists(folder_name) and create it using os.makedirs(folder_name).

import os
# Example: list all files in the current directory
files = os.listdir()
print("Files in this folder:", files)

# Example: get the current working directory
cwd = os.getcwd()
print("Current working directory:", cwd)

# Example: change the current working directory
# os.chdir('/path/to/directory')  # Uncomment and set the path to change directory

# Example: Check if directory exists and create a new directory
#folder_name = "new_directory"
#if not os.path.exists(folder_name):
#    os.makedirs(folder_name)  # Safe recursive folder creation
#    print(f"Created directory: {folder_name}")
Files in this folder: ['01-2-python-basics.ipynb', '.ipynb_checkpoints', '01-3-python-intermediate.ipynb:Zone.Identifier', '01-1-OLCC-Primer.ipynb', '01-3-python-intermediate.ipynb', 'README.md:Zone.Identifier', 'README.md', '01-1-OLCC-Primer.ipynb:Zone.Identifier', '01-2-python-basics.ipynb:Zone.Identifier']
Current working directory: /home/rebelford/jupyterbooks/datachem2025book/content/modules/01-OLCC-Primer
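
The folder check described above can be sketched without touching your own files by working inside a temporary directory. The folder names `spectra` and `ir` are hypothetical. `os.path.join()` builds a path with the correct separator for your operating system, and `exist_ok=True` lets `os.makedirs()` succeed quietly when the folder already exists.

```python
import os
import tempfile

# Work inside a throwaway temporary folder so nothing in your project changes
with tempfile.TemporaryDirectory() as base:
    # os.path.join inserts the right separator for the current operating system
    data_dir = os.path.join(base, "spectra", "ir")   # hypothetical folder names

    # exist_ok=True means no error is raised if the folder is already there
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(data_dir, exist_ok=True)             # safe to call a second time

    created = os.path.exists(data_dir)
    contents = os.listdir(os.path.join(base, "spectra"))
    print("Folder exists:", created)
    print("Contents of spectra:", contents)
```

Using `exist_ok=True` is an alternative to the explicit `os.path.exists()` check shown in the commented-out code above; either approach avoids an error when the folder is already present.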

random – Generating Random Numbers or Choices#

The random module provides a selection of functions for generating random values. Random values can be integers or floats and can be generated from a variety of ranges and distributions. Note: Brackets mean inclusive while the parentheses mean exclusive. [0,1) will provide values of 0 up to but excluding 1 and is used for probability simulations. [1,10] will provide values of 1 up to and including 10.

| Function | Description |
| --- | --- |
| random.random() | Generates a float from [0, 1) |
| random.randint(x, y) | Generates a random integer in [x, y], inclusive of both endpoints |
| random.randrange(x, y, z) | Generates an integer from the range [x, y), with an optional step z to define intervals |
| random.uniform(x, y) | Generates a float between x and y with uniform probability; y may or may not be included, depending on floating-point rounding |
| random.choice(seq) | Randomly selects an item from a list, tuple, or other sequence |

import random

print("generate a random float between 0 and 1")
random_float = random.random()
print("Random float:", random_float)
print()
print("generate a random integer between 1 and 10 including 1 and 10")
for i in range(5):
    random_number = random.randint(1, 10)
    print("Random number:", random_number)
print()

print("generate a random even integer from 0 up to (but not including) 10")
for i in range(5):
    random_number = random.randrange(0, 10, 2)
    print("Random number:", random_number)

print()
print("randomly select a compound from a list")
compounds = ["ethanol", "acetone", "toluene", "chloroform"]
print(compounds)
selected = random.choice(compounds)
print("Randomly selected compound:", selected)
generate a random float between 0 and 1
Random float: 0.8864023976443729

generate a random integer between 1 and 10 including 1 and 10
Random number: 7
Random number: 8
Random number: 8
Random number: 3
Random number: 3

generate a random even integer from 0 up to (but not including) 10
Random number: 6
Random number: 6
Random number: 2
Random number: 4
Random number: 0

randomly select a compound from a list
['ethanol', 'acetone', 'toluene', 'chloroform']
Randomly selected compound: toluene
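
One way to convince yourself of the inclusive and exclusive ranges described above is to draw many values and check their bounds. This is only an empirical sketch (1000 draws is an arbitrary choice), and because the values are random, your minimum and maximum may differ from run to run.

```python
import random

floats = []
ints = []
evens = []
for _ in range(1000):
    floats.append(random.random())            # expected range [0, 1)
    ints.append(random.randint(1, 10))        # expected range [1, 10]
    evens.append(random.randrange(0, 10, 2))  # expected values 0, 2, 4, 6, 8

# all(...) is True only if every value passes the check
floats_ok = all(0 <= x < 1 for x in floats)
ints_ok = all(1 <= n <= 10 for n in ints)
evens_ok = set(evens) <= {0, 2, 4, 6, 8}      # <= tests "is a subset of"

print("random() stayed in [0, 1):", floats_ok)
print("randint(1, 10) min and max:", min(ints), max(ints))
print("randrange(0, 10, 2) values seen:", sorted(set(evens)))
```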

Controlling Randomness with random.seed()#

Randomness in Python allows programs to simulate unpredictability. This is accomplished with a pseudorandom number generator: the numbers appear random, but they are actually produced by an algorithm that approximates a sequence of random numbers. The sequence is not truly random because it is completely determined by an initial value called the seed. By using random.seed(), you can set the starting point of this sequence, ensuring that your random operations (like sampling or simulation) produce the same results every time. This makes your code reproducible and easier to debug or share with others.

import random

compounds = ["ethanol", "acetone", "toluene", "methanol", "chloroform","dichloromethane", "benzene", "hexane", "cyclohexane"]
print("Without setting a seed we should get different results:")
print("First run:", random.choice(compounds))
print("Second run:", random.choice(compounds))
print("Third run:", random.choice(compounds))
print()
# Setting a seed for reproducibility
print("With setting a seed we should get the same results:")
random.seed(42)  # Set the seed value
print("First run:", random.choice(compounds))

random.seed(42)  # Reset the seed to get the same result
print("Second run:", random.choice(compounds))

random.seed(42)  # Reset the seed to get the same result
print("Third run:", random.choice(compounds))

random.seed(123)  # Set a new random seed
print("Fourth run with new random seed:", random.choice(compounds))
Without setting a seed we should get different results:
First run: hexane
Second run: toluene
Third run: acetone

With setting a seed we should get the same results:
First run: acetone
Second run: acetone
Third run: acetone
Fourth run with new random seed: ethanol

enumerate() – Looping with Indexes#

The enumerate() function lets you loop through a list while keeping track of each item’s index. This is especially useful when you want to number items, label experimental runs, or access both position and content during a loop.

The basic syntax is:

for index, item in enumerate(iterable, start=0):
    # index is the position
    # item is the actual element

where:

  • iterable is the list or other object you are looping over

  • start is optional; it defaults to 0, but you can change it

# Example: print index and name of each solvent
solvents = ["ethanol", "acetone", "toluene", "methanol", "chloroform","dichloromethane", "benzene", "hexane", "cyclohexane"]
for index, solvent in enumerate(solvents):
    print(f"{index + 1}. {solvent}")  # add one to index for human-readable numbering
1. ethanol
2. acetone
3. toluene
4. methanol
5. chloroform
6. dichloromethane
7. benzene
8. hexane
9. cyclohexane
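
The optional `start` argument offers another way to get human-readable numbering: instead of adding one to the index inside the loop, you can ask `enumerate()` to begin counting at 1. A minimal sketch with a shortened solvent list:

```python
solvents = ["ethanol", "acetone", "toluene"]

labels = []
for number, solvent in enumerate(solvents, start=1):
    label = f"{number}. {solvent}"
    labels.append(label)
    print(label)
```

Note that with `start=1` the loop variable is no longer the item's true list index, so use the default `start=0` whenever you need to index back into the list.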

Getting Data from the Web with requests#

The requests module allows Python to send HTTP requests to websites and APIs and get back data—often in text or other formats that we will explore. This is essential for working with chemical databases, pulling compound info, literature, or real-time data from public sources.

Common requests Methods

| Method or attribute | Purpose |
| --- | --- |
| get(url) | Retrieve data from a website |
| post(url) | Submit data to a website (e.g. forms, uploads) |
| status_code | Check if the request was successful (200 = OK) |
| text | Response content as a string |
| json() | Parses JSON responses into Python dictionaries |

Some common returned status_code values

| Code | Meaning | Description |
| --- | --- | --- |
| 200 | OK | The request was successful and the data was returned |
| 201 | Created | A new resource was successfully created (e.g., via POST) |
| 400 | Bad Request | The request was malformed (maybe a typo) |
| 404 | Not Found | The resource (e.g., page or data) doesn’t exist |
| 429 | Too Many Requests | You’ve hit the rate limit |
| 500 | Internal Server Error | The server had an error while processing the request |

import requests

response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt")
print("Status code:", response.status_code)

if response.status_code == 200:
    print("The molecular formula for water is:", response.text)
else:
    print("Request failed:", response.status_code)
print()

# example with typo for water in the URL
response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/wter/property/MolecularFormula/txt")
print("Status code:", response.status_code)

if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Compound not found.")
elif response.status_code == 503:
    print("Server is temporarily unavailable.")
else:
    print("Unexpected error:", response.status_code)
print()
# example with typo for property in the URL
response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MoecularFormula/txt")
print("Status code:", response.status_code)

if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Compound not found.")
elif response.status_code == 503:
    print("Server is temporarily unavailable.")
else:
    print("Unexpected error:", response.status_code)
Status code: 200
The molecular formula for water is: H2O


Status code: 404
Compound not found.

Status code: 400
Unexpected error: 400
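
The repeated if/elif chains above could also be written as a dictionary lookup, which ties together the status-code table and the `.get()` method from the KeyError section. This is only a sketch: `status_messages` and `describe_status` are hypothetical names, and the codes are hard-coded here so that no network request is needed.

```python
# Messages taken from the status-code table above
status_messages = {
    200: "OK",
    201: "Created",
    400: "Bad Request",
    404: "Not Found",
    429: "Too Many Requests",
    500: "Internal Server Error",
}

def describe_status(code):
    """Return a short description for an HTTP status code (hypothetical helper)."""
    # .get() supplies a fallback message for codes missing from the dictionary
    return status_messages.get(code, "Unexpected status")

for code in [200, 404, 400, 503]:
    print(code, describe_status(code))
```

In a real request you would pass `response.status_code` to the helper instead of a hard-coded number.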

time – Tracking and Delaying Execution#

The time module helps you pause code or measure how long it takes to run. One particularly useful case that we will use frequently is pausing between requests to websites. WorldTimeAPI enforces a fair-use policy, meaning there are rate limits, but they are not explicitly stated as a fixed number per second. The PubChem servers, however, have a strict limit of 5 requests per second. If you go over that rate, you can be locked out for making excessive requests.

import time

url ='https://www.timeapi.io/api/Time/current/zone?timeZone=America/Chicago'  # Example for Chicago timezone

for i in range(5):
    response = requests.get(url)
    print("Status code:", response.status_code)
    print(response.text)
    time.sleep(.2)  # wait for 0.2 seconds before the next request

Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":39,"seconds":24,"milliSeconds":322,"dateTime":"2025-08-17T11:39:24.3227538","date":"08/17/2025","time":"11:39","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":39,"seconds":54,"milliSeconds":651,"dateTime":"2025-08-17T11:39:54.6512011","date":"08/17/2025","time":"11:39","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":40,"seconds":25,"milliSeconds":118,"dateTime":"2025-08-17T11:40:25.1182417","date":"08/17/2025","time":"11:40","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":40,"seconds":55,"milliSeconds":454,"dateTime":"2025-08-17T11:40:55.4545612","date":"08/17/2025","time":"11:40","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":41,"seconds":25,"milliSeconds":758,"dateTime":"2025-08-17T11:41:25.7589302","date":"08/17/2025","time":"11:41","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
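
The time module can also measure how long code takes to run. The sketch below assumes a loop in which `time.sleep(0.2)` stands in for a request followed by a pause, and uses `time.perf_counter()` to time the whole loop. No network access is needed.

```python
import time

start = time.perf_counter()      # high-resolution timer, in seconds

for i in range(3):
    time.sleep(0.2)              # stand-in for a request followed by a pause

elapsed = time.perf_counter() - start
print(f"Loop took {elapsed:.2f} seconds")
```

Three pauses of 0.2 seconds should take roughly 0.6 seconds in total; with real requests, the elapsed time would also include the time spent waiting for the server.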

Exploring io and StringIO – Working with In-Memory Text Streams#

The io module provides tools for handling file-like operations in memory. One of the most useful tools is io.StringIO, which allows you to treat a string like a file. Many times when we request data from PubChem we will get data that we need to parse.

In the example below we are getting data from five molecules in one request and it returns all the data in one chunk. By using StringIO we can separate each line of data that is included in the response.

from io import StringIO
cidstr = "2244,3672,1983,5288826,5284371"

url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/' + cidstr + '/property/Title/TXT')
print(url) # after running this code, you can copy the URL and paste it into your browser to see the output
print()
res = requests.get(url)
file_like = StringIO(res.text)    

for line in file_like:
    print("Parsed line:", line.strip()) # we add .strip() to remove any leading or trailing whitespace characters including newlines
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,3672,1983,5288826,5284371/property/Title/TXT

Parsed line: Aspirin
Parsed line: Ibuprofen, (+-)-
Parsed line: Acetaminophen
Parsed line: Morphine
Parsed line: Codeine
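
To experiment with StringIO without making a request, you can build the text yourself. In this sketch `sample_text` is a made-up stand-in for `res.text`, and the last line shows that `str.splitlines()` produces the same list for simple cases.

```python
from io import StringIO

# Stand-in for res.text: three titles separated by newline characters
sample_text = "Aspirin\nAcetaminophen\nMorphine\n"

file_like = StringIO(sample_text)   # treat the string like an open text file
titles = []
for line in file_like:
    titles.append(line.strip())     # strip() removes the trailing newline

print(titles)

# str.splitlines() gives the same list without creating a file-like object
print(sample_text.splitlines())
```

StringIO becomes more valuable when another function expects a file object rather than a string.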
# Homework

In this notebook, you had a crash course in Python programming. You assigned values to variables, performed operations on those variables, made decisions on those variables and finally outputted the results to your screen.

Python Problem 1#

You are working with data collected from a small lab inventory and experiment log. The data is stored in various Python data structures. The following code cell defines a series of lists, tuples, sets and dictionaries.

# List of chemical compounds used in an experiment (some are repeated)
chemicals_used = ["ethanol", "water", "acetone", "ethanol", "acetone", "methanol", "dichloromethane"]

# Tuple of lab temperatures in Celsius (immutable data)
temperatures = (22.0, 21.5, 22.3, 21.8)

# Dictionary of molecular weights (g/mol) for a few compounds
molecular_weights = {
    "ethanol": 46.07,
    "water": 18.02,
    "acetone": 58.08,
    "methanol": 32.04
}

# Set of solvents approved for flammable storage
flammable_solvents = {"ethanol", "acetone", "diethyl ether", "methanol"}

1a. Use the set() function to remove duplicates from chemicals_used and store it in a new variable called unique_chemicals.

# Write your code here

1b. Loop through the temperatures tuple and print each reading with the label “Lab temp:”.
Hint: Can you change the values inside a tuple? Why or why not?

# Write your code here

1c. Write code that prints the molecular weight of each compound in unique_chemicals. Note that you will need an error check, because some of the unique chemicals are not in the molecular_weights dictionary.

# Write your code here

1d. Add “diethyl ether” to the molecular_weights dictionary with a value of 74.12.
Hint: do a web search on how to add to a dictionary as that wasn’t covered explicitly here.

# Write your code here

Python Problem 2#

The following Python script is meant to calculate the total mass of chemicals used in an experiment based on their molecular weights and the number of moles used. However, the code contains multiple errors. Your job is to identify and fix the errors so the code runs correctly.

Identify and fix:

  • Any syntax or indentation errors

  • Any type mismatch that prevents calculation

  • Any dictionary access that fails due to a missing key

  • Any undefined variable issues

Once fixed, your code should print the mass of each compound (if > 2 g) and the total mass.

molecular_weights = {
    "ethanol": "46.07",
    "acetone": "58.08",
    "methanol": "32.04"
}

moles_used = {
    "ethanol": 0.10,
    "acetone": 0.05,
    "chloroform": 0.08
}

total_mass = 0

for compound in moles_used:
mass = molecular_weights[compound] * moles_used[compound]
    if mass > 2.0:
    print(compound, "mass exceeds 2 grams.")
    total_mass += mass

print("Total mass of chemicals:", total_mass)
print("Water was used in this experiment:", "water" in moles_used)

Python Problem 3#

In this exercise, you’ll generate a list of random PubChem compound IDs (CIDs) and generate a URL as shown in the section on StringIO to retrieve and parse the titles for each compound.

3a. Generate a list of random CIDs. As of June 17, 2025 there were 121,440,911 unique compounds (CIDs) in the PubChem database. Using a function, create a list that has 10 random CIDs between 1 and 121,440,911. Hint: your random CIDs will be integers, but later we need them as strings. Add them to the list as strings to save some hassle later.

# write your code here

3b. Create a string variable called random_cids that has all 10 random cids joined with commas as separation and no spaces.

# write your code here

3c. Use the requests module and the URL provided in the cell below to request and store the output into a new variable.

url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/' + random_cids + '/property/Title/TXT')
#write your code here

3d. Parse and print each title in the notebook. If no title is returned, print “No defined title for this CID”. Sample output will look like:

Title: 1-(1,4-Dioxan-2-yl)-4-ethylsulfonylbutan-1-amine
No defined title for this CID
No defined title for this CID
Title: Isoxazoles
Title: Cyclohexanol, 4-(4-(4-fluorophenyl)-5-(2-methoxy-4-pyrimidinyl)-1H-imidazol-1-yl)-, trans-

#write your code here

Python Problem 4#

In this exercise, you are given a list of 10 drug names. Sort the list alphabetically and then print out each molecule with its actual index value.

drug_names = [
    "acetaminophen",
    "ibuprofen",
    "lisinopril",
    "atorvastatin",
    "metformin",
    "omeprazole",
    "albuterol",
    "sertraline",
    "amoxicillin",
    "diphenhydramine"
]

# write your code here

Acknowledgements#

This notebook was developed by Ehren Bucholtz (Ehren.Bucholtz@uhsp.edu) and takes inspiration from Chapter 1: Basic Python from Charlie Weiss’s book Scientific Computing for Chemists with Python, license (CC BY-NC-SA 4.0).

This work is licensed under CC BY-NC-SA 4.0.