1.3 Python Intermediate#
Introduction to Python#
This notebook borrows heavily from Chapter 2: Intermediate Python from Charlie Weiss’s book Scientific Computing for Chemists with Python.
Explore Python data structures:
lists
tuples
sets
dictionaries
Explore Python Error Types
NameError
SyntaxError
KeyError and IndexError
TypeError
IndentationError
Explore More Python Modules and Built-in Tools:
os
random
enumerate() (a built-in function, not a module)
time
requests
io
Practice code from previous notebook
Python Data Structures#
Python uses several built-in data structures to organize and store information. You were introduced to the list data structure in the last notebook. The four main types are:
Lists
Tuples
Dictionaries
Sets
Lists#
A sequence is an object that stores multiple data items in order. Two types of sequences are strings and lists. A list is an object that stores multiple values. Each value stored in a list is called an element or item. The following are examples of lists:
digits = [0,1,2,3,4,5,6,7,8,9]
compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]
elementalData = ['hydrogen', 1, 1.008, 'helium', 2, 4.00, 'lithium', 3, 6.94]
Key characteristics of lists:
Ordered: Items are kept in the order you add them.
Mutable: You can change, add or remove items after the list is created.
Heterogeneous: A list can contain different types of data.
The location of an element or item in a list is its index number. Index numbers begin at 0. A list having 5 elements will have index values from 0 to 4. The syntax for accessing the elements of a list is the bracket operator. The expression inside the brackets specifies the index.
print(compoundClass[3],type(compoundClass[3]))
print()
print(digits[1],type(digits[1]))
print()
for i in elementalData:
    print(i, type(i))
ketones <class 'str'>
1 <class 'int'>
hydrogen <class 'str'>
1 <class 'int'>
1.008 <class 'float'>
helium <class 'str'>
2 <class 'int'>
4.0 <class 'float'>
lithium <class 'str'>
3 <class 'int'>
6.94 <class 'float'>
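Python also accepts negative indices, which count backward from the end of the list: -1 is the last item, -2 is the second to last, and so on. The last valid positive index is always one less than the length of the list. Here is a minimal sketch using the compoundClass list from above, redefined so the cell runs on its own:

```python
# Redefine the list so this cell runs on its own
compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]

first = compoundClass[0]                       # index 0 is the first item
last = compoundClass[len(compoundClass) - 1]   # the last valid index is len - 1
also_last = compoundClass[-1]                  # -1 counts back from the end
second_to_last = compoundClass[-2]

print(first, "|", last, "|", also_last, "|", second_to_last)
# prints: alkanes | alkyl halides | alkyl halides | ketones
```

Negative indices are especially handy when you don't know how long the list is, as in compoundClass[-1] below.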
In the last notebook, you used the append() method to add to your lists:
print(len(compoundClass))
compoundClass.append("ethrs")
print(len(compoundClass))
print(compoundClass[-1])
5
6
ethrs
Notice that ethers is misspelled as ethrs in the previous example. Let’s remove it.
Both pop() and remove() are used to delete elements from a list, but they behave differently.
| Method | What it does | Returns a value? | Deletes by… | When to use it |
|---|---|---|---|---|
| pop() | Removes and returns an item | Yes | Index | When you know the index and want the value back |
| remove() | Removes the first matching value | No | Value | When you know the value but not the index |
print(compoundClass[-1])
compoundClass.pop(-1)
ethrs
'ethrs'
# example of removing an item from a list using pop()
solvents = ["methanol", "ethanol", "acetone", "dichloromethane", "ethanol"]
removed_solvent = solvents.pop(2)
print("Removed solvent:", removed_solvent)
print("Updated list:", solvents)
# example of removing an item from a list using remove()
solvents.remove("ethanol")
print("List after removing ethanol:", solvents)
Removed solvent: acetone
Updated list: ['methanol', 'ethanol', 'dichloromethane', 'ethanol']
List after removing ethanol: ['methanol', 'dichloromethane', 'ethanol']
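If you call pop() without an index, it removes the last item by default. This makes it convenient for taking items off the end of a list. Here is a short sketch with an illustrative solvent list:

```python
solvents = ["methanol", "ethanol", "acetone"]

# With no argument, pop() removes and returns the LAST item
last_solvent = solvents.pop()
print("Popped:", last_solvent)      # Popped: acetone
print("Remaining:", solvents)       # Remaining: ['methanol', 'ethanol']
```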
There are many other list methods. Let’s use a few more:
originalList = ["carbon", "oxygen", "hydrogen", "nitrogen", "oxygen", "sulfur", "hydrogen"]
print(originalList)
# create a copy of the original list
elements = originalList.copy()
# index(value) method to find the first occurrence of a value in a list
pos = elements.index("oxygen")
print("First occurrence of 'oxygen' is at index:", pos)
# Change the value at a specific index
elements[pos]= "fluorine"
print("Elements after changing 'oxygen' to 'fluorine':", elements)
# using the count() method to count occurrences of a value in a list
count_hydrogen = elements.count("hydrogen")
print("Number of occurrences of 'hydrogen':", count_hydrogen)
# using the sort() method to sort a list in ascending order
elements.sort()
print("Sorted elements:", elements)
# using the reverse() method to reverse the order of a list
elements.reverse()
print("Reversed elements:", elements)
# using the clear() method to remove all items from a list
elements.clear()
print("Elements after clear:", elements)
print("Original list remains unchanged:", originalList)
['carbon', 'oxygen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
First occurrence of 'oxygen' is at index: 1
Elements after changing 'oxygen' to 'fluorine': ['carbon', 'fluorine', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
Number of occurrences of 'hydrogen': 2
Sorted elements: ['carbon', 'fluorine', 'hydrogen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur']
Reversed elements: ['sulfur', 'oxygen', 'nitrogen', 'hydrogen', 'hydrogen', 'fluorine', 'carbon']
Elements after clear: []
Original list remains unchanged: ['carbon', 'oxygen', 'hydrogen', 'nitrogen', 'oxygen', 'sulfur', 'hydrogen']
The following code cell has a list of drug names. One of the drugs is misspelled.
Correct the misspelling using list methods and indexing.
Sort the list alphabetically.
Remove the duplicates.
Print the updated list.
drugs = ["aspirin", "ibuprophen", "acetaminophen", "naproxen", "diclofenac", "acetaminophen"]
# write your code here
Solution
pos = drugs.index("ibuprophen")
drugs[pos] = "ibuprofen"
drugs.sort()
drugs.remove("acetaminophen")
print(drugs)
Tuples#
A tuple is an ordered, immutable collection of items. Tuples are used to group related values that should not change. Think of a tuple as a locked list: it is a good choice for reference data that you don’t want to modify by accident.
Key characteristics of tuples:
Ordered: Items are kept in the order you add them.
Immutable: Values cannot be changed.
Indexed: You can reference specific values.
Heterogeneous: A tuple can contain different types of data.
To create an empty tuple variable:
myTuple =()
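Two details often trip up beginners. First, a tuple with a single item needs a trailing comma. Second, trying to change an item in a tuple raises a TypeError. Here is a brief sketch; the water values are illustrative:

```python
water = ("water", 18.02)          # a two-item tuple: (name, molecular weight)
single = (18.02,)                 # the trailing comma makes this a one-item tuple
not_a_tuple = (18.02)             # without the comma, this is just a float

print(type(single), type(not_a_tuple))   # <class 'tuple'> <class 'float'>
print(water[0], water[1])                # indexing works like a list: water 18.02

# Uncommenting the next line raises:
# TypeError: 'tuple' object does not support item assignment
# water[1] = 20.0
```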
molecules = [
("benzene", 78.11),
("toluene", 92.14),
("phenol", 94.11)
]
# Print molecular weights
for name, mw in molecules:
    print(f"{name}: {mw} g/mol")
    if mw > 90:
        print(f"{name} has a molecular weight greater than 90 g/mol")
    else:
        print(f"{name} has a molecular weight less than or equal to 90 g/mol")
benzene: 78.11 g/mol
benzene has a molecular weight less than or equal to 90 g/mol
toluene: 92.14 g/mol
toluene has a molecular weight greater than 90 g/mol
phenol: 94.11 g/mol
phenol has a molecular weight greater than 90 g/mol
Why might a tuple be a better choice than a list for storing (name, weight) pairs in the above case?
Solution
Tuples are ideal here because:
The molecule name and molecular weight are tightly linked.
You don’t want these values to change accidentally (immutability protects them).
Sets#
A set is a multi-element data structure, similar to a list, with one key difference: each element can appear only once. Sets are particularly useful when the goal is to keep track of which items are present, rather than how many. For instance, when taking inventory of a chemical stockroom, you might use a set to record the names of all compounds available for experiments. Even if multiple bottles of the same compound exist, the set will store that compound’s name only once (because the focus is on presence, not quantity). In Python, a set looks similar to a list, but it uses curly braces {} instead of square brackets [].
molecules = ["acetone", "ethanol", "acetone", "methanol", "ethanol"]
unique_molecules = set(molecules)
print(unique_molecules)
{'acetone', 'methanol', 'ethanol'}
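You can also write a set directly with curly braces. There is one catch: empty curly braces {} create an empty dictionary (covered below), not an empty set. Use set() when you need an empty set. Here is a short sketch with illustrative stockroom names:

```python
stockroom = {"acetone", "ethanol", "acetone"}   # the duplicate "acetone" is stored once
print(len(stockroom))                # 2

empty_set = set()                    # the correct way to make an empty set
empty_braces = {}                    # this is an empty dictionary, not a set
print(type(empty_set), type(empty_braces))   # <class 'set'> <class 'dict'>

# Membership tests ask "is it present?", which is what sets are good at
print("ethanol" in stockroom)        # True
```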
Some common set methods include:
| Method | Description |
|---|---|
| .add(item) | Adds a new element to the set |
| .remove(item) | Removes a specific element from the set (raises a KeyError if it is missing) |
| .discard(item) | Removes an item if it exists (no error if missing) |
| .union(set2) | Combines the elements of both sets |
| .intersection(set2) | Gets only the items found in both sets |
| .difference(set2) | Gets the elements in set 1 but not in set 2 |
Instead of using set methods, you can also use the operators & (intersection), | (union), and - (difference).
# Two different screenings of drugs
results1 = {"aspirin", "ibuprofen", "naproxen"} # drug screening 1 results as a set
results2 = {"naproxen", "acetaminophen"} # drug screening 2 results as a set
# Find common drugs in both groups
common_drugs = results1.intersection(results2)
print("Common drugs in both groups:", common_drugs)
# Find unique drugs in group1
unique_group1 = results1.difference(results2)
print("Unique drugs in group1:", unique_group1)
# Find unique drugs in group2
unique_group2 = results2.difference(results1)
print("Unique drugs in group2:", unique_group2)
set3 = results1.union(results2)
print("All unique drugs from both groups:", set3)
set3.add("diclofenac")
print("All unique drugs after adding diclofenac:", set3)
moleculesList= list(set3)
print("Molecules as a List:", moleculesList)
moleculesList.sort()
print("Sorted Molecules List:", moleculesList)
Common drugs in both groups: {'naproxen'}
Unique drugs in group1: {'aspirin', 'ibuprofen'}
Unique drugs in group2: {'acetaminophen'}
All unique drugs from both groups: {'naproxen', 'acetaminophen', 'ibuprofen', 'aspirin'}
All unique drugs after adding diclofenac: {'naproxen', 'diclofenac', 'acetaminophen', 'ibuprofen', 'aspirin'}
Molecules as a List: ['naproxen', 'diclofenac', 'acetaminophen', 'ibuprofen', 'aspirin']
Sorted Molecules List: ['acetaminophen', 'aspirin', 'diclofenac', 'ibuprofen', 'naproxen']
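The set operators mentioned earlier give the same results as the method calls in the cell above: & for intersection, | for union, and - for difference. Here is a minimal sketch that reuses the two screening sets. sorted() is used only so that the printed order is predictable:

```python
results1 = {"aspirin", "ibuprofen", "naproxen"}
results2 = {"naproxen", "acetaminophen"}

common = results1 & results2        # same as results1.intersection(results2)
only_in_1 = results1 - results2     # same as results1.difference(results2)
everything = results1 | results2    # same as results1.union(results2)

print(common)                 # {'naproxen'}
print(sorted(only_in_1))      # ['aspirin', 'ibuprofen']
print(sorted(everything))     # ['acetaminophen', 'aspirin', 'ibuprofen', 'naproxen']
```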
Dictionaries#
Python dictionaries are a type of multi-element object that store data as key–value pairs, much like how a real dictionary links a word (the key) to its definition (the value). Also known as associative arrays, dictionaries allow users to access values directly using keys without needing to know or rely on the order of the items. One way to think of a dictionary is as a container of named variables, each assigned a specific value.
For example, if you wanted to write a script to calculate the molecular weight of a compound based on its molecular formula, you would need to look up the atomic mass of each element using its chemical symbol. In this case, the element symbol would serve as the key, and the atomic mass as the value.
Like a set, a dictionary is written using curly braces {}. However, each entry is a key: value pair. The key and value are separated by a colon, and the entries are separated by commas.
Dictionaries do not allow duplicate keys; if a key is repeated, the last assigned value will overwrite the previous one. This makes dictionaries especially useful for storing and quickly retrieving related data in a chemistry context.
# Create a dictionary of common compounds and their molecular weights
molecular_weights = {
"water": 18.02,
"ethanol": 46.07,
"acetone": 58.08,
"ethanol": 50.00 # Oops! ethanol is repeated
}
# Print the dictionary
print("Molecular Weights:")
print(molecular_weights)
Molecular Weights:
{'water': 18.02, 'ethanol': 50.0, 'acetone': 58.08}
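As described above, a dictionary keyed by element symbol makes a molecular weight calculation straightforward. The minimal sketch below makes two assumptions. First, the formula has already been broken into element counts; parsing a formula string like "C2H6O" is not shown. Second, the atomic masses are rounded values:

```python
# Rounded standard atomic masses (g/mol) keyed by element symbol
atomic_masses = {"H": 1.008, "C": 12.011, "O": 15.999}

# Ethanol, C2H6O, written as element symbol -> number of atoms
ethanol_counts = {"C": 2, "H": 6, "O": 1}

molecular_weight = 0
for symbol in ethanol_counts:
    # look up each element's mass by its symbol (the key)
    molecular_weight += atomic_masses[symbol] * ethanol_counts[symbol]

print(f"Ethanol: {molecular_weight:.2f} g/mol")   # Ethanol: 46.07 g/mol
```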
Python Error Types#
In Python, errors are raised when your code breaks the rules of the language. Understanding common error types can help you debug faster and write better code. Below are some of the most frequent errors students encounter.
Note that each of the following will show you the error and stop the notebook from running subsequent code in the cell.
NameError: Using a variable that hasn’t been defined#
A NameError occurs when the code tries to use a variable or function name that hasn’t been defined. This is often caused by a typo in the name. It can also happen if you’re working in a Jupyter notebook and try to run code before running the cells that define necessary variables.
If you have saved your notebook and reopened it to resume where you left off, it’s a good idea to use Run → Run All Cells from the top menu to make sure all required code has been executed.
# Example: misspelled variable name
mol_weight = 46.07
print(mol_wieght) # Typo!
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[11], line 3
1 # Example: misspelled variable name
2 mol_weight = 46.07
----> 3 print(mol_wieght) # Typo!
NameError: name 'mol_wieght' is not defined
SyntaxError: Your code breaks Python’s grammar rules#
A programming language’s syntax is its set of rules for how code must be written. These rules cover proper formatting, use of symbols, valid variable names, and more. A SyntaxError occurs when your code violates one of these rules. To help you debug, Python provides an error message that shows the specific line of code with the problem. The message often includes a pointer (^) to indicate where the issue is likely occurring. This error usually signals a small but important mistake, such as a missing colon, unmatched parentheses, or incorrect indentation.
compoundClass = ["alkanes", "alkenes", "alcohols", "ketones", "alkyl halides"]
for item in compoundClass
print(item)
Cell In[12], line 2
for item in compoundClass
^
SyntaxError: expected ':'
KeyError: Accessing a missing key in a dictionary#
You will get a KeyError when you try to access a key that doesn’t exist in a dictionary.
molecular_weights = {
"water": 18.02,
"ethanol": 46.07,
"acetone": 58.08,
"methanol": 32.04
}
print(molecular_weights["dichloromethane"]) # KeyError: 'dichloromethane' not in dictionary
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[13], line 8
1 molecular_weights = {
2 "water": 18.02,
3 "ethanol": 46.07,
4 "acetone": 58.08,
5 "methanol": 32.04
6 }
----> 8 print(molecular_weights["dichloromethane"]) # KeyError: 'dichloromethane' not in dictionary
KeyError: 'dichloromethane'
Tip: To avoid crashing, use .get() with a default value, or check whether the key exists with the in operator inside an if/else statement.
molecular_weights = {
"water": 18.02,
"ethanol": 46.07,
"acetone": 58.08,
"methanol": 32.04
}
print(molecular_weights.get("dichloromethane", "Key Not found"))
solvents = ["methanol", "ethanol", "water", "dichloromethane", "acetone"]
for solvent in solvents:
if solvent in molecular_weights:
print(f"{solvent} has a molecular weight of {molecular_weights[solvent]} g/mol")
else:
print(f"{solvent} is not in the molecular weights dictionary")
Key Not found
methanol has a molecular weight of 32.04 g/mol
ethanol has a molecular weight of 46.07 g/mol
water has a molecular weight of 18.02 g/mol
dichloromethane is not in the molecular weights dictionary
acetone has a molecular weight of 58.08 g/mol
Notice that the code above also shows that you access dictionary values by key, so the order in which you look them up does not matter.
IndexError: Accessing a list index that doesn’t exist#
The last valid index in a list is one less than its length. If you try to access an index that is greater than or equal to the number of items in the list, you will get an IndexError.
elements = ["H", "C", "O"]
print(elements[5])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[15], line 2
1 elements = ["H", "C", "O"]
----> 2 print(elements[5])
IndexError: list index out of range
TypeError: Performing an invalid operation for the data type#
A TypeError occurs when you use the wrong object type for a particular operation or function. For example, trying to do math with a string raises a TypeError. You may see this when you import data from PubChem or other sources, because that data is often brought in as a text string.
mass1 = 18.02
mass2 = "18.02"
print(mass1 + mass2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[16], line 3
1 mass1 = 18.02
2 mass2 = "18.02"
----> 3 print(mass1 + mass2)
TypeError: unsupported operand type(s) for +: 'float' and 'str'
IndentationError: Incorrect indentation (spacing) in the code#
Python uses indentation to define code blocks, such as the body of a for loop or an if statement. If the indentation is inconsistent or missing, an error is raised.
solvents = ["methanol", "ethanol", "water", "dichloromethane", "acetone"]
for solvent in solvents:
print(solvent, type(solvent))
Cell In[17], line 4
print(solvent, type(solvent))
^
IndentationError: expected an indented block after 'for' statement on line 3
Fix each of the code cells above so that they don’t throw an error.
More Python Modules#
Python includes many powerful built-in modules that make it easier to perform everyday tasks. Let’s explore some commonly used tools with examples from a chemistry or computational workflow.
os: Interacting with the Operating System#
The os module allows you to work with the file system: listing files, navigating folders, and managing paths. You might use os to scan folders for chemical data files, experimental logs, or spectral images, or to write new files into them.
A useful function from the os module is listdir(), which lists all the files and directories in a folder.
To open a file that is not in the same directory as your Jupyter notebook, you can change the directory Python is currently looking in, known as the current working directory, using the chdir() function. It takes a single argument: the path to the folder containing the files of interest, written as a string. The exact format of the path will vary depending on your computer and whether you are using macOS, Windows, or Linux.
If you are not sure which directory is the current working directory, you can use the getcwd() function. It does not require any arguments.
If you are creating files and want to save them into a specific folder relative to your notebook, you can check whether the folder exists using os.path.exists(folder_name). You can then create it with os.makedirs(folder_name).
import os
# Example: list all files in the current directory
files = os.listdir()
print("Files in this folder:", files)
# Example: get the current working directory
cwd = os.getcwd()
print("Current working directory:", cwd)
# Example: change the current working directory
# os.chdir('/path/to/directory') # Uncomment and set the path to change directory
# Example: Check if directory exists and create a new directory
# folder_name = "new_directory"
# if not os.path.exists(folder_name):
#     os.makedirs(folder_name)  # Safe recursive folder creation
#     print(f"Created directory: {folder_name}")
Files in this folder: ['01-2-python-basics.ipynb', '.ipynb_checkpoints', '01-3-python-intermediate.ipynb:Zone.Identifier', '01-1-OLCC-Primer.ipynb', '01-3-python-intermediate.ipynb', 'README.md:Zone.Identifier', 'README.md', '01-1-OLCC-Primer.ipynb:Zone.Identifier', '01-2-python-basics.ipynb:Zone.Identifier']
Current working directory: /home/rebelford/jupyterbooks/datachem2025book/content/modules/01-OLCC-Primer
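Path separators differ between macOS/Linux (/) and Windows (\), so it is usually safer to build paths with os.path.join() than to type them by hand. os.makedirs() also accepts exist_ok=True, which removes the need for a separate existence check. In the minimal sketch below, the folder names are hypothetical. The sketch works inside a temporary folder from the tempfile module, so nothing is left behind on your computer:

```python
import os
import tempfile

# Work in a throwaway folder that is deleted automatically at the end
with tempfile.TemporaryDirectory() as base:
    # Build a nested path portably; os.path.join picks the right separator
    data_dir = os.path.join(base, "spectra", "run_01")   # hypothetical folder names

    # exist_ok=True means "don't raise an error if the folder already exists"
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(data_dir, exist_ok=True)   # calling it a second time is harmless

    print(os.path.exists(data_dir))   # True
    top_level = os.listdir(base)
    print(top_level)                  # ['spectra']
```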
random – Generating Random Numbers or Choices#
The random module provides a selection of functions for generating random values. Random values can be integers or floats and can be generated from a variety of ranges and distributions. Note: in interval notation, a square bracket means the endpoint is included, and a parenthesis means it is excluded. For example, [0, 1) covers values from 0 up to but not including 1, which is useful for probability simulations. [1, 10] covers values from 1 up to and including 10.
| Function | Description |
|---|---|
| random.random() | Generates a float from [0, 1) |
| random.randint(x, y) | Generates a random integer from [x, y], inclusive of both endpoints |
| random.randrange(x, y, z) | Generates an integer from [x, y), with an optional step z that sets the interval between possible values |
| random.uniform(x, y) | Generates a float between x and y with uniform probability; y may or may not be included because of floating-point rounding |
| random.choice(seq) | Randomly selects an item from a list, tuple, or other sequence |
import random
print("generate a random float between 0 and 1")
random_float = random.random()
print("Random float:", random_float)
print()
print("generate a random integer between 1 and 10 including 1 and 10")
for i in range(5):
    random_number = random.randint(1, 10)
    print("Random number:", random_number)
print()
print("generate a random even integer from 0 up to (but not including) 10")
for i in range(5):
    random_number = random.randrange(0, 10, 2)
    print("Random number:", random_number)
print()
print("randomly select a compound from a list")
compounds = ["ethanol", "acetone", "toluene", "chloroform"]
print(compounds)
selected = random.choice(compounds)
print("Randomly selected compound:", selected)
generate a random float between 0 and 1
Random float: 0.8864023976443729
generate a random integer between 1 and 10 including 1 and 10
Random number: 7
Random number: 8
Random number: 8
Random number: 3
Random number: 3
generate a random even integer from 0 up to (but not including) 10
Random number: 6
Random number: 6
Random number: 2
Random number: 4
Random number: 0
randomly select a compound from a list
['ethanol', 'acetone', 'toluene', 'chloroform']
Randomly selected compound: toluene
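The cell above does not use random.uniform(). One common use is adding small, made-up noise to a value to simulate repeated measurements. In the minimal sketch below, the 46.07 g/mol target and the ±0.05 spread are arbitrary illustrative choices. random.seed() is explained in the next section:

```python
import random

random.seed(1)  # fixed seed so the simulated readings repeat on every run

true_value = 46.07   # illustrative molecular weight of ethanol, g/mol
readings = []
for i in range(5):
    # uniform(a, b) returns a float between a and b
    noise = random.uniform(-0.05, 0.05)
    readings.append(round(true_value + noise, 3))

print("Simulated readings:", readings)
print("Average:", round(sum(readings) / len(readings), 3))
```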
Controlling Randomness with random.seed()#
Randomness in Python allows programs to simulate unpredictability. Python does this with a pseudorandom number generator. This is an algorithm that produces a sequence of numbers that looks random but is completely determined by an initial value called the seed. By using random.seed(), you can set the starting point of this sequence. Your random operations (like sampling or simulation) will then produce the same results every time. This makes your code reproducible and easier to debug or share with others.
import random
compounds = ["ethanol", "acetone", "toluene", "methanol", "chloroform","dichloromethane", "benzene", "hexane", "cyclohexane"]
print("Without setting a seed we should get different results:")
print("First run:", random.choice(compounds))
print("Second run:", random.choice(compounds))
print("Third run:", random.choice(compounds))
print()
# Setting a seed for reproducibility
print("With setting a seed we should get the same results:")
random.seed(42) # Set the seed value
print("First run:", random.choice(compounds))
random.seed(42) # Reset the seed to get the same result
print("Second run:", random.choice(compounds))
random.seed(42) # Reset the seed to get the same result
print("Third run:", random.choice(compounds))
random.seed(123) # Set a new random seed
print("Fourth run with new random seed:", random.choice(compounds))
Without setting a seed we should get different results:
First run: hexane
Second run: toluene
Third run: acetone
With setting a seed we should get the same results:
First run: acetone
Second run: acetone
Third run: acetone
Fourth run with new random seed: ethanol
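Seeding fixes the whole sequence, not just the first value: after the same seed, a series of calls repeats in the same order. You may want a reproducible stream that other code calling random cannot disturb. In that case, you can create your own generator object with random.Random(seed). Here is a sketch; the seed 42 and the batch size of 3 are arbitrary:

```python
import random

compounds = ["ethanol", "acetone", "toluene", "methanol", "chloroform"]

# Seed the shared generator and draw three compounds
random.seed(42)
first_batch = []
for i in range(3):
    first_batch.append(random.choice(compounds))

# Re-seed with the same value: the whole sequence repeats
random.seed(42)
second_batch = []
for i in range(3):
    second_batch.append(random.choice(compounds))

# A separate generator object with its own seed; other code that calls
# random.choice() will not disturb this stream
rng = random.Random(42)
own_batch = []
for i in range(3):
    own_batch.append(rng.choice(compounds))

print(first_batch)
print(first_batch == second_batch)   # True
print(own_batch == first_batch)      # True (same seed, same algorithm)
```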
enumerate() – Looping with Indexes#
The enumerate() function lets you loop through a list while keeping track of each item’s index. This is especially useful when you want to number items, label experimental runs, or access both position and content during a loop.
The basic syntax is:
for index, item in enumerate(iterable, start=0):
where:
index is the position of the item
item is the actual element
iterable is the list or other object you are looping over
start is optional; it defaults to 0, but you can change it
# Example: print index and name of each solvent
solvents = ["ethanol", "acetone", "toluene", "methanol", "chloroform","dichloromethane", "benzene", "hexane", "cyclohexane"]
for index, solvent in enumerate(solvents):
    print(f"{index + 1}. {solvent}") # add one to index for human-readable numbering
1. ethanol
2. acetone
3. toluene
4. methanol
5. chloroform
6. dichloromethane
7. benzene
8. hexane
9. cyclohexane
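Instead of adding 1 to the index inside the loop, you can pass start=1 to enumerate(). Here is a small sketch that labels hypothetical experimental runs:

```python
solvents = ["ethanol", "acetone", "toluene"]

labels = []
for run_number, solvent in enumerate(solvents, start=1):
    label = f"Run {run_number}: {solvent}"   # numbering now begins at 1
    labels.append(label)
    print(label)
# Run 1: ethanol
# Run 2: acetone
# Run 3: toluene
```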
Getting Data from the Web with requests#
The requests module allows Python to send HTTP requests to websites and APIs and get back data—often in text or other formats that we will explore. This is essential for working with chemical databases, pulling compound info, literature, or real-time data from public sources.
Common requests functions and attributes
| Function or attribute | Purpose |
|---|---|
| requests.get() | Retrieve data from a website |
| requests.post() | Submit data to a website (e.g., forms, uploads) |
| response.status_code | Check if the request was successful (200 = OK) |
| response.text | Response content as a string |
| response.json() | Parses JSON responses into Python dictionaries |
Some common returned status_code values
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | The request was successful and the data was returned |
| 201 | Created | A new resource was successfully created (e.g., via a POST request) |
| 400 | Bad Request | The request was malformed (maybe a typo) |
| 404 | Not Found | The resource (e.g., page or data) doesn’t exist |
| 429 | Too Many Requests | You’ve hit the rate limit |
| 500 | Internal Server Error | The server had an error while processing the request |
| 503 | Service Unavailable | The server is temporarily unavailable (e.g., overloaded or down for maintenance) |
import requests
response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MolecularFormula/txt")
print("Status code:", response.status_code)
if response.status_code == 200:
    print("The molecular formula for water is:", response.text)
else:
    print("Request failed:", response.status_code)
print()
# example with typo for water in the URL
response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/wter/property/MolecularFormula/txt")
print("Status code:", response.status_code)
if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Compound not found.")
elif response.status_code == 503:
    print("Server is temporarily unavailable.")
else:
    print("Unexpected error:", response.status_code)
print()
# example with typo for property in the URL
response = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water/property/MoecularFormula/txt")
print("Status code:", response.status_code)
if response.status_code == 200:
    print("Success!")
elif response.status_code == 404:
    print("Compound not found.")
elif response.status_code == 503:
    print("Server is temporarily unavailable.")
else:
    print("Unexpected error:", response.status_code)
Status code: 200
The molecular formula for water is: H2O
Status code: 404
Compound not found.
Status code: 400
Unexpected error: 400
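The table above lists response.json(), but the cells so far only use response.text. PubChem can also return JSON if the URL ends in /JSON instead of /txt. In that case, response.json() turns the reply into nested dictionaries and lists. The sketch below assumes, as far as we know, that a property reply is shaped like {"PropertyTable": {"Properties": [{...}]}}. It practices on a small hand-written sample with illustrative values. The live request is left commented out because it needs an internet connection:

```python
# Hand-written sample in the shape we assume a PubChem JSON property reply has;
# the values are illustrative. In a live request, this dict would come from
# response.json().
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 962, "MolecularFormula": "H2O", "MolecularWeight": "18.015"}
        ]
    }
}

# Walk down the nesting: dict -> dict -> list -> dict
record = sample["PropertyTable"]["Properties"][0]
print(record["MolecularFormula"], record["MolecularWeight"])   # H2O 18.015

# Live version (needs an internet connection and the requests package):
# import requests
# url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/water"
#        "/property/MolecularFormula,MolecularWeight/JSON")
# response = requests.get(url)
# if response.status_code == 200:
#     data = response.json()
#     print(data["PropertyTable"]["Properties"][0])
```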
time – Tracking and Delaying Execution#
The time module helps you pause code or measure how long it takes to run. One particularly useful case, which we will use frequently, is pausing between requests to websites. Many web services limit how quickly you can send requests. WorldTimeAPI, for example, enforces a fair-use policy whose rate limits are not stated as a fixed number per second. The PubChem servers, however, have a strict limit of 5 requests per second. If you go over that rate, you can be locked out for making excessive requests. The example below sends requests to a time service at timeapi.io.
import time
import requests
url = 'https://www.timeapi.io/api/Time/current/zone?timeZone=America/Chicago' # Example for Chicago timezone
for i in range(5):
    response = requests.get(url)
    print("Status code:", response.status_code)
    print(response.text)
    time.sleep(.2) # wait for 0.2 seconds before the next request
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":39,"seconds":24,"milliSeconds":322,"dateTime":"2025-08-17T11:39:24.3227538","date":"08/17/2025","time":"11:39","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":39,"seconds":54,"milliSeconds":651,"dateTime":"2025-08-17T11:39:54.6512011","date":"08/17/2025","time":"11:39","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":40,"seconds":25,"milliSeconds":118,"dateTime":"2025-08-17T11:40:25.1182417","date":"08/17/2025","time":"11:40","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":40,"seconds":55,"milliSeconds":454,"dateTime":"2025-08-17T11:40:55.4545612","date":"08/17/2025","time":"11:40","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
Status code: 200
{"year":2025,"month":8,"day":17,"hour":11,"minute":41,"seconds":25,"milliSeconds":758,"dateTime":"2025-08-17T11:41:25.7589302","date":"08/17/2025","time":"11:41","timeZone":"America/Chicago","dayOfWeek":"Sunday","dstActive":true}
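The time module can also measure how long code takes to run. time.perf_counter() returns a high-resolution clock reading in seconds, and subtracting two readings gives the elapsed time. Here is a minimal sketch that times a short, arbitrary pause:

```python
import time

start = time.perf_counter()        # high-resolution clock reading, in seconds
time.sleep(0.2)                    # stand-in for real work, such as a web request
elapsed = time.perf_counter() - start

print(f"Elapsed: {elapsed:.3f} s")  # a little over 0.2 s
```

The same pattern works around any block of code, such as a loop of PubChem requests.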
Exploring io and StringIO – Working with In-Memory Text Streams#
The io module provides tools for handling file-like operations in memory. One of the most useful tools is io.StringIO, which allows you to treat a string like a file. When we request data from PubChem, we often get back text that we need to parse.
In the example below, we request data for five molecules in one request, and PubChem returns all the data in one chunk. By wrapping the text in StringIO, we can loop over each line of data included in the response.
import requests
from io import StringIO
cidstr = "2244,3672,1983,5288826,5284371"
url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/' + cidstr + '/property/Title/TXT')
print(url) # after running this code, you can copy the URL and paste it into your browser to see the output
print()
res = requests.get(url)
file_like = StringIO(res.text)
for line in file_like:
    print("Parsed line:", line.strip()) # we add .strip() to remove any leading or trailing whitespace characters, including newlines
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,3672,1983,5288826,5284371/property/Title/TXT
Parsed line: Aspirin
Parsed line: Ibuprofen, (+-)-
Parsed line: Acetaminophen
Parsed line: Morphine
Parsed line: Codeine
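Because a StringIO object behaves like an open file, you can pass it to tools that expect a file, such as the reader from Python's csv module. This could help with PubChem requests that end in /CSV instead of /TXT. The minimal sketch below works on a hand-written string in CSV layout, so no web request is needed. The rows are illustrative and are not copied from a real PubChem response:

```python
import csv
from io import StringIO

# Hand-written text in CSV layout (illustrative; no web request needed)
text = "CID,Title\n2244,Aspirin\n1983,Acetaminophen\n"

reader = csv.reader(StringIO(text))   # csv.reader accepts any file-like object
header = next(reader)                 # the first row holds the column names
print("Columns:", header)

rows = []
for row in reader:
    rows.append(row)
    print(row[0], "->", row[1])       # e.g. 2244 -> Aspirin
```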
Python Problem 1#
You are working with data collected from a small lab inventory and experiment log. The data is stored in various Python data structures. The following code cell defines a series of lists, tuples, sets and dictionaries.
# List of chemical compounds used in an experiment (some are repeated)
chemicals_used = ["ethanol", "water", "acetone", "ethanol", "acetone", "methanol", "dichloromethane"]
# Tuple of lab temperatures in Celsius (immutable data)
temperatures = (22.0, 21.5, 22.3, 21.8)
# Dictionary of molecular weights (g/mol) for a few compounds
molecular_weights = {
"ethanol": 46.07,
"water": 18.02,
"acetone": 58.08,
"methanol": 32.04
}
# Set of solvents approved for flammable storage
flammable_solvents = {"ethanol", "acetone", "diethyl ether", "methanol"}
1a. Use the set() function to remove duplicates from chemicals_used and store the result in a new variable called unique_chemicals.
# Write your code here
1b. Loop through the temperatures tuple and print each reading with the label “Lab temp:”.
Hint: Can you change the values inside a tuple? Why or why not?
# Write your code here
1c. Write code that prints the molecular weight of each compound in unique_chemicals. Note: some of the unique chemicals are not in the molecular_weights dictionary, so you will need to check for missing keys before looking them up.
# Write your code here
1d. Add “diethyl ether” to the molecular_weights dictionary with a value of 74.12.
Hint: do a web search on how to add to a dictionary as that wasn’t covered explicitly here.
# Write your code here
Python Problem 2#
The following Python script is meant to calculate the total mass of chemicals used in an experiment based on their molecular weights and the number of moles used. However, the code contains multiple errors. Your job is to identify and fix the errors so the code runs correctly.
Identify and fix:
Any syntax or indentation errors
Any type mismatch that prevents calculation
Any dictionary access that fails due to a missing key
Any undefined variable issues
Once fixed, your code should print the mass of each compound (if > 2 g) and the total mass.
molecular_weights = {
"ethanol": "46.07",
"acetone": "58.08",
"methanol": "32.04"
}
moles_used = {
"ethanol": 0.10,
"acetone": 0.05,
"chloroform": 0.08
}
total_mass = 0
for compound in moles_used:
mass = molecular_weights[compound] * moles_used[compound]
if mass > 2.0:
print(compound, "mass exceeds 2 grams.")
total_mass += mass
print("Total mass of chemicals:", total_mass)
print("Water was used in this experiment:", "water" in moles_used)
Python Problem 3#
In this exercise, you’ll generate a list of random PubChem compound IDs (CIDs) and build a URL, as shown in the section on StringIO, to retrieve and parse the titles for each compound.
3a. Generate a list of random CIDs. As of June 17, 2025, there were 121,440,911 unique compounds (CIDs) in the PubChem database. Using a function from the random module, create a list of 10 random CIDs between 1 and 121,440,911. Hint: your random CIDs will be integers, but later we need them as strings. Add them to the list as strings to save some hassle later.
# write your code here
3b. Create a string variable called random_cids that contains all 10 random CIDs joined with commas and no spaces.
# write your code here
3c. Use the requests module and the URL provided in the cell below to request and store the output into a new variable.
url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/' + random_cids + '/property/Title/TXT')
#write your code here
3d. Parse and print each title in the notebook. If there is no returned title, print “No defined title for this CID”. Sample output will look like:
Title: 1-(1,4-Dioxan-2-yl)-4-ethylsulfonylbutan-1-amine
No defined title for this CID
No defined title for this CID
Title: Isoxazoles
Title: Cyclohexanol, 4-(4-(4-fluorophenyl)-5-(2-methoxy-4-pyrimidinyl)-1H-imidazol-1-yl)-, trans-
#write your code here
Python Problem 4#
In this exercise, you are given a list of 10 drug names. Sort the list alphabetically, and then print each drug name with its index value in the sorted list.
drug_names = [
"acetaminophen",
"ibuprofen",
"lisinopril",
"atorvastatin",
"metformin",
"omeprazole",
"albuterol",
"sertraline",
"amoxicillin",
"diphenhydramine"
]
# write your code here
Acknowledgements#
This notebook was developed by Ehren Bucholtz (Ehren.Bucholtz@uhsp.edu) and takes inspiration from Chapter 1: Basic Python from Charlie Weiss’s book Scientific Computing for Chemists with Python, license (CC BY-NC-SA 4.0).
This work is licensed under CC BY-NC-SA 4.0.