1.2 Python Basics#
Introduction to Python#
This notebook is a modification of the 00_python_basics jupyter notebook from the MolSSI Cheminformatics Workshop. MolSSI is the Molecular Sciences Software Institute.
This notebook also takes inspiration from Chapter 1: Basic Python from Charlie Weiss’s book Scientific Computing for Chemists with Python.
Overview
Questions:
What is the Python programming language, and what is it used for?
What is the basic syntax of the Python programming language?
How do we store information to use in Python?
How do we create repeatable actions?
How can we decide to take one action or another?
Objectives:
Assign values to variables and lists.
Define values by type.
Perform operations on values of different type.
Use the print function to check how the code is working.
Use a for loop to perform the same action on all the items in a list.
Use the append function to create new lists in for loops.
Make decisions using if statements.
What is Python and why use it?#
All of the software you use on a regular basis is created through the use of programming languages. Programming languages allow us to write instructions to a computer. There are many different programming languages, each with its own strengths, weaknesses, and uses. Some popular programming languages you might hear about are JavaScript (used on the web; any website with interactive content likely uses JavaScript), Python (scientific programming and many other applications), C++ (high performance applications such as much of computational chemistry, self-driving cars, etc.), SQL (databases), and many more.
Python is a computer programming language that has become ubiquitous in scientific programming. The Python programming language was first introduced in the year 1991, and has grown to be one of the most popular programming languages for both scientists and non-scientists. According to the 2024 Stack Overflow Developer Survey, Python is the fourth most popular programming language. Compared to other programming languages, Python is considered more intuitive to start learning and is also extremely versatile. Python can be used to build web applications, interact with databases, and calculate and analyze data. Python also has many libraries focused on science and machine learning.
In this cheminformatics course, we will see that we can use Python to run and analyze our simulations. We can also use some of Python’s many libraries to fit data and predict properties.
Getting Started#
Our initial lessons will run Python interactively through a Python interpreter. We will use an environment called a Jupyter notebook. The Jupyter notebook is an environment in your browser that can be used to write and execute Python code interactively. You can view a Jupyter notebook using your browser or in some specialized text editors.
Jupyter notebooks are made up of cells. Cells can either be markdown or code cells.
This cell is a Markdown Cell.
Code cells have executable Python code in them.
To change the type of a cell you can use the drop down option in the top window.
To run code in a cell, click inside of the cell and press Shift+Enter.
If the code executes successfully, the next cell will become the active cell.
Assigning variables#
Any Python interpreter can work just like a calculator, although this is not very useful. The following cell is a code cell, denoted by the gray box with the brackets [ ] in front of it.
When the brackets are empty [ ], the code in that cell has not been run.
When the brackets have an asterisk [ * ], the code in that cell is running or waiting to run.
When the brackets have a number [1], the code has been run and the number represents the nth code cell that has been run in a session.
Press the Shift Key and the Enter Key (Shift + Enter) at the same time to run (also called “execute”) the code in a cell.
The following cell contains an expression to calculate 3 + 7:
3 + 7
10
Here, Python has performed a calculation for us. Python can use many types of mathematical operators:
Operator | Name |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Modulus (remainder) |
** | Exponentiation |
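As a quick illustration of the operators in the table, the sketch below applies each one to the same pair of numbers. The numbers 7 and 2 are arbitrary example values. The cell uses print(), which is introduced later in this notebook, so that every result is displayed rather than only the last one.

```python
# Each line applies one operator from the table to the example values 7 and 2
print(7 + 2)   # addition
print(7 - 2)   # subtraction
print(7 * 2)   # multiplication
print(7 / 2)   # division (the result is a decimal number)
print(7 % 2)   # modulus: the remainder left after dividing 7 by 2
print(7 ** 2)  # exponentiation: 7 raised to the power of 2
```

Try changing the numbers and predicting each result before you run the cell.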
Using Python as a calculator isn’t that useful. To save this value, or other values, we assign them to a variable for later use. Variable assignment is the technical term for doing this. If we do not assign an expression to a variable, we will not be able to use its value later.
The syntax for assigning variables is the following:
variable_name = variable_value
Let’s see this in action with a calculation. Let’s define some variables for our calculation.
# this is a comment line denoted by an octothorpe (yes, that is the official name of that symbol!)
# Let's calculate Gibbs energy of a reaction
deltaH = -541.5 #kJ/mole
deltaS = 10.4 #kJ/(mole K)
temp = 298 #Kelvin
deltaG = deltaH - temp * deltaS
Notice several things about this code.
Any time a # is included, the Python interpreter ignores the characters after the #. The computer does not do anything with these comments.
They have been used here to remind the user what units each of their values are in.
Comments are also often used to explain what the code is doing or leave information for future people who might use the code.
When choosing variable names, you should choose informative names so that someone reading your code can tell what they represent. Naming a variable temp or temperature is much more informative than naming that variable t.
We can now access any of the variables from other cells.
Press the Shift Key and the Enter Key (Shift + Enter) at the same time to run (also called “execute”) the code in the following cell.
deltaH
-541.5
Change the above cell to give you values stored in other variables and even the value of your calculated Gibbs Energy.
The output above works great if you want to see the value of any given variable. It will not work if you want to see multiple values. Run the following code cell to see what happens.
deltaH
deltaS
temp
deltaG
-3640.7000000000003
Using the print() Function#
Notice that the above code cell only outputs the last variable. This is a quirk of using the Jupyter notebook. To display each variable in the code, you must use a Python function to display the output.
Functions are reusable pieces of code that perform specific tasks. Some are built in and others are user defined.
The print() function is one of the most commonly used functions; it tells Python to display some text or values. While Jupyter notebooks will display the output or contents of a variable by default, the print() function allows for considerably more control.
print(deltaG)
print("The value of delta G is",deltaG,"kJ/mol")
-3640.7000000000003
The value of delta G is -3640.7000000000003 kJ/mol
There are many types of functions beyond printing. Functions can open files, perform calculations, and carry out many other operations. A function call consists of the function name followed by parentheses containing the function inputs (also called arguments), separated by commas.
function_name(argument1, argument2)
In the previous code block, we introduced the print function. Often, we will use the print function just to make sure our code is working correctly.
Note that performing a calculation on a variable does not change the value stored in that variable unless you assign the result back to it. For example, let’s change our value of deltaG from kJ to joules:
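To make the name-plus-arguments pattern concrete, here is a small sketch that calls two other built-in functions. round() and abs() are standard Python functions, and sep is a standard keyword argument of print(). The variable names are chosen for this example, and the energy value is typed in directly so the cell runs on its own.

```python
# Example value, typed in directly so this cell runs on its own
energy = -3640.7000000000003  # kJ/mol

# round() takes two arguments: the number and how many decimal places to keep
rounded_energy = round(energy, 1)

# abs() takes one argument and returns its absolute value
magnitude = abs(rounded_energy)

print(rounded_energy)
print(magnitude)

# print() accepts any number of arguments; the optional sep argument
# sets the text placed between them (a single space by default)
print("delta G", rounded_energy, "kJ/mol", sep=" | ")
```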
# run this code cell
print(deltaG)
deltaG * 1000
print(deltaG)
print(deltaG*1000)
print(deltaG)
deltaG = deltaG * 1000
print(deltaG)
-3640.7000000000003
-3640.7000000000003
-3640700.0000000005
-3640.7000000000003
-3640700.0000000005
Next to each line of code above, add comments to explain what each line of code is doing.
What is the difference between these two code segments?
Code Segment 1 |
Code Segment 2 |
---|---|
|
|
|
There are situations where it is reasonable to overwrite a variable with a new value, but you should always think carefully about this. Usually it is a better practice to give the variable a new name and leave the existing variable as is.
deltaH_joules = deltaH * 1000 #recall we stored value for deltaH above in a code cell
print(deltaH)
print(deltaH_joules)
-541.5
-541500.0
The Ideal Gas Law is expressed as:
PV=nRT
First report your output as atm and then again in mmHg. Note: 1 atm pressure is 760 mmHg
The pressure in the container is (some calculated value) atm.
The pressure in the container is (some calculated value) mmHg.
# write and run your code here
Data Types#
Each variable is some particular type of data.
The most common types of data are strings (str), integers (int), and floating point numbers (float).
You can identify the data type of any variable with the function type(variable_name).
type(deltaG)
float
You can change the data type of a variable like this. This is called casting.
deltaG_string = str(deltaG)
type(deltaG_string)
str
We could have created a variable as a string originally by surrounding the value in quotes " ". Note that in Python it doesn’t matter if you use single ' or double " quotes; the opening quote just has to match the closing quote.
string_variable = "This is a string" # note using double quotes
string_variable = 'This is a string' # note using single quotes
type(string_variable)
str
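One practical reason to have both quote styles is that a string can then contain the other kind of quote. A minimal sketch, with made-up example text and variable names:

```python
# Double quotes on the outside let the string contain an apostrophe
name_with_apostrophe = "Avogadro's number"

# Single quotes on the outside let the string contain double quotes
quoted_word = 'The bottle is labeled "flammable".'

print(name_with_apostrophe)
print(quoted_word)
```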
string_variable = "This is a string'
Cell In[12], line 1
string_variable = "This is a string'
^
SyntaxError: unterminated string literal (detected at line 1)
This is an example of an error message. An error message is what occurs if there is something wrong with your code. In Python, when you read error messages, you should try to read the last line of the error message first. It will have a message about what went wrong in the program execution.
This code doesn’t work because we used two different types of quotes. First a double quote and then a single.
You can also nest one function in another function. For example you can print the type of a variable.
myInteger = 1
myFloat = 1.0
myString = "my string" # note using double quotes
myString2= 'another string' # note using single quotes
print(type(myInteger))
print(type(myFloat))
print(type(myString))
print(type(myString2))
print(myString2)
<class 'int'>
<class 'float'>
<class 'str'>
<class 'str'>
another string
The output above touches on the idea of Python being an object-oriented programming language. An object in Python is a container that holds data (like numbers or text) and behavior (functions or methods that act on that data). A class defines what kind of data and actions an object should have. Think of the class as the rules for creating objects. An instance is a specific object created from the class. In the above, myString is an instance of the str class. myString2 is a separate instance of the str class. Each instance holds its own value, but both are created with the rules of the str class.
When we tell Python to print(type(myString2)), it tells us that myString2 is a representative (or instance) of the str class, but not what the actual value of the instance is.
The print(type(myString2)) output tells us myString2 is a str type of object, while print(myString2) gives us the actual value stored in the variable, “another string”.
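The idea of data plus behavior can be seen by calling a method on a string. The sketch below uses the standard str methods upper() and replace() and the built-in function isinstance(). The variable names and values are invented for this example.

```python
# Two separate instances of the str class (example values)
element = "sodium"
compound = "sodium chloride"

# isinstance() checks whether an object was created from a given class
print(isinstance(element, str))
print(isinstance(compound, str))

# Methods are the behavior that every str instance carries with it
shouted = element.upper()
swapped = compound.replace("sodium", "potassium")
print(shouted)
print(swapped)

# Calling a method returns a new string; the original instance is unchanged
print(element)
```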
Strings#
As you saw above, you can create a string variable by enclosing text in single or double quotes. Run the following code cell:
firstNumber = 3
secondNumber = 4
thirdNumber ='3'
fourthNumber ='4'
print(firstNumber+secondNumber)
print(thirdNumber+fourthNumber)
7
34
What happens when you add two integers?
What happens when you add two strings?
Predict the following:
What do you think will happen if you add a float to an integer?
What do you think will happen if you add an integer to a string?
first = 3.0
print(first+secondNumber)
print(thirdNumber+secondNumber)
7.0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[15], line 3
1 first = 3.0
2 print(first+secondNumber)
----> 3 print(thirdNumber+secondNumber)
TypeError: can only concatenate str (not "int") to str
As shown above, adding two numbers (whether float or integer) is seen as a mathematical operation in python. However, if you have two strings, python concatenates them. This is a way of combining or lengthening strings, but no actual math is performed.
In cheminformatics we will often import numerical data from a text document. Such data is read in as strings, so trying to do math with it directly results in an error or in concatenation instead of arithmetic. The remedy is to convert the string(s) into numbers using either the float() or int() functions.
thirdNumber = float(thirdNumber)
fourthNumber = int(fourthNumber)
fifthNumber = thirdNumber + fourthNumber
print(fifthNumber)
7.0
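The same conversion applies to values that arrive as text. The sketch below assumes two values as they might look after being read from a data file; the text and the variable names are hypothetical.

```python
# Hypothetical values as they might arrive from a text file: both are strings
mass_text = "18.015"
count_text = "3"

# Adding the raw strings concatenates them instead of doing math
print(mass_text + count_text)

# Convert first, then calculate
mass = float(mass_text)    # text containing a decimal point needs float()
count = int(count_text)    # whole-number text can use int()
total_mass = mass * count
print(total_mass)
```

Note that int() only accepts whole-number text: int("18.015") raises a ValueError, so use float() for values that contain a decimal point.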
We can also convert back to string using the str() function:
thirdNumber = str(thirdNumber)
fourthNumber = str(fourthNumber)
print(thirdNumber)
print(fourthNumber)
sixthNumber = thirdNumber + fourthNumber
print(sixthNumber)
3.0
4
3.04
The len() function is another useful built-in function in Python. It returns the number of items in an object. It can be useful to tell you how many characters are found in a string, including spaces and special characters.
word1 = "PubChem"
word2 = "is"
word3 = "a"
word4 = "chemical"
word5 = "database"
word6 ="."
sentence = word1+word2+word3+word4+word5+word6
print(len(word1))
print(len(word2))
print(len(word3))
print(len(word4))
print(len(word5))
print(len(word6))
print(len(sentence))
print(sentence)
7
2
1
8
8
1
27
PubChemisachemicaldatabase.
Concatenating the words directly makes for an awkward output to read. You can instead connect the words together as a string with spaces by using the join() method.
newSentence = " ".join( [word1, word2, word3, word4, word5]) + word6
print(newSentence)
len(newSentence)
PubChem is a chemical database.
31
When the variable newSentence was created, why was word6 concatenated to the join?
How would the output change if the code in the above cell was:
newSentence = " ".join( [word1, word2, word3, word4, word5, word6])
print(newSentence)
len(newSentence)
In the following code cell, write a join statement to create a variable called url for the following variables where your join character is a /. If you were successful, your output will be a URL you can click that takes you to a webpage with data.
https = "https://pubchem.ncbi.nlm.nih.gov"
compound = "compound"
CID = "2244"
url = # write your join statement here.
print(url)
Lists#
Another common data structure in Python is the list. Lists can be used to group several values or variables together, and are declared using square brackets [ ].
List values are separated by commas.
Python has several built-in functions which can be used on lists.
The built-in function len can be used to determine the length of a list.
# This is a list
energy_kcal = [-13.4, -2.7, 0, 5.4, 42.1]
# The number of items in a list can be calculated with the len() function.
energy_length = len(energy_kcal)
# print the list
print(energy_kcal)
# print the list length
print('The length of this list is:', energy_length)
[-13.4, -2.7, 0, 5.4, 42.1]
The length of this list is: 5
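Besides len(), a few other built-in functions accept a list. The sketch below uses the standard functions min(), max(), sum(), and sorted() on the same energies. The list is repeated so the cell runs on its own, and the result variable names are chosen for this example.

```python
# Same example energies as above, repeated so this cell runs on its own
energy_kcal = [-13.4, -2.7, 0, 5.4, 42.1]

lowest = min(energy_kcal)            # smallest value in the list
highest = max(energy_kcal)           # largest value in the list
total = sum(energy_kcal)             # all of the values added together
average = total / len(energy_kcal)   # sum divided by the number of items

# sorted() returns a new list and leaves the original list unchanged
descending = sorted(energy_kcal, reverse=True)

print("Lowest:", lowest)
print("Highest:", highest)
print("Average:", average)
print("Sorted high to low:", descending)
```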
If you want to operate on a particular element of the list, you use the list name and then put in brackets which element of the list you want. In python counting starts at zero, so the first element of the list is list[0]. If you don’t know the length of the list, the last element of the list is list[-1]. This method avoids the need to calculate the length of the list. Negative indexing allows access to elements from the end of the list. list[-2] accesses the second element from the end, which is the penultimate item.
print(energy_kcal)
# Print the first element of the list
print(energy_kcal[0])
# print the last item in the list when the length is known
print(energy_kcal[4])
# print the last item in the list when the length is unknown
print(energy_kcal[-1])
[-13.4, -2.7, 0, 5.4, 42.1]
-13.4
42.1
42.1
In the code cell below, there is a list with random letters.
print the output for the 4th letter in the list
print the penultimate letter of the list
print the length of the list
now that you know the length of the list, provide another method to access the letter stored at the second from the end position
letters = ['m', 's', 'b', 'f', 'x', 'v', 'w', 'y', 'n', 'b', 'm', 'a', 'z', 'e', 'u', 'a', 'h', 'f', 'm', 's', 'j', 'n', 'w', 'j', 'p', 'v', 'f', 'v', 'm', 'p', 'u', 'd', 'i', 's', 'j', 'h', 'f', 'm', 'l', 'l', 'r', 'l', 'u', 'r', 'o', 'l', 'v', 'a', 'k', 'j', 'h', 'i', 'x', 'x', 'e', 'v', 't', 'r', 'g', 'c', 'a', 'v', 'r', 'v', 'z', 's', 'n', 't', 'n', 'f', 'e', 'i', 'b', 'y', 'g', 'q', 'g', 'o', 'a', 'g', 'v', 'x', 'j', 'n', 'i', 'f', 'p', 'a', 'j', 'v', 'n', 'm', 'c', 'l', 'c', 'z', 'x', 'k', 'f', 'r']
# write your code here to output the 4th letter
# write your code here to output the penultimate letter
# write your code here to print the length of the list
# write your code here to provide the second from the end position now that you know the length of the list
You can use an element of a list as a variable in a calculation.
# Convert the second list element to kilojoules. 1 kcal = 4.184 kJ
energy_kilojoules = energy_kcal[1] * 4.184
print(energy_kilojoules)
Repeating an operation many times: for loops#
Often, you will want to do something to every element of a list. The structure to do this is called a for loop. The general structure of a for loop is
for variable in list:
    do things using variable
There are two very important pieces of syntax for the for loop. Notice the colon : after the word list. You will always have a colon at the end of a for statement. If you forget the colon, you will get an error when you try to run your code.
The second thing to notice is that the lines of code under the for loop (the things you want to do several times) are indented. Indentation is very important in Python. There is nothing like an end or exit statement that tells you that you are finished with the loop. The indentation shows you what statements are in the loop. Each indentation is 4 spaces by convention in Python 3. However, if you are using an editor which understands Python, it will do the correct indentation for you when you press the tab key on your keyboard. In fact, the Jupyter notebook will notice that you used a colon (:) in the previous line, and will indent for you (so you will not need to press tab).
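To see how indentation decides what belongs to a loop, compare an indented line with one that is not indented. The list and variable names below are made up for this illustration.

```python
# Example list; the names here are for illustration only
masses = [1.008, 12.011, 15.999]

loop_count = 0
for mass in masses:
    # Indented: these lines run once for every item in the list
    print("Inside the loop:", mass)
    loop_count = loop_count + 1

# Not indented: this line runs only once, after the loop has finished
print("The loop ran", loop_count, "times")
```

If the last print statement were indented to match the lines above it, it would become part of the loop and run three times instead of once.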
Let’s use a loop to change all of our energies in kcal to kJ.
for item in energy_kcal:
    kJ = item * 4.184
    print(kJ)
-56.0656
-11.296800000000001
0.0
22.593600000000002
176.1464
Now it seems like we are really getting somewhere with our program! But it would be even better if instead of just printing the values, it saved them in a new list. To do this, we are going to use the append function. The append function adds a new item to the end of an existing list. The general form of the append function is
list_name.append(new_thing)
for item in energy_kcal:
    kJ = item * 4.184
    energy_kJ.append(kJ)

print(energy_kJ)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[23], line 3
1 for item in energy_kcal:
2 kJ = item * 4.184
----> 3 energy_kJ.append(kJ)
5 print(energy_kJ)
NameError: name 'energy_kJ' is not defined
This is another example of an error message.
This code doesn’t work because on the first iteration of our loop, the list energy_kJ doesn’t exist. To make it work, we have to start the list outside of the loop. The list can be blank when we start it, but we have to start it.
energy_kJ = []
for item in energy_kcal:
    kJ = item * 4.184
    energy_kJ.append(kJ)
print("energies in kcal", energy_kcal)
print("energies in kj",energy_kJ)
energies in kcal [-13.4, -2.7, 0, 5.4, 42.1]
energies in kj [-56.0656, -11.296800000000001, 0.0, 22.593600000000002, 176.1464]
The following code cell has a list of temperatures in celsius. Create a new list called kelvin that converts the temperatures to kelvin and prints out both lists.
celsius =[-273.15,0,37,100]
# write your code here
Making choices: logic statements#
Within your code, you may need to evaluate a variable and then do something if the variable has a particular value. This type of logic is handled by an if statement. In the following example, we only append the negative numbers to a new list.
negative_energy_kJ = []
for item in energy_kJ:
    if item < 0:
        negative_energy_kJ.append(item)
print(negative_energy_kJ)
[-56.0656, -11.296800000000001]
Other logic operations include:
equal to: ==
not equal to: !=
greater than: >
less than: <
greater than or equal to: >=
less than or equal to: <=
You can also use and, or, and not to check more than one condition.
zero_or_less = []
for item in energy_kJ:
    if item < 0 or item == 0: # note: we could also use if item <= 0 to achieve the same goal.
        zero_or_less.append(item)
print(zero_or_less)
[-56.0656, -11.296800000000001, 0.0]
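The and and not keywords work the same way as or. A minimal sketch, assuming we want the energies that fall inside a window: the window limits and the new list names are chosen only for illustration, and the energies are the kJ values from above rounded to one decimal place so the cell runs on its own.

```python
# Rounded example values, typed in so this cell runs on its own
energy_kJ = [-56.1, -11.3, 0.0, 22.6, 176.1]

# and: both conditions must be true for the item to be kept
within_window = []
for item in energy_kJ:
    if item > -20 and item < 50:
        within_window.append(item)

# not: reverses a condition, so this keeps every item that is NOT negative
not_negative = []
for item in energy_kJ:
    if not item < 0:
        not_negative.append(item)

print(within_window)
print(not_negative)
```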
To define what happens if the if statement is not met, you can use the else keyword.
negative_numbers = []
positive_numbers = []
for item in energy_kJ:
    if item < 0:
        negative_numbers.append(item)
    else:
        positive_numbers.append(item)
print("Negative numbers:", negative_numbers)
print("Positive Numbers: ", positive_numbers)
Negative numbers: [-56.0656, -11.296800000000001]
Positive Numbers: [0.0, 22.593600000000002, 176.1464]
The output above presents a unique problem. We have a value of 0.0 in our positive number output. Our use case has three different conditions that can exist: negative, positive, or zero.
elif is a conditional statement in Python used in conjunction with if and else. It allows for multiple conditions to be checked sequentially. If the initial if condition is false, the elif condition is evaluated. If the elif condition is true, its corresponding block of code is executed. There can be multiple elif statements, allowing for various conditions to be checked. If none of the if or elif conditions are true, the optional else block is executed.
negative_items = []
positive_items = []
zero_value_items = []
for item in energy_kJ:
    if item < 0:
        negative_items.append(item)
    elif item == 0:
        zero_value_items.append(item)
    else:
        positive_items.append(item)
print("Negative value items:", negative_items)
print("Positive value items: ", positive_items)
print("Zero value items: ", zero_value_items)
Negative value items: [-56.0656, -11.296800000000001]
Positive value items: [22.593600000000002, 176.1464]
Zero value items: [0.0]
Python Problem 1#
Using the code sample for concatenation as your guide, generate clickable URLs to get the name of each molecule from the PubChem database from the given compound identification numbers. Loop through each item in the list to generate your URLs. Hint: think about variable class type if you get an error.
CIDS = [10624, 62857, 3365, 4946, 3821]
# write your code here
Python Problem 2#
The Haber Process converts nitrogen and hydrogen to ammonia:
N2(g) + 3 H2(g) → 2 NH3(g)
The ΔH° of reaction is -91.8 kJ/mol.
The ΔS° of reaction is -0.198 kJ/(mol·K).
You are provided a list of temperatures in Kelvin in the code cell below. Calculate the Gibbs energy of this reaction for each of the temperatures in the list and determine if the reaction is spontaneous (ΔG° is a negative value) or nonspontaneous (ΔG° is a positive value). You will need to loop through each item in the list, calculate ΔG°, and output the data indicating if the reaction is spontaneous or not. If the value is 0, report that the reaction is at equilibrium.
deltaH = -91.8 #kJ/mole
deltaS = -0.198 #kJ/(mole K)
temp = [100, 273, 298, 373, 463.6363636363636363636, 500, 1000] #Kelvin
# write your code here
Python Problem 3#
So far you have been using basic functions in python. There are many more functions.
Read https://weisscharlesj.github.io/SciCompforChemists/notebooks/chapter_01/chap_01_notebook.html#python-functions
Using the math module, calculate the pH of a solution that has an [H3O+] of 1.0 × 10⁻⁵ M.
# write your code here
Acknowledgements#
This notebook was developed by Ehren Bucholtz (Ehren.Bucholtz@uhsp.edu) and is a modification of the 00_python_basics jupyter notebook from the MolSSI Cheminformatics Workshop (MIT License). MolSSI is the Molecular Sciences Software Institute.
This notebook also takes inspiration from Chapter 1: Basic Python from Charlie Weiss’s book Scientific Computing for Chemists with Python, license (CC BY-NC-SA 4.0).
This work is licensed under CC BY-NC-SA 4.0.