1. Primative Data Structures

1. Primative Data Structures#

Learning Objectives#

Understand how integers, floats and strings are stored in memory as binary strings.
Understand differences in primitive numeric data structures
Understand strings
- formatting
- indexing
- slicing
- concatenation

Data Structure	Description
`int`	Represents whole numbers (positive, negative, or zero). Example: `42`, `-7`, `0`.
`float`	Represents decimal (floating-point) numbers. Example: `3.14`, `-0.001`, `2.0`.
`complex`	Represents numbers with real and imaginary parts. Example: `3+4j`.
`bool`	Represents truth values, either `True` or `False`.
`str`	Represents immutable sequences of Unicode characters (text data). Example: `"hello"`.
`bytes`	Represents immutable sequences of raw bytes. Example: `b'hello'`.

1. Integers#

Integers are whole numbers and represented as binary bit strings where each cell is 2$^n$

Arbitrary Precision#

In many languages like C and Java the size of an integer is limited by the number of bites in a memory block, so a 32 bit chip would have have 2$^31$ memory cells available to represent the number with the last one being for the sign. Python though has arbitrary precision where the number of cells can be extended and so the size on an integer is limited by the available memory. It should also be noted that the sign is actually in the metadata of the PyLongObject used to store integers in the heap memory.

The following program has arbitrary precision. When you run the following cell you will be prompted to input a binary sequence of arbitrary length and it will convert it to decimal. I suggest you try the following values 1 10 100 110

Interactive Input Disabled for Book Build

The following cell normally asks for user input using `input()` , but Jupyter Book cannot compile cells that require manual input, so the input lines have been commented out.

To try this interactively in a Jupyter Notebook:

Uncomment the line #binary_input = input("Enter an eight digit binary number (e.g., 10110011): ").strip().
Comment out the line under binary_input = "01010101" (note this is not necessary due to the order of execution but best practice)

This will allow you to enter values manually when you run the cell in a Jupyter Notebook. Note shift+enter will not move you to the next cell if there is input and you need to use esc + down-arrow to move to the next cell without executing the input function.

def binary_to_decimal(binary_list):
    """
    Converts a binary list to its decimal equivalent.
    :param binary_list: List of integers (0s and 1s) representing a binary number.
    :return: The decimal equivalent of the binary number.
    """
    decimal_value = 0
    num_bits = len(binary_list)

    # Loop through the binary list
    for i in range(num_bits):
        # Get the value at position i (0 or 1)
        bit = binary_list[i]
        # Calculate the power of 2 based on position (from most to least significant)
        power = num_bits - 1 - i
        # Add the value to the total if the bit is 1
        if bit == 1:
            decimal_value += 2 ** power

    return decimal_value
binary_input = "01010101"
# Uncomment the Following Line to Prompt user for input
#binary_input = input("Enter an eight digit binary number (e.g., 10110011): ").strip()

# Convert input string to a list of integers
binary_list = [int(bit) for bit in binary_input]

# Validate the input
if any(bit not in (0, 1) for bit in binary_list):
    print("Error: Please enter a binary number consisting only of 0s and 1s.")
else:
    # Optionally pad to a fixed length if needed
    # (not required for the calculation but can help in specific contexts)
    padded_length = 8  # Define desired length (optional)
    if len(binary_list) < padded_length:
        binary_list = [0] * (padded_length - len(binary_list)) + binary_list

    # Convert to decimal and display results
    decimal_value = binary_to_decimal(binary_list)
    print(f"Binary: {binary_list}")
    print(f"Decimal: {decimal_value}")

Binary: [0, 1, 0, 1, 0, 1, 0, 1]
Decimal: 85

Table of built-in functions that operate on integers#

Function	Description
`abs(x)`	Returns the absolute value of the integer `x`.
`bin(x)`	Converts the integer `x` to its binary representation as a string.
`bool(x)`	Converts the integer `x` to a boolean. Non-zero integers return `True`; `0` returns `False`.
`divmod(a, b)`	Returns a tuple `(a // b, a % b)` with the quotient and remainder when dividing `a` by `b`.
`float(x)`	Converts the integer `x` to a floating-point number.
`format(x, f)`	Formats the integer `x` according to the specified format string `f`.
`hex(x)`	Converts the integer `x` to its hexadecimal representation as a string.
`id(x)`	Returns the memory address of the integer `x` (unique identifier in Python).
`int(x)`	Converts `x` to an integer. Can be used to truncate decimals or convert strings to integers.
`isinstance(x, y)`	Returns `True` if `x` is an instance of class `y`, such as `int`, otherwise `False`.
`len(s)`	Returns the length of a collection or iterable. Does not apply directly to integers but useful for sequences.
`max(*args)`	Returns the largest value from the given arguments, which may include integers.
`min(*args)`	Returns the smallest value from the given arguments, which may include integers.
`oct(x)`	Converts the integer `x` to its octal representation as a string.
`pow(x, y, z)`	Returns `(x y) % z`. If `z` is not provided, returns `x y`.
`repr(x)`	Returns the string representation of the integer `x` as it would appear in Python source code.
`round(x, n)`	Rounds the integer `x` to `n` decimal places. For integers, this just returns `x`.
`str(x)`	Converts the integer `x` to a string representation.
`sum(iterable)`	Sums all the elements in the iterable, which may include integers.
`type(x)`	Returns the type of `x`, such as `<class 'int'>` for integers.

2. Floats#

The bit string of a float consists of three parts, the significand and exponent bits, which are like scientific notation, and a sign bit

Common Built-in Functions for float

Function	Description
`abs(x)`	Returns the absolute value of the float.
`round(x, n)`	Rounds the float to `n` decimal places.
`int(x)`	Converts the float to an integer by truncating the decimal part.
`float(x)`	Converts a number or string to a float.
`pow(x, y)`	Returns `x` raised to the power of `y`.
`max(*args)`	Returns the largest of the arguments, which can include floats.
`min(*args)`	Returns the smallest of the arguments, which can include floats.
`sum(iterable)`	Returns the sum of a sequence of numbers, including floats.

3. Complex Numbers#

\[ z = a + bi \]

or

\[ z = a + bj \]

$a$ is the real part.

$b$ is the imaginary part.

$i = \sqrt{-1}$

Note, the real and imaginary parts are stored as floats.

Note, engineers often use j as the imaginary part, as do many python modules.

Complex Plane#

horizontal axis corresponds to real part (a)
vertical axis corresponds to imaginary part (b)

The following code shows an interactive graph for calculating the magnitude of complex numbers in a complex plane

# you may need to install ipywidgets from the cmd line
#open your terminal and activate your env, then run:
#conda install -c conda-forge ipywidgets
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive
import ipywidgets as widgets

def plot_complex(real, imag):
    # Create the complex number
    z = complex(real, imag)
    
    # Calculate magnitude
    magnitude = abs(z)
    
    # Create figure and axis
    fig, ax = plt.subplots(figsize=(5, 5))
    
    # Plot the point
    ax.scatter([z.real], [z.imag], color='blue', s=100, label=f'{z}')
    
    # Plot vector from origin to point
    ax.plot([0, z.real], [0, z.imag], 'b-', linewidth=2)
    
    # Add grid and axes
    ax.grid(True)
    ax.axhline(y=0, color='k', linestyle='-', linewidth=0.5)
    ax.axvline(x=0, color='k', linestyle='-', linewidth=0.5)
    
    # Set equal aspect ratio
    ax.set_aspect('equal')
    
    # Set limits with some padding
    limit = max(abs(real), abs(imag), 5) + 1
    ax.set_xlim(-limit, limit)
    ax.set_ylim(-limit, limit)
    
    # Labels and title
    ax.set_xlabel('Real Part')
    ax.set_ylabel('Imaginary Part')
    ax.set_title(f'Complex Number: {z}\nMagnitude: {magnitude:.2f}')
    
    # Add legend
    ax.legend()
    
    plt.show()

# Create interactive widget
interactive_plot = interactive(
    plot_complex,
    real=widgets.FloatSlider(min=-10, max=10, step=0.1, value=3),
    imag=widgets.FloatSlider(min=-10, max=10, step=0.1, value=4)
)
# Display the interactive plot
display(interactive_plot)

z = 3 + 4j
print(type(z))  # <class 'complex'>
print(z.real)   # 3.0
print(z.imag)   # 4.0

<class 'complex'>
3.0
4.0

Complex Conjugate#

the complex conjugate of a value is taken by changing the sign of the imaginary part.

Complex number: $$z = a + bi$$ Complex conjugate of z: $$\overline{z} = a - bi$$

Table 1: Attributes of `complex`#

Attribute	Description
`real`	Returns the real part of the complex number.
`imag`	Returns the imaginary part of the complex number.

Table 2: Methods of `complex`#

Method	Description
`conjugate()`	Returns the complex conjugate of the number.

Table 3: Common Built-in Functions for `complex`#

Function	Description
`abs(x)`	Returns the magnitude (absolute value) of the complex number `x`.
`complex(real, imag)`	Creates a complex number with the specified real and imaginary parts.
`float(x.real)`	Converts the real part of the complex number `x` to a float.
`int(x.real)`	Converts the real part of the complex number `x` to an integer.
`id(x)`	Returns the memory address of the complex number `x` (unique identifier in Python).
`isinstance(x, complex)`	Returns `True` if `x` is an instance of the `complex` class, otherwise `False`.
`pow(x, y)`	Computes `x` raised to the power of `y`, where `x` can be a complex number.
`repr(x)`	Returns the string representation of the complex number `x` as it would appear in Python source code.
`str(x)`	Converts the complex number `x` to a string representation.
`type(x)`	Returns the type of `x`, such as `<class 'complex'>` for complex numbers.

4. Boolean#

Boolean are logic values that can have two values, True of False

0=False 1=True

Boolean Operators#

Operator	Type	Description
`and`	Boolean	Logical AND: Returns `True` if both operands are `True`. Also know as the conjunction operator and often represented with ∧.
`or`	Boolean	Logical OR: Returns `True` if at least one operand is True`. Also know as the disjunction operator and often represented as ∨ .
`not`	Boolean	Logical NOT: Negates a boolean value. Often represented as ¬.
`==`	Comparison	Equality: Returns `True` if both operands are equal.
`!=`	Comparison	Inequality: Returns `True` if operands are not equal.
`<`	Comparison	Less Than: Returns `True` if the left operand is less than the right.
`>`	Comparison	Greater Than: Returns `True` if the left operand is greater than the right.
`<=`	Comparison	Less Than or Equal To: Returns `True` if the left operand is ≤ the right.
`>=`	Comparison	Greater Than or Equal To: Returns `True` if the left operand is ≥ the right.
`is`	Identity	Identity: Returns `True` if operands refer to the same object in memory.
`is not`	Identity	Not Identity: Returns `True` if operands do not refer to the same object.
`in`	Membership	Membership: Returns `True` if the left operand is in the right operand.
`not in`	Membership	Not Membership: Returns `True` if the left operand is not in the right.

Boolean values are singletons, which mean they always have the same location in memory

print(id(True))  # Always the same memory address
print(id(False))
print(id(3))

106438831589088
106438831589056
128075655282992

# Booleans are singletons
print("Booleans:")
print(f"id(True):  {id(True)}")
print(f"id(False): {id(False)}")
print(f"True is True: {True is True}")
print()

# Small integers are also singletons (cached from -5 to 256 in CPython)
print("Small integers (within cache range -5 to 256):")
a = 100
b = 100
print(f"id(100): {id(a)} == {id(b)}")
print(f"a is b: {a is b}")
print()

# Larger integers are not necessarily singletons
print("Larger integers (outside cache range):")
x = 1000
y = 1000
print(f"id(1000): {id(x)} != {id(y)}")
print(f"x is y: {x is y}")
print()

# But forcing identity with assignment
print("Same object via direct assignment:")
z = x
print(f"x is z: {x is z}")

Booleans:
id(True):  106438831589088
id(False): 106438831589056
True is True: True

Small integers (within cache range -5 to 256):
id(100): 128075655286096 == 128075655286096
a is b: True

Larger integers (outside cache range):
id(1000): 128073698613968 != 128073698614704
x is y: False

Same object via direct assignment:
x is z: True

Most objects are not singletons and their memory location is dynamically generated. Other singletons are pre-cached integers -5 to 256

**Note the memory location of the True/False Boolean values for the above two codes blocks is the same.

Truth Tables#

What is a Truth Table?#

A truth table is a mathematical table used to represent the output of logical operations for all possible input combinations. They are essential in understanding boolean algebra and logic gates.

Truth Table for AND (`and`)#

The and operation returns True only if both inputs are True.

A	B	A and B
`False`	`False`	`False`
`False`	`True`	`False`
`True`	`False`	`False`
`True`	`True`	`True`

Truth Table for OR (`or`)#

The or operation returns True if at least one input is True.

A	B	A or B
`False`	`False`	`False`
`False`	`True`	`True`
`True`	`False`	`True`
`True`	`True`	`True`

Truth Table for NOT (`not`)#

The not operation inverts the boolean value.

A	not A
`False`	`True`
`True`	`False`

Strings#

Strings are sequences of characters inside of quotes, and typically represent text. Many data files are converted to strings when transmitted across the web. When you input a number using the input function input() it is a string.

Strings in Python:
- Stored as Unicode code points internally.
- Encoded to formats like UTF-8 or ASCII when converted to bytes.
ASCII:
- ASCII American Standard Code for Information Interchange - was developed in the early days of computers and is based on the 8-bit byte.
- Original ascii was 7 bits
- Extended ascii was full 8 bits

ASCII code CC BY-SA Yuriy Arabskyy

UTF-8
- Unicode Transformation Format - 8-bit
- ASCII characters (basic Latin letters, numbers, and symbols) using a single byte (8 bits).
- Non-ASCII characters (like accented letters, emoji, or characters from other scripts) using 2 to 4 bytes.
- Key features:
  - Backward compatibility with ASCII: Characters in the ASCII range (0-127) are encoded identically in UTF-8.
  - Variable length: Characters outside the ASCII range use additional bytes (2-4 bytes).
  - Efficient storage: Common characters take fewer bytes, while less-common ones use more.

The following program converts ascii characters to a bit string, or a bit string to its ascii character

# Function to convert an ASCII character to a bit string
def ascii_to_bitstring(char):
    if len(char) != 1 or ord(char) > 127:
        raise ValueError("Input must be a single ASCII character (0-127).")
    return format(ord(char), '08b')  # 8 bits for ASCII

# Function to convert a bit string to an ASCII character
def bitstring_to_ascii(bitstring):
    if len(bitstring) != 8 or not all(bit in '01' for bit in bitstring):
        raise ValueError("Input must be an 8-bit binary string.")
    return chr(int(bitstring, 2))

# Simulate user input for book build
choice = '1'           # Change to '2' to test other direction
char = 'A'             # Used if choice == '1'
bitstring = '01000001' # Used if choice == '2'

# Uncomment the following lines to enable interactive use
# choice = input("Enter 1 or 2: ")
# if choice == '1':
#     char = input("Enter an ASCII character: ")
# elif choice == '2':
#     bitstring = input("Enter an 8-bit binary string: ")

# Main logic
if choice == '1':
    try:
        print(f"Bit string: {ascii_to_bitstring(char)}")
    except ValueError as e:
        print(e)
elif choice == '2':
    try:
        print(f"ASCII character: {bitstring_to_ascii(bitstring)}")
    except ValueError as e:
        print(e)
else:
    print("Invalid choice.")

Bit string: 01000001

Uncomment this code in the following cell when you transfer to a Jupyter Notebook

#choice = input("Enter 1 or 2: ")
#if choice == '1':
#    char = input("Enter an ASCII character: ")
#elif choice == '2':
#    bitstring = input("Enter an 8-bit binary string: ")

Strings as a Datatype#

Strings use single, double or triple quotes.
- triple quote strings span multiple lines and are known as docstrings if placed at the beginning of a script.
- Strings can be concatenated (“1” + “1” becomes “11”).
Strings are ordered
- Strings can be indexed.
- Strings can be reverse indexed.
Strings are immutable.
- Can not change items in a string (do not support reassignment).
- Can be reassigned by slicing and concatenation (you are in effect making a new string).

#This code shows the three ways of inputing the number "one"
print(f'1 is {type(1)}')
print(f'1.0 is {type(1.)}')
print(f'"1" is {type("1")}')

1 is <class 'int'>
1.0 is <class 'float'>
"1" is <class 'str'>

Input function returns strings: the following code inputs a float and assigns it to the variable var, but when we check its type, is is a string. So you must always convert values users input to floats or integers if those are the type of value you want.

Note: Input functions pause the code execution until the value is inputted and Jupyter does not auto-move to the next cell.

After inputting the value with normal enter you are returned to the same cell. To move to the next cell try shift+down-arrow.

# This code shows that input what looks like a float is really a number
var = '1.0' # simulate user input from input()
#var=input("enter the number 1.0, be sure to include the point zero")
print(f'The value of var is {var}, which is of type {type(var)}.')

The value of var is 1.0, which is of type <class 'str'>.

Uncomment this code in the following cell when you transfer to a Jupyter Book

#var=input("enter the number 1.0, be sure to include the point zero")

print(f'var + var = {var + var}')

var + var = 1.01.0

To input it as a float, we could convert it to a float in the input statement

var = 1.0
#var = float(input("enter the number 1.0, be sure to include the point zero"))
print(f'{var} is of type {type(var)}.')
print(f'var + var = {var + var}.')

1.0 is of type <class 'float'>.
var + var = 2.0.

1+1

1.0 + 1.0

2.0

"1.0" + "1.0"

'1.01.0'

Note: you can add a float to an integer, but you can not add a string to either a float or an integer

1 + 1.0

2.0

The following code will give a type error as you are adding a string to an integer

"1" + 1  

Uncomment this code in the following cell when you transfer to a Jupyter Notebook

#1 + '1.0'

#this does not work, but change '1.0' to float('1.0')
#1 + '1.0'

1 + float("1")

2.0

Note in the next cell the first * is the multiplicative operator, while the second * is the string symbol for a star.

#note the first * is the multiplication operator and the second is a string
print(20*"*")

********************

Escape Sequences#

An escape sequence is when a character in a string literally follows a black slash and gives python special meaning. For example, a string literal that starts with a single quote is terminated by the second single quote,, unless it is written as an escape sequence

Escape Sequence	Description
`\'`	Single quote
`\"`	Double quote
`\\`	Backslash
`\n`	Newline
`\t`	Horizontal tab
`\r`	Carriage return
`\b`	Backspace
`\f`	Form feed
`\v`	Vertical tab

Note, \n moves the cursor to a new line, \r moves it to the beginning of the current line and allows for overwriting of text.

print('Li\tBe\tB\n3\t4\t5\n\n\nNa\tMg\tAl\n11\t12\t13')

Li	Be	B
3	4	5


Na	Mg	Al
11	12	13

Formatting Strings#

There are three basic string formatting methods

f-string method
.format() method
modulus method

The f-string method will be our preferred method, and the only formatting method I expect you to be able to code with. but you need to be aware of the other methods so you know what is going on if you find code that uses them.

Comparing string format methods#

the following code does the same thing using each method. In this class we will need to be able to use f-string, but you need to know about the other methods

name = "Neon"
atomic_number = 10
print(f"The atom {name} has an atomic number of {atomic_number}.")
print("The atom {} has an atomic number of {}.".format(name, atomic_number))
print("The atom %s has an atomic number of %d." % (name, atomic_number))

The atom Neon has an atomic number of 10.
The atom Neon has an atomic number of 10.
The atom Neon has an atomic number of 10.

f-string method#

Introduced in Python 3.6
Place f in front of string
indicate variables with {}
Allows you to skip the .format() step

molecule = "hydrogen"
molarmass = 1.00794
mass = 10
moles = mass/molarmass
print(f'{mass} grams of the molecule {molecule} has {moles} moles.\n \t{mass} \
grams of the molecule {molecule} has {moles} moles. \n \
\t\t{mass} grams of the molecule {molecule} has {moles} moles.\n \
\t\t\t the molar mass of hydrogen is {molarmass}.\n \
This is the last line.')

10 grams of the molecule hydrogen has 9.921225469770025 moles.
 	10 grams of the molecule hydrogen has 9.921225469770025 moles. 
 		10 grams of the molecule hydrogen has 9.921225469770025 moles.
 			 the molar mass of hydrogen is 1.00794.
 This is the last line.

note in the above print statement each line but the last has a . What does that do? remove it from one line and see what happens? If needed, ask your AI.

variable=12345.6789
print(f'12345678901234567890')
print(f'{variable:.4f}\n{variable:.2f}')

print(f'12345678901234567890')
print(f'{variable:10.2f}\n{variable:11.2f} \n{variable:11.3f}')

12345678901234567890
6789
68
12345678901234567890
68
68 
679

print(f"How long is this number going to be {1234.5678:2.3f}")

How long is this number going to be 1234.568

.format() method#

This is more common and you will see lots of examples that use the .format() method

Generic Syntax

“text {} text {} text {}.” .format(var1, var2, var3)

Can use index numbers Place width and precision in curly brackets ({}).

#Demonstration of use of index numbers in .format() method
atom1, atom2, atom3 = "Hydrogen", "Helium", "Lithium"
print("This string has order first, second, third variable {} {} {}."\
      .format(atom1,atom2,atom3))
print("This string has order third variable {2}, second variable {1} "\
      "first variable {0}.".format(atom1, atom2, atom3))
print("This string has order first variable {0}, second variable {1} "\
      "first variable {0}.".format(atom1, atom2, atom3))

This string has order first, second, third variable Hydrogen Helium Lithium.
This string has order third variable Lithium, second variable Helium first variable Hydrogen.
This string has order first variable Hydrogen, second variable Helium first variable Hydrogen.

#Demonstration of use of width and precision of .format() method
print("{0:10}|{1:10}\n{2:10}|{3:10}\n1234567890|1234567890"\
      .format("hydrogen",1.00784, "helium", 4.002602))
print(10*" * ")
print("{0:10}|{1:10.3f}\n{2:10}|{3:10.3f}".format("hydrogen",1.00784, "helium", 4.002602))

hydrogen  |   1.00784
helium    |  4.002602
1234567890|1234567890
 *  *  *  *  *  *  *  *  *  * 
hydrogen  |     1.008
helium    |     4.003

#left aligned spaced at 15 digits
print("{0:<15}|{0:<15}".format("left align", "center align", "right align"))
#left then right aligned spaced at 15 digits
print("{0:<15}|{2:>15}".format("left align", "center align", "right align"))
#center aligned spaced at 15 digits
print("{1:^15}|{1:^15}".format("left align", "center align", "right align"))

left align     |left align     
left align     |    right align
 center align  | center align  

Modulus (%) Method#

This is the oldest form of formatting and we will seldom use it. But you need to be aware of it in the event you find some old code that uses this method

Generic Syntax

print("The molar mass of hydrogen is: %1.2f"%(1.00784))
print("The molar mass of hydrogen is: %10.2f"%(1.00784))
print("The molar mass of hydrogen is: %1.0f"%(1.00784))
print("The molar mass of hydrogen is: %1.5f"%(1.00784))

The molar mass of hydrogen is: 1.01
The molar mass of hydrogen is:       1.01
The molar mass of hydrogen is: 1
The molar mass of hydrogen is: 1.00784

#here we are assigning two variables in one line, and then printing them in a line
entity, molar_mass="water", 18.01528
print("The molar mass of %s is %.3f g/mol." %(entity,molar_mass))
print("The molar mass of %s is %.5f g/mol." %(entity,molar_mass))
print("The molar mass of %s is %.3d g/mol." %(entity,molar_mass))

The molar mass of water is 18.015 g/mol.
The molar mass of water is 18.01528 g/mol.
The molar mass of water is 018 g/mol.

Can you explain why the last line above makes no sense? Hint, d stands for decimal integer, that is a base 10 integer (and not a float). Ask you AI if you need further help.

String Functions#

Table of String Functions#

Function	Description
`len(string)`	Returns the length of the string
`max(string)`	Returns the character with the highest ASCII value in the string
`min(string)`	Returns the character with the lowest ASCII value in the string
`ord(character)`	Returns the Unicode code point of a character
`chr(integer)`	Returns the character represented by a Unicode code point
`str(object)`	Returns a string representation of an object

String Methods#

Table of String Methods#

Method	Description
`capitalize()`	Returns a copy of the string with its first character capitalized
`casefold()`	Returns a casefolded copy of the string
`center(width[, fillchar])`	Returns a centered string
`count(sub[, start[, end]])`	Returns the number of non-overlapping occurrences of substring
`encode([encoding[, errors]])`	Returns an encoded version of the string
`endswith(suffix[, start[, end]])`	Returns True if the string ends with the specified suffix
`expandtabs([tabsize])`	Returns a copy of the string where all tab characters are replaced
`find(sub[, start[, end]])`	Returns the lowest index of substring if found
`format(args, *kwargs)`	Formats the string
`index(sub[, start[, end]])`	Like find(), but raises ValueError when the substring is not found
`isalnum()`	Returns True if all characters in the string are alphanumeric
`isalpha()`	Returns True if all characters in the string are alphabetic
`isdecimal()`	Returns True if all characters in the string are decimal
`isdigit()`	Returns True if all characters in the string are digits
`islower()`	Returns True if all cased characters in the string are lowercase
`isnumeric()`	Returns True if all characters in the string are numeric
`isspace()`	Returns True if all characters in the string are whitespaces
`istitle()`	Returns True if the string follows the rules of a title
`isupper()`	Returns True if all cased characters in the string are uppercase
`join(iterable)`	Joins the elements of an iterable to the end of the string
`ljust(width[, fillchar])`	Returns a left-justified string
`lower()`	Returns a copy of the string converted to lowercase
`lstrip([chars])`	Returns a copy of the string with leading characters removed
`partition(sep)`	Returns a tuple containing the part before the separator, the separator itself, and the part after the separator
`replace(old, new[, count])`	Returns a copy of the string where all occurrences of old have been replaced by new
`rfind(sub[, start[, end]])`	Returns the highest index of substring if found
`rindex(sub[, start[, end]])`	Like rfind(), but raises ValueError when the substring is not found
`rjust(width[, fillchar])`	Returns a right-justified string
`rpartition(sep)`	Returns a tuple containing the part before the separator, the separator itself, and the part after the separator
`rsplit([sep[, maxsplit]])`	Returns a list of words in the string, using sep as the delimiter string
`rstrip([chars])`	Returns a copy of the string with trailing characters removed
`split([sep[, maxsplit]])`	Returns a list of words in the string, using sep as the delimiter string
`splitlines([keepends])`	Returns a list of the lines in the string, breaking at line boundaries
`startswith(prefix[, start[, end]])`	Returns True if the string starts with the specified prefix
`strip([chars])`	Returns a copy of the string with leading and trailing characters removed
`swapcase()`	Returns a copy of the string with uppercase characters converted to lowercase and vice versa
`title()`	Returns a titlecased version of the string
`translate(table)`	Returns a copy of the string in which each character has been mapped through the given translation table
`upper()`	Returns a copy of the string converted to uppercase
`zfill(width)`	Returns a copy of the string padded with zeros

Examples of string.methods#

.capitalize()#

name = "chemistry class"
print(name.capitalize())

Chemistry class

.find()#

text = 'Chemistry is the best course'

print(text.find('best'))
#notice how it returns the index number

.join()#

text = ['Chemistry', 'is', 'the', 'best', 'class']
#the above is a list (container type data structure) of strings
space = " "

print(space.join(text))

new_text = space.join(text)
print(text)
print(f'The type of {text} is {type(text)}.')
print(new_text)
print(f'The type of "{new_text}" is {type(new_text)}.')

Chemistry is the best class
['Chemistry', 'is', 'the', 'best', 'class']
The type of ['Chemistry', 'is', 'the', 'best', 'class'] is <class 'list'>.
Chemistry is the best class
The type of "Chemistry is the best class" is <class 'str'>.

.lstrip(’ ‘)#

text = ",,,,,Hello world"
x = text.lstrip(",")
print(x)
print(type(x))

Hello world
<class 'str'>

.replace(‘old_word’,’new_word’)#

text = 'Ozone Oxygen Butane'

new_text = text.replace('Butane', 'Hydrogen')

print(new_text)
print(f'The memory heap location of text is {id(text)}. \
\nThe memory heap location of new_text is {id(new_text)}.')

Ozone Oxygen Hydrogen
The memory heap location of text is 128073697902960. 
The memory heap location of new_text is 128073696273440.

.rstrip(‘delimeter’,max number of splits from right)#

text = "a,b,c,d,e"
x = text.rsplit(",",2)
y = text.rsplit(",",1)
z = text.rsplit(",")
print(x)
print(y)
print(z)
print(f'text is {type(text)}.\nx is {type(x)}.') 

['a,b,c', 'd', 'e']
['a,b,c,d', 'e']
['a', 'b', 'c', 'd', 'e']
text is <class 'str'>.
x is <class 'list'>.

.split(‘delimiter’,max from left)#

text = "a,b,c,d,e"
x = text.split(",",2)
y = text.split(",",1)
z = text.split(",")
print(x)
print(y)
print(z)
print(f'text is {type(text)}.\nx is {type(x)}.') 

['a', 'b', 'c,d,e']
['a', 'b,c,d,e']
['a', 'b', 'c', 'd', 'e']
text is <class 'str'>.
x is <class 'list'>.

Assign variables on a split

filename = "report.final.version.pdf"
name, ext = filename.rsplit(".", 1)
print(name)  # 'report.final.version'
print(ext)   # 'pdf'

report.final.version
pdf

text = "Pythonnnnnn"
print(text.rstrip('n'))

Pytho

Note, -1 means no limit and is the same as split()

text = 'This programming langauge is cool'

x = text.split(None,-1)
print(x)
x=text.split()
print(x)

x = text.split(None,1)
print(x)

x = text.split(None,2)
print(x)

['This', 'programming', 'langauge', 'is', 'cool']
['This', 'programming', 'langauge', 'is', 'cool']
['This', 'programming langauge is cool']
['This', 'programming', 'langauge is cool']

text1 = 'python programming class'
print(text1.title())
 
text2 = 'pyTHoN proGraMMinG clAsS'.title()
print(text2.title())

Python Programming Class
Python Programming Class

String Indexing#

Indexing#

In python each character has a string index position, that can be accessed with brackets [index]. The initial index is zero and increases going left to right. You can also reverse index starting from the right most position with value [-1]

Character	B	o	r	o	n
Index	0	1	2	3	4
Index	-5	-4	-3	-2	-1

Slicing#

Slicing is a way of taking slices of a string. The format is

word[ i : j : stride ]

i is the initial index digit and is inclusive (included in the output)
j is the final index digit and is exclusive (excluded in the output)
stride is the value of increment, which is 1 if omitted.

Indexing & Slicing Summary Table#

Operation	Syntax	Explanation	Example (`s = "Chemistry"`)	Output
Positive Indexing	`s[i]`	Access the character at index `i`	`s[0]`	`'C'`
Negative Indexing	`s[-i]`	Access from the end (last = `-1`)	`s[-1]`	`'y'`
Slicing (Start to End, Exclusive)	`s[start:stop]`	Extracts characters from `start` to `stop-1`	`s[0:4]`	`'Chem'`
Omitting Start	`s[:stop]`	Starts from index `0`	`s[:4]`	`'Chem'`
Omitting End	`s[start:]`	Goes to the end of the string	`s[4:]`	`'istry'`
Full Slice	`s[:]`	Copies the entire string	`s[:]`	`'Chemistry'`
Skipping Steps	`s[start:stop:step]`	Extracts characters with `step`	`s[0:9:2]`	`'Cemsr'`
Reversing a String	`s[::-1]`	Returns the string in reverse order	`s[::-1]`	`'yrtsimehC'`

element1 = "barium"
element2 = 'chlorine'
print(element1[0:3])
print(element2[-1:-4:-1])
print(element2[-3:-1])
print(element2[-3:])

bar
eni
in
ine

String Indexing, Slicing and Concatenation#

In the following examples we will use reverse indexing to slice off the base ending of an element name and add the suffix of an ion of that element. At a later stage we will build an app that builds up on this and names ions from their base elements.

# Base compound names
element1 = "Chlorine"
element2 = "Sulfur"
element3 = "Phosphorus"

# Slicing to remove suffixes
chloride = element1[: -3] + "ide"  # Removing "ine" → "Chloride"
sulfate = element2[: -2] + "ate"   # Removing "ur" → "Sulfate"
phosphite = element3[: -4] + "ite" # Removing "us" → "Phosphite"

# Display results
print(f"Base Element: {element1} → Ion Name: {chloride}")
print(f"Base Element: {element2} → Ion Name: {sulfate}")
print(f"Base Element: {element3} → Ion Name: {phosphite}")

Base Element: Chlorine → Ion Name: Chloride
Base Element: Sulfur → Ion Name: Sulfate
Base Element: Phosphorus → Ion Name: Phosphite

Strings are immutable#

we learned earlier that strings are immutable, that is, once created they can not be changed, and it looks like in the above example we changed the strings, but we did not. Instead, we made new ones, as shown by adding some code to print out the memory address. So the ion is really a new object, and the original object still exists.

# Base compound names
element1 = "Chlorine"
element2 = "Sulfur"
element3 = "Phosphorus"

# Slicing to remove suffixes
chloride = element1[: -3] + "ide"  # Removing "ine" → "Chloride"
sulfate = element2[: -2] + "ate"   # Removing "ur" → "Sulfate"
phosphite = element3[: -4] + "ite" # Removing "us" → "Phosphite"

# Display results
print(f"Base Element: {element1} → Ion Name: {chloride}")
print(f"Note,\t {element1} has memory address {id(element1)} while, \n\t {chloride} has memory address {id(chloride)}.")
print(f"\nBase Element: {element2} → Ion Name: {sulfate}")
print(f"Note,\t {element2} has memory address {id(element1)} while, \n\t {sulfate} has memory address {id(sulfate)}.")
print(f"\nBase Element: {element3} → Ion Name: {phosphite}")
print(f"Note,\t {element3} has memory address {id(element1)} while, \n\t {phosphite} has memory address {id(phosphite)}.")

Base Element: Chlorine → Ion Name: Chloride
Note,	 Chlorine has memory address 128073560169520 while, 
	 Chloride has memory address 128073560176560.

Base Element: Sulfur → Ion Name: Sulfate
Note,	 Sulfur has memory address 128073560169520 while, 
	 Sulfate has memory address 128073560177136.

Base Element: Phosphorus → Ion Name: Phosphite
Note,	 Phosphorus has memory address 128073560169520 while, 
	 Phosphite has memory address 128073560177200.

Acknowledgements#

This content was developed with assistance from Perplexity AI and Chat GPT. Multiple queries were made during the Fall 2024 and the Spring 2025.