7. Databases as External Data Structures

7. Databases as External Data Structures#

How Databases and Web Services Fit Into the Picture#

They are an interesting hybrid, they store non-volatile data from the perspective of the python program, but access to them introduces elements of volatility:

Non-Volatile: Data persists when the Python kernel/interpreter is shut down.
Volatile: Unlike files, a database can be unavailable (e.g., server downtime), and data integrity can be affected by concurrent access.

In a structured progression from volatile to non-volatile data from the perspective of a program, databases fit in as external but queryable structures:

Unlike files, they support efficient querying (e.g., SQL queries).
Unlike in-memory Python data structures, they persist beyond execution of a program.
Unlike JSON or CSV files, they allow dynamic updates and complex relationships.

1. How a Computer Stores Data: Directory vs. Database#

So far we have looked at directories (file systems) and we need to understand the difference between a file system and a database.

1.1 Directory (File Storage)#

A directory is a hierarchical tree structure where files and folders are organized.
Files are stored as binary data on disk, and each file has metadata (name, size, type, last modified date).
You can perform basic searches (filename, date modified).
No structured relations between files, beyond directory hierarchies.

Example: How a Directory Stores Chemistry Data#

/research_data/
│── periodic_table.csv
│── elements.json
│── hydrogen.png
│── lab_notes.txt

You can find “hydrogen.png” by looking in /research_data/ and searching filenames.
But if you want to ask, “What are all the elements with atomic number < 10?”, the directory itself does not support such structured queries.

1.2 Database Storage#

A database is an organized system for storing and retrieving data with structured querying.

Data is stored in a structured format (tables, documents, graphs).
Indexing enables fast queries (retrieving data without scanning everything).
Relationships exist between data.

2. Database Models: SQL, NoSQL, and Beyond#

There are multiple ways to structure and organize data in a database. The most common types include:

2.1 Relational Databases (SQL)#

Data is stored in tables with predefined columns (fields).
Tables are related to each other through keys.
Uses Structured Query Language (SQL) to manage and query data.
Example: SQLite, PostgreSQL, MySQL.

Example Schema:

  Element Table:
  | AtomicNumber | Symbol | Name  | AtomicWeight |
  |-------------|--------|-------|--------------|
  | 1           | H      | Hydrogen | 1.008      |
  | 2           | He     | Helium   | 4.0026     |

Relationships allow for structured queries like:

SELECT * FROM Element WHERE AtomicNumber = 1;

2.2 NoSQL Databases#

Designed for flexibility and scale, handling semi-structured or unstructured data.
No fixed table structure; data is stored as documents, key-value pairs, graphs, or columns.

Example JSON document in MongoDB:

{
  "AtomicNumber": 1,
  "Symbol": "H",
  "Name": "Hydrogen",
  "AtomicWeight": 1.008
}

Queries use JSON-based query languages, e.g.:

db.elements.find({"Symbol": "H"})

2.3 Graph Databases (RDF and Triple Stores)#

Used to model complex relationships, often found in semantic web and linked data applications.
Data is stored as triples: (subject, predicate, object).

Example:

("Hydrogen", "isElement", "True")
("Hydrogen", "atomicNumber", "1")
("Hydrogen", "bondsWith", "Oxygen")

Queries use SPARQL instead of SQL.

Understanding Databases#

The core role of a database is to efficiently and persistently store data and in a structured and useful manner that allows access to the data. These goals can be achieved through 5 key database activities.

Key Database Activity	Purpose
1. Transactions (Data Modification & Integrity)	Manage data changes (`INSERT`, `UPDATE`, `DELETE`) ensuring consistency with commit & rollback mechanisms.
2. Schema Management	Define tables, columns, relationships, and constraints to structure the data.
3. Data Querying & Retrieval	Access stored data without modifying it (`SELECT` queries), allowing filtering, aggregation, and reporting.
4. Indexing & Optimization	Speed up data access and searches by using indexes and optimizing queries for performance.
5. Concurrency & Multi-User Access	Manage simultaneous access to prevent conflicts and ensure data consistency across multiple users.

3. SQL & Declarative Programming#

In a declarative programming language (SQL)you describe what you want, now how to get it. In an object oriented procedural approach (Python) you specify step-by-step how to proceed with a task

3.1 Comparing SQL to Python#

Feature	SQL (Database Language)	Python (Interpreted Language)
Purpose	Query, store, and manipulate structured data	General-purpose programming
Execution	Runs inside a database engine (SQLite, MySQL)	Runs in an interpreter (CPython, PyPy)
Typing	Strongly typed schema (e.g., `INTEGER`, `TEXT`)	Dynamically typed (`int`, `str`, etc.)
Flow Control	Limited (`CASE`, loops in stored procedures)	Full control (`if`, `for`, `while`)
Usage	Optimized for fast data retrieval and transactions	Used for programming logic, automation, and data processing

3.2: SQL Breakdown Using Core Database Activities#

Each database activity maps to a set of SQL commands (syntax).

Database Activity	Purpose	SQL Syntax to Learn
1. Transactions (Data Modification & Integrity)	Modify data while ensuring consistency.	`INSERT`, `UPDATE`, `DELETE`, `COMMIT`, `ROLLBACK`
2. Schema Management	Define the structure of the database.	`CREATE TABLE`, `ALTER TABLE`, `DROP TABLE`, `PRAGMA` (SQLite-specific)
3. Data Querying & Retrieval	Retrieve data without modifying it.	`SELECT`, `WHERE`, `ORDER BY`, `GROUP BY`, `HAVING`, `LIMIT`
4. Indexing & Optimization	Improve query performance.	`CREATE INDEX`, `EXPLAIN QUERY PLAN`, `ANALYZE`
5. Concurrency & Multi-User Access	Ensure consistency in multi-user environments.	SQLite handles this differently, but `BEGIN TRANSACTION`, `LOCKING` concepts apply.

3.2a: SQL Data Structures#

“How Data is Stored”, this connects to schema management (Activity 2).

Concept	Python Equivalent	SQL Equivalent (Schema Management)
Lists & Dicts	`list`, `dict` (collections of data)	Tables (collections of structured data)
Attributes & Keys	Dictionary keys, object attributes	Columns & Primary Keys
Relationships	Dictionary of dictionaries, nested lists	Foreign keys, joins

Example:

CREATE TABLE elements (
    AtomicNumber INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Symbol TEXT NOT NULL,
    AtomicMass REAL NOT NULL
);

This defines a structured storage format just like defining a Python dict.

3.2b: SQL Operations#

“How Data is Manipulated”, this connects to transactions (Activity 1) and querying (Activity 3).

Python Concept	SQL Equivalent	Example Query
Adding data (append)	`INSERT INTO`	`INSERT INTO elements VALUES (1, 'Hydrogen', 'H', 1.008);`
Updating values	`UPDATE`	`UPDATE elements SET AtomicMass = 1.0079 WHERE Symbol = 'H';`
Removing items (del, pop)	`DELETE FROM`	`DELETE FROM elements WHERE AtomicNumber = 1;`
Filtering (if, list comprehensions)	`WHERE`	`SELECT * FROM elements WHERE AtomicNumber > 10;`
Sorting (sorted function)	`ORDER BY`	`SELECT * FROM elements ORDER BY AtomicMass DESC;`

3.2c: SQL Query Execution#

“How Queries Work”, this connects to Indexing & Optimization (Activity 4).

In Python, functions improve code efficiency.
In SQL, indexes speed up searches.

Example:

CREATE INDEX idx_symbol ON elements(Symbol);

This acts like a Python dictionary lookup for faster queries on Symbol.

3.2d: SQL Transactions & Multi-User Access#

This aligns with Transactions (Activity 1) and Concurrency (Activity 5).

In Python, you use try-except blocks to handle errors.
In SQL, transactions ensure integrity.

Example:

BEGIN TRANSACTION;
UPDATE elements SET AtomicMass = 1.0079 WHERE Symbol = 'H';
COMMIT;

BEGIN TRANSACTION acts like starting a Python function.
COMMIT saves the changes (like return in Python).
ROLLBACK undoes changes if there’s an error (like try-except in Python).

4. SQLite#

SQLite is a single file SQL database that can be generated and manipulated with the built-in Python sqlite3 module. Unlike larger SQL databases it does not require running a database server, and so it is an excellent tool to introduce basic database functions with.

The Role of SQLite in the SQL Ecosystem#

Feature	SQLite	Server-Based SQL (MySQL, PostgreSQL)
Setup	No setup needed (file-based)	Requires installation & server
Best For	Local storage, embedded apps, small projects	Large-scale web applications, enterprise systems
Performance	Fast for small datasets	Scales well with huge data
Concurrency	Limited multi-user access	Handles multiple users simultaneously
Portability	Just a single `.db` file	Requires database servers & configurations

4.1 SQLite Objects (Classes)#

Databases involve several key operations and there are several SQLite python objects (classes) that we need to understand in order to communicate with and manipulate data within a SQLite database. These are

Connection: Handles database access of a session (open, close, commit, rollback).
Cursor: Handles query execution and navigation, you can have multiple cursor objects in a connection.
Row Allows you to access values by column name instead of index.

Since SQLite is, making it a great tool for small to medium datasets, research projects, and prototyping.

4.1a: Connection Methods (`sqlite3.Connection`)#

Method	Description
`connect(database)`	Opens a connection to the SQLite database file.
`close()`	Closes the connection to free up resources.
`commit()`	Saves (commits) changes to the database.
`rollback()`	Rolls back uncommitted changes.
`cursor()`	Creates a cursor object to execute SQL commands.
`execute(sql, params)`	Executes a single SQL command.
`executemany(sql, seq_of_params)`	Executes a SQL command multiple times with different values.

4.1b Cursor Methods (`sqlite3.Cursor`)#

Method	Description
`execute(sql, params)`	Executes a SQL query.
`executemany(sql, seq_of_params)`	Executes a batch of SQL queries.
`fetchone()`	Fetches the next row of the result set.
`fetchall()`	Fetches all remaining rows of the result set.
`fetchmany(size)`	Fetches `size` number of rows.
`description`	A tuple containing metadata about the columns.

4.2Common Database Tasks in SQLite#

Task	SQL Query	SQLite (`sqlite3`) Example
Show all tables	`SELECT name FROM sqlite_master WHERE type='table';`	`cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")`
Show table structure	`PRAGMA table_info(elements);`	`cursor.execute("PRAGMA table_info(elements);")`
Get all records	`SELECT * FROM elements;`	`cursor.execute("SELECT * FROM elements;")`
Get first 5 records	`SELECT * FROM elements LIMIT 5;`	`cursor.execute("SELECT * FROM elements LIMIT 5;")`
Find specific element	`SELECT * FROM elements WHERE symbol='H';`	`cursor.execute("SELECT * FROM elements WHERE symbol='H';")`
Filter by condition	`SELECT name, symbol FROM elements WHERE boiling_point < 25;`	`cursor.execute("SELECT name, symbol FROM elements WHERE boiling_point < 25;")`
Sort results	`SELECT * FROM elements ORDER BY atomic_mass DESC;`	`cursor.execute("SELECT * FROM elements ORDER BY atomic_mass DESC;")`
Insert new record	`INSERT INTO elements (symbol, name, atomic_mass) VALUES ('Xx', 'XenonX', 200);`	`cursor.execute("INSERT INTO elements (symbol, name, atomic_mass) VALUES (?, ?, ?)", ('Xx', 'XenonX', 200))`
Update a record	`UPDATE elements SET atomic_mass=18.02 WHERE symbol='H2O';`	`cursor.execute("UPDATE elements SET atomic_mass=? WHERE symbol=?", (18.02, 'H2O'))`
Delete a record	`DELETE FROM elements WHERE symbol='Xx';`	`cursor.execute("DELETE FROM elements WHERE symbol=?", ('Xx',))`

5. SQLite Activity#

5.1: Create SQL Database#

We will start by creating a simple database called halogens.db in the sandbox with a table called elements that contains 4 fields (AtomicNumber, Name, Symbol and AtomicMass), along with specifying the type of data in each field.

import sqlite3
import os

# Define the directory structure
sandbox_dir = os.path.expanduser("~/sandbox")  # Parent directory
os.makedirs(sandbox_dir, exist_ok=True)  # Ensure directories exist
halogens_db_sbpath = os.path.join(sandbox_dir, "halogens.db")  # SQLite database file

# Connect to SQLite database (creates file if it doesn't exist)
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Create a table called elements
cursor.execute('''
    CREATE TABLE IF NOT EXISTS elements (
        AtomicNumber INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Symbol TEXT NOT NULL,
        AtomicMass REAL NOT NULL
    )
''')

# Commit changes and close connection
conn.commit()
conn.close()

import sqlite3

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("PRAGMA table_info(elements)")
schema = cursor.fetchall()
for col in schema:
    print(col)

conn.close()

(0, 'Element', 'TEXT', 0, None, 0)
(1, 'AtomicNumber', 'INTEGER', 0, None, 0)
(2, 'AtomicMass', 'REAL', 0, None, 0)
(3, 'Electronegativity', 'REAL', 0, None, 0)

import sqlite3

# Connect to your database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Select all rows from the elements table
cursor.execute("SELECT * FROM elements")
rows = cursor.fetchall()

# Print each row
for row in rows:
    print(row)

# Close the connection
conn.close()

('F', 9, 18.998, 3.98)
('Cl', 17, 35.45, 3.16)
('Br', 35, 79.904, 2.96)
('I', 53, 126.9, 2.66)
('At', 85, 210.0, 2.2)
('Ts', 117, 294.0, None)

5.2: Insert Elements#

import sqlite3

# Reconnect to the database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Insert element data
elements_data = [
    ("H", 1, 1.008, 2.4),
    ("Li", 3, 6.94, 1.0)
]

cursor.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", elements_data)

# Commit changes and close connection
conn.commit()
conn.close()

Reopen DB and Show Inserted Elements

import sqlite3

# Connect to your database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Select all rows from the elements table
cursor.execute("SELECT * FROM elements")
rows = cursor.fetchall()

# Print each row
for row in rows:
    print(row)

# Close the connection
conn.close()

('F', 9, 18.998, 3.98)
('Cl', 17, 35.45, 3.16)
('Br', 35, 79.904, 2.96)
('I', 53, 126.9, 2.66)
('At', 85, 210.0, 2.2)
('Ts', 117, 294.0, None)
('H', 1, 1.008, 2.4)
('Li', 3, 6.94, 1.0)

5.3: Changing Data Value#

import sqlite3

# Connect to your database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Update Electronegativity for Hydrogen and Lithium
cursor.execute("""
    UPDATE elements
    SET Electronegativity = 2.2
    WHERE Element = 'H'
""")

cursor.execute("""
    UPDATE elements
    SET Electronegativity = 0.98
    WHERE Element = 'Li'
""")

# Commit changes and close connection
conn.commit()
conn.close()

Show Values have Changed

import sqlite3

# Connect to your database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Select all rows from the elements table
cursor.execute("SELECT * FROM elements")
rows = cursor.fetchall()

# Print each row
for row in rows:
    print(row)

# Close the connection
conn.close()

('F', 9, 18.998, 3.98)
('Cl', 17, 35.45, 3.16)
('Br', 35, 79.904, 2.96)
('I', 53, 126.9, 2.66)
('At', 85, 210.0, 2.2)
('Ts', 117, 294.0, None)
('H', 1, 1.008, 2.2)
('Li', 3, 6.94, 0.98)

5.4: Deleting Data Value#

Hydrogen and Lithium are not halogens so lets remove them from the database

import sqlite3

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Delete incorrect rows based on suspicious 'Element' values
cursor.execute("DELETE FROM elements WHERE Element IN ('H', 'Li')")

conn.commit()
conn.close()

Show Hydrogen and Lithium are removed from the data base

5.5: Query Data#

Find all elements with Atomic Number >= 10.

import sqlite3

# Connect to your database
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Select all rows from the elements table
cursor.execute("SELECT * FROM elements")
rows = cursor.fetchall()

# Print each row
for row in rows:
    print(row)

# Close the connection
conn.close()

('F', 9, 18.998, 3.98)
('Cl', 17, 35.45, 3.16)
('Br', 35, 79.904, 2.96)
('I', 53, 126.9, 2.66)
('At', 85, 210.0, 2.2)
('Ts', 117, 294.0, None)

import sqlite3

# Reconnect and create a cursor
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Run a query
#cursor.execute("SELECT Element FROM elements WHERE AtomicNumber >= 10")
cursor.execute("SELECT Element, AtomicNumber FROM elements WHERE AtomicNumber >= 10")


# Fetch and print results
results = cursor.fetchall()
for row in results:
    print(row)  # Each row is a tuple (Name, Symbol)

# Close connection
conn.close()

('Cl', 17)
('Br', 35)
('I', 53)
('At', 85)
('Ts', 117)

5.6: Convert to Pandas DataFrame#

This converts the data to a Pandas DataFrame and prints top 5 rows

import sqlite3
import pandas as pd

# Reconnect to the database
conn = sqlite3.connect(halogens_db_sbpath)

# Read SQL query into DataFrame
df = pd.read_sql_query("SELECT * FROM elements", conn)

# Display DataFrame
print(df.head(10))

# Close connection
conn.close()

  Element  AtomicNumber  AtomicMass  Electronegativity
     F             9      18.998               3.98
    Cl            17      35.450               3.16
    Br            35      79.904               2.96
     I            53     126.900               2.66
    At            85     210.000               2.20
    Ts           117     294.000                NaN

5.7: Create a Database from halogens.csv#

Note, the workflow: we convert csv file to pandas dataframe and then the dataframe to the sql database, and we are replacing the data in the elements table, not appending to it

import sqlite3
import pandas as pd
import os

# Define the directory structure
sandbox_dir = os.path.expanduser("~/sandbox")  # Parent directory
os.makedirs(sandbox_dir, exist_ok=True)  # Ensure directories exist

# Define file paths
halogens_csv_sbpath = os.path.join(sandbox_dir, "halogens.csv")
halogens_db_sbpath = os.path.join(sandbox_dir, "halogens.db")  # SQLite database file

# Load CSV file into Pandas DataFrame
df = pd.read_csv(halogens_csv_sbpath)

# Connect to SQLite database (creates the file if it doesn’t exist)
conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

# Create table (adjust column names to match CSV headers)
cursor.execute("""
    CREATE TABLE IF NOT EXISTS elements (
        Element TEXT NOT NULL,
        atomic_number INTEGER PRIMARY KEY,
        atomic_mass REAL,
        Electronegativity REAL
    )
""")

# Insert data from DataFrame into SQLite table, this replaces  the data
df.to_sql("elements", conn, if_exists="replace", index=False)

# Commit changes and close connection
conn.commit()
conn.close()

5.8: Print Tables in Database#

import sqlite3

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()

print("Tables in the database:", tables)
conn.close()

Tables in the database: [('elements',)]

5.9: Print Fields (Columns) in Database#

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("PRAGMA table_info(elements);")
fields = [row[1] for row in cursor.fetchall()]

print("Fields in the 'elements' table:", fields)
conn.close()

Fields in the 'elements' table: ['Element', 'Atomic Number', 'Atomic Mass', 'Electronegativity']

Warning the column names ‘Atomic Number’ and “Atomic Mass’ have two words and we need to make them one word or SQLite will give a name error if we seek to use them. That is, the following would throw an error:

conn = sqlite3.connect(halogens_db_file_path)
cursor = conn.cursor()

cursor.execute("PRAGMA table_info(elements);")
fields = [row[1] for row in cursor.fetchall()]

print("Fields in the 'elements' table:", fields)
conn.close()

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()
cursor.execute('ALTER TABLE elements RENAME COLUMN "Atomic Mass" TO AtomicMass;')
cursor.execute('ALTER TABLE elements RENAME COLUMN "Atomic Number" TO AtomicNumber;')

cursor.execute("PRAGMA table_info(elements);")
fields = [row[1] for row in cursor.fetchall()]

print("Fields in the 'elements' table:", fields)
conn.close()

Fields in the 'elements' table: ['Element', 'AtomicNumber', 'AtomicMass', 'Electronegativity']

5.10: Print first 5 Records (rows)#

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("SELECT * FROM elements LIMIT 5;")
records = cursor.fetchall()

for record in records:
    print(record)

conn.close()

('F', 9, 18.998, 3.98)
('Cl', 17, 35.45, 3.16)
('Br', 35, 79.904, 2.96)
('I', 53, 126.9, 2.66)
('At', 85, 210.0, 2.2)

5.11: Find Elements with mass less than 50 amu.#

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("SELECT Element FROM elements WHERE AtomicMass < 50;")
light_elements = cursor.fetchall()

print("Elements that weight less than 50 amu:")
for element in light_elements:
    print(element)
cursor.execute("SELECT AtomicMass FROM elements WHERE Element LIKE 'Cl';")
var=cursor.fetchall()
conn.close()
print(var)
print(type(var))

Elements that weight less than 50 amu:
('F',)
('Cl',)
[(35.45,)]
<class 'list'>

Why Does fetchall() Return [(35.45,)]?

fetchall() always returns a list of tuples, even if there’s only one result.
Each row is returned as a tuple, and fetchall() collects all rows into a list.

var = cursor.fetchall()
atomic_mass = var[0][0]  # Extract first row (tuple) and first column (value)
print(atomic_mass)  # Output: 35.45
print(type(atomic_mass))  # Output: <class 'float'>

var[0] gives the first row → (35.45,)
var[0][0] gives the first value → 35.45

Query Method	Output	How to Extract Float
`fetchall()`	`[(35.45,)]`	`var[0][0]`
`fetchone()`	`(35.45,)`	`var[0]`

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("SELECT AtomicMass FROM elements WHERE Element LIKE 'Cl';")
var = cursor.fetchone()  # Use fetchone() for a single value
conn.close()
atomic_mass = var[0]  # Extract float value
print(atomic_mass, type(atomic_mass))  # Output: 35.45 <class 'float'>

35.45 <class 'float'>

5.12: Find Element with largest Atomic Mass#

conn = sqlite3.connect(halogens_db_sbpath)
cursor = conn.cursor()

cursor.execute("SELECT Element, AtomicMass FROM elements ORDER BY AtomicMass DESC LIMIT 1;")
heaviest_element = cursor.fetchone()

print("Heaviest element:", heaviest_element)
conn.close()

Heaviest element: ('Ts', 294.0)

5.13: Convert db to pandas dataframe#

import sqlite3
import pandas as pd

# Connect to SQLite database
conn = sqlite3.connect(halogens_db_sbpath)

# Load entire 'elements' table into a DataFrame
df = pd.read_sql_query("SELECT * FROM elements;", conn)

# Close connection
conn.close()

# Display DataFrame
print(df)

  Element  AtomicNumber  AtomicMass  Electronegativity
     F             9      18.998               3.98
    Cl            17      35.450               3.16
    Br            35      79.904               2.96
     I            53     126.900               2.66
    At            85     210.000               2.20
    Ts           117     294.000                NaN

Acknowledgements#

This content was developed with assistance from Perplexity AI and Chat GPT. Multiple queries were made during the Fall 2024 and the Spring 2025.

7. Databases as External Data Structures

Contents

7. Databases as External Data Structures#

How Databases and Web Services Fit Into the Picture#

1. How a Computer Stores Data: Directory vs. Database#

1.1 Directory (File Storage)#

Example: How a Directory Stores Chemistry Data#

1.2 Database Storage#

2. Database Models: SQL, NoSQL, and Beyond#

2.1 Relational Databases (SQL)#

2.2 NoSQL Databases#

2.3 Graph Databases (RDF and Triple Stores)#

Understanding Databases#

3. SQL & Declarative Programming#

3.1 Comparing SQL to Python#

3.2: SQL Breakdown Using Core Database Activities#

3.2a: SQL Data Structures#

3.2b: SQL Operations#

3.2c: SQL Query Execution#

3.2d: SQL Transactions & Multi-User Access#

4. SQLite#

The Role of SQLite in the SQL Ecosystem#

4.1 SQLite Objects (Classes)#

4.1a: Connection Methods (sqlite3.Connection)#

4.1b Cursor Methods (sqlite3.Cursor)#

4.2Common Database Tasks in SQLite#

5. SQLite Activity#

5.1: Create SQL Database#

5.2: Insert Elements#

5.3: Changing Data Value#

5.4: Deleting Data Value#

5.5: Query Data#

5.6: Convert to Pandas DataFrame#

5.7: Create a Database from halogens.csv#

5.8: Print Tables in Database#

5.9: Print Fields (Columns) in Database#

5.10: Print first 5 Records (rows)#

5.11: Find Elements with mass less than 50 amu.#

5.12: Find Element with largest Atomic Mass#

5.13: Convert db to pandas dataframe#

Acknowledgements#

4.1a: Connection Methods (`sqlite3.Connection`)#

4.1b Cursor Methods (`sqlite3.Cursor`)#