3. Overview of Jupyter Notebooks#
Learning Objectives#
Understand the role of Jupyter Notebooks in science
Understand the difference between Jupyter notebook files (.ipynb) and Python files (.py)
Understand the difference between *.py and *.ipynb files
1. Jupyter Notebooks in Science#
We will be using Jupyter notebooks to code in Python because they are an ideal educational platform for teaching coding to science majors. Many of the Python packages we will use are data science packages, and here is an outline of some of the roles Jupyter Notebooks play in data science:
Exploratory Data Analysis (EDA): Load and inspect data interactively and use a wide variety of visualizations, while documenting insights and observations alongside the code
Data Cleaning and Preprocessing: A critical part of data analysis where one identifies and handles missing values and outliers. One can also normalize or scale the data and create new features for analysis.
Model Development: Develop machine learning models, implement and test different algorithms, tune hyperparameters, evaluate model performance and visualize model results.
Data Visualization and Reporting: Create interactive visualizations and reports that can be exported in a variety of formats.
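As a small illustration of the first two roles (loading, inspecting, and cleaning data), here is a minimal sketch that uses only the Python standard library. The measurements and the column names (`sample`, `ph`) are made up for this example; in a notebook, each step would normally sit in its own cell with a markdown note beside it.

```python
import csv
import io
import statistics

# Hypothetical data: pH readings, one of them missing (empty field).
raw = """sample,ph
A,6.8
B,7.1
C,
D,7.4
"""

# Load: parse the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(raw)))

# Inspect: find the samples that have no pH value.
missing = [row["sample"] for row in rows if row["ph"] == ""]

# Clean: keep only rows with a value and convert the text to floats.
ph_values = [float(row["ph"]) for row in rows if row["ph"] != ""]

# Summarize: a basic descriptive statistic of the cleaned data.
mean_ph = statistics.mean(ph_values)
print(f"{len(rows)} rows, missing pH for: {missing}")
print(f"mean pH = {mean_ph:.2f}")
```

Dropping rows with missing values is only one possible choice here; whether to drop, fill, or flag them depends on the data set.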
Overview of IDEs#
An IDE (Integrated Development Environment) is like a word processor for software development. We will go over the core features in lecture and you should be familiar with them, but we will probably not use an IDE in class. IDEs are built for software development, which is not really our goal. They excel at developing code, but their major shortcoming for our purposes is that running code in an IDE typically means running an entire program (file), whereas in Jupyter Lab you can run code one cell at a time. Some of the key features of an IDE are:
Text Editor:
Advanced editing capabilities with syntax highlighting for various programming languages
Auto-completion and intelligent code suggestions
Code folding and navigation tools
Multiple file editing and split views
Compiler/Interpreter:
Built-in compilation or interpretation of code
Often supports multiple programming languages
Provides immediate feedback on syntax errors
Debugger:
Tools for setting breakpoints and stepping through code
Variable inspection and modification during runtime
Stack trace analysis
Built-in Terminal:
Integrated command-line interface for executing scripts and commands
Often context-aware of the current project environment
Project Management:
File and directory structure organization
Version control integration (e.g., Git)
Build dependency management (can sync with virtual environments)
Refactoring Tools:
Automated code restructuring and optimization
Renaming variables, functions, and classes across the project
Code Analysis:
Static code analysis for potential issues
Code style and quality checks
Extensibility:
Plugin systems for adding new features and language support
Customizable user interface and keybindings
Common IDEs (for Python)#
Thonny
Designed specifically for beginners
Simple, clean interface with low learning curve
Comes bundled with Python, easy to set up
Limited features compared to more advanced IDEs
Not suitable for large-scale or complex projects
Lacks support for other programming languages
VSCode
Lightweight and fast
Highly customizable with powerful debugging capabilities
Free, and built on an open-source code base
Supports multiple programming languages
PyCharm
Comprehensive suite of Python-specific tools
Excellent for large-scale Python projects
Strong support for web development frameworks like Django and Flask
Powerful refactoring capabilities
Integrated package management
Resource-intensive, may run slowly on less powerful hardware
Steeper learning curve, especially for beginners
Professional edition is paid
Spyder
Optimized for data science workflows
Integrates well with scientific Python libraries (SciPy, NumPy, Matplotlib)
Includes features like variable explorer and data visualization tools
Included in the Anaconda distribution, so it can be installed and updated with conda
More focused on scientific computing, may not be ideal for general Python development
Less extensive plugin ecosystem compared to VSCode or PyCharm
Overview of Jupyter Notebooks#
Jupyter Lab/Notebook is an interactive computational environment that combines code execution, rich text, mathematics, plots and rich media. It provides:
Web-based Interface:
Accessible through a web browser
Notebook interface combining code, output, and documentation
File browser for managing notebooks and other files
Code Cells:
Interactive code execution in multiple programming languages
Support for Python, R, Julia, and many other languages via kernels
Cell-by-cell execution with immediate output
Markdown Cells:
Rich text editing with Markdown syntax
Support for mathematical equations using LaTeX
Ability to create formatted documentation alongside code
Output Display:
Inline visualization of plots, charts, and graphs
Display of tables, images, and interactive widgets
HTML and JavaScript rendering capabilities
Kernel Management:
Support for multiple programming language kernels
Ability to switch between kernels in the same notebook
Kernel interruption and restart options
File Handling:
Import and export of notebooks in various formats (e.g., PDF, HTML)
Support for different file types (e.g., CSV, JSON) within the environment
Terminal Access (in Jupyter Lab):
Integrated terminal for command-line operations
Access to the underlying file system and shell commands
Version Control:
Basic integration with version control systems (e.g., Git)
Cell-based history and checkpoints
2. Jupyter vs. Python file types#
Jupyter notebooks (.ipynb) and regular Python files (.py) serve different purposes and have distinct characteristics. Jupyter notebooks are interactive documents that combine code, rich text, visualizations, and other media, and can be run in Jupyter’s ecosystem (notebooks, labs, hubs) or in Google Colab (by simply uploading them to your Google Drive). Python files are typically developed in Integrated Development Environments (IDEs) like PyCharm, VSCode, Thonny, Spyder, etc., and are often run from the command line. There are also versions of Python (such as CircuitPython) that run on microcontrollers and can be used in embedded devices.
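One way to see the difference is to look inside the files. A .py file is plain Python source text, while an .ipynb file is a JSON document that stores a list of cells. The sketch below builds a tiny notebook by hand as a Python dictionary, following the general nbformat 4 layout; the cell contents are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A minimal notebook in the nbformat 4 layout, written as a Python dict.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Free fall\n", "Distance fallen after `t` seconds."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["t = 2.0\n", "d = 0.5 * 9.81 * t**2\n", "print(d)"],
        },
    ],
}

# This JSON text is what would be stored on disk in the .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back shows the cell structure that a .py file does not have.
loaded = json.loads(ipynb_text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "cell with", len(cell["source"]), "line(s)")

# The equivalent .py file would contain only the code itself.
py_text = "".join(loaded["cells"][1]["source"])
print(py_text)
```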
The goal of this course is to introduce students to Python packages that are useful for scientific discovery, and many of the students will have no prior coding experience. So after a lot of thought, I have decided to limit this course to the use of Jupyter Notebooks and not require students to use an IDE. Some features, like Jupyter “Magic Commands”, only work in notebooks (they are provided by the IPython kernel, not by the Python language), and we will do our best to avoid using them in the code we develop. That way you should be able to port your code to an IDE and run it in other environments.
Converting Between File Types#
Converting between .ipynb and .py files is possible but comes with some challenges:
.ipynb to .py:#
Go to File/Export Notebook As/Executable Script
Code cells are preserved, but markdown and rich output are typically converted to comments.
Execution order may be lost if cells were run out of order.
Magic commands (e.g., %matplotlib inline) may cause issues in regular Python environments.
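To make the points above concrete, here is a minimal sketch of such a conversion written with plain Python. It is not how Jupyter Lab's own export works internally; the function name `notebook_to_script` and the demo notebook are hypothetical. It keeps code cells, turns markdown into comments, and comments out lines that start with `%` or `!`, since those are not valid Python outside a notebook.

```python
def notebook_to_script(notebook):
    """Turn a notebook dict (nbformat 4 layout) into Python source text."""
    lines = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # "source" may be stored as one string or as a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "markdown":
            # Markdown becomes comments so the script still runs.
            lines.extend("# " + line if line else "#" for line in text.splitlines())
        elif cell["cell_type"] == "code":
            for line in text.splitlines():
                # Simplification: any line starting with % or ! is treated
                # as a magic or shell command and commented out.
                if line.lstrip().startswith(("%", "!")):
                    lines.append("# " + line)
                else:
                    lines.append(line)
        lines.append("")  # blank line between cells
    return "\n".join(lines)


# Hypothetical two-cell notebook used only for this demonstration.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Plot setup"]},
        {
            "cell_type": "code",
            "source": ["%matplotlib inline\n", "x = [1, 2, 3]\n", "total = sum(x)"],
        },
    ]
}

script = notebook_to_script(demo)
print(script)
```

Note that the cells are written in the order they appear in the file, which is not necessarily the order in which they were run.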
.py to .ipynb:#
Code is typically placed in a single cell, losing the interactive nature of notebooks.
Comments may be converted to markdown cells, but formatting might be lost.
No automatic conversion of print statements to rich output.
The jupytext package can programmatically convert *.ipynb files to *.py files with the same names (and back).
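The single-cell problem can be sketched in a few lines. The hypothetical function `script_to_notebook` below wraps Python source in a notebook dictionary (nbformat 4 layout). As an assumption that goes beyond the text above, it also splits the source at `# %%` marker lines, a cell-marker convention understood by several tools (Spyder, VSCode, and jupytext's "percent" format); a script without markers ends up in one cell.

```python
import json


def script_to_notebook(script_text):
    """Build a notebook dict (nbformat 4 layout) from Python source text."""
    chunks = [[]]
    for line in script_text.splitlines():
        if line.startswith("# %%"):
            chunks.append([])  # a marker line starts a new cell
        else:
            chunks[-1].append(line)

    cells = []
    for chunk in chunks:
        source = "\n".join(chunk).strip("\n")
        if not source.strip():
            continue  # skip empty chunks
        cells.append(
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
        )
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


plain = "x = 2\ny = x ** 3\nprint(y)\n"
marked = "# %%\nx = 2\n# %%\ny = x ** 3\nprint(y)\n"

print(len(script_to_notebook(plain)["cells"]))   # 1: everything in one cell
print(len(script_to_notebook(marked)["cells"]))  # 2: one cell per marker

# The JSON text that would be saved as an .ipynb file.
ipynb_text = json.dumps(script_to_notebook(marked), indent=1)
```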
3. Jupyter Lab vs Google Colab#
You can actually run Jupyter notebooks from your Google Drive using Google Colab. In this class you will be using both, but you are required to run your own Jupyter Lab server and cannot rely on Google Colab for every assignment. The reason is that you cannot create virtual environments that persist between sessions in Google Colab, so unless a package is installed by default, you have to reinstall it each time you open the notebook in Colab.
Jupyter Lab:
Runs locally on your machine
Requires installation and setup
Full control over the environment and packages
Files stored locally by default
Google Colab:
Cloud-based, runs in a browser
No installation required
Limited control over the environment
The biggest problem is that you cannot set up your own kernel with preconfigured packages, which means you must reinstall any package that Google has not preinstalled every time you open a notebook.
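A notebook that has to run in both environments can check at the top which packages are available before importing them. The sketch below is one possible approach using only the standard library; the helper name `missing_packages` and the package list are hypothetical, and it deliberately avoids magic commands so that it also runs as a plain .py file.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical requirements: "json" ships with Python, the second name
# is made up so that the check reports something.
required = ["json", "not_a_real_package_xyz"]
missing = missing_packages(required)

if missing:
    # Print the install command instead of running it, so the user decides.
    print("Install with:", sys.executable, "-m pip install", " ".join(missing))
else:
    print("All required packages are available.")
```

Keep in mind that the name used with `import` is not always the name used with pip, so the printed command may need adjusting for some packages.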
Acknowledgements#
This content was developed with assistance from Perplexity AI and ChatGPT. Multiple queries were made during Fall 2024 and Spring 2025.