3. Overview of Jupyter Notebooks#

Learning Objectives#

  1. Understand the role of Jupyter Notebooks in science

  2. Understand the difference between Jupyter notebook files (.ipynb) and Python files (.py)

  3. Understand the difference between Jupyter Lab and Google Colab

1. Jupyter Notebooks in Science#

We will be using Jupyter notebooks to code in Python because they are an ideal educational platform for teaching coding to science majors. Many of the Python packages we will use are data science packages, so here is an outline of some of the roles Jupyter Notebooks play in data science:

  • Exploratory Data Analysis (EDA): Load and inspect data interactively and use a wide variety of visualizations, while documenting insights and observations alongside the code.

  • Data Cleaning and Preprocessing: A critical part of data analysis where one identifies and handles missing values and outliers. One can also normalize or scale the data and create new features for analysis.

  • Model Development: Develop machine learning models, implement and test different algorithms, tune hyperparameters, evaluate model performance and visualize model results.

  • Data Visualization and Reporting: Create interactive visualizations and reports that can be exported in a variety of formats.
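
To make the cleaning and preprocessing role concrete, here is a minimal sketch of the kind of code one might run in a single notebook cell. The readings are made-up values, and the sketch uses only the Python standard library so that it runs anywhere; in practice a data science package would usually handle these steps.

```python
from statistics import mean

# Hypothetical temperature readings; None marks a missing value.
readings = [20.5, 21.0, None, 19.5, 22.0]

# Cleaning: fill each missing value with the mean of the observed values.
observed = [r for r in readings if r is not None]
fill_value = mean(observed)
cleaned = [fill_value if r is None else r for r in readings]

# Preprocessing: min-max scale the cleaned values to the range [0, 1].
low, high = min(cleaned), max(cleaned)
scaled = [(r - low) / (high - low) for r in cleaned]

print("filled:", cleaned)
print("scaled:", scaled)
```

In a notebook, the printed values appear directly under the cell, and a markdown cell next to it can record why the mean was chosen as the fill value.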

Overview of IDEs#

An IDE (Integrated Development Environment) is like a word processor for software development. We will go over IDEs in lecture, and you should be familiar with what they are, but we will probably not use them in class. IDEs are made for software development, which is not really our goal. They excel at developing code, but their major shortcoming for our purposes is that when you run code in an IDE, you are typically running an entire program (file), whereas in Jupyter Lab you can run code one cell at a time. Some of the key features of an IDE are:

  1. Text Editor:

    • Advanced editing capabilities with syntax highlighting for various programming languages

    • Auto-completion and intelligent code suggestions

    • Code folding and navigation tools

    • Multiple file editing and split views

  2. Compiler/Interpreter:

    • Built-in compilation or interpretation of code

    • Often supports multiple programming languages

    • Provides immediate feedback on syntax errors

  3. Debugger:

    • Tools for setting breakpoints and stepping through code

    • Variable inspection and modification during runtime

    • Stack trace analysis

  4. Built-in Terminal:

    • Integrated command-line interface for executing scripts and commands

    • Often context-aware of the current project environment

  5. Project Management:

    • File and directory structure organization

    • Version control integration (e.g., Git)

    • Build dependency management (can sync with virtual environments)

  6. Refactoring Tools:

    • Automated code restructuring and optimization

    • Renaming variables, functions, and classes across the project

  7. Code Analysis:

    • Static code analysis for potential issues

    • Code style and quality checks

  8. Extensibility:

    • Plugin systems for adding new features and language support

    • Customizable user interface and keybindings

Common IDEs (for Python)#

  1. Thonny

    • Designed specifically for beginners

    • Simple, clean interface with a gentle learning curve

    • Comes bundled with Python, easy to set up

    • Limited features compared to more advanced IDEs

    • Not suitable for large-scale or complex projects

    • Lacks support for other programming languages

  2. VS Code

    • Lightweight and fast

    • Highly customizable with powerful debugging capabilities

    • Free to use and built on an open-source code base

    • Supports multiple programming languages

  3. PyCharm

    • Comprehensive suite of Python-specific tools

    • Excellent for large-scale Python projects

    • Strong support for web development frameworks like Django and Flask

    • Powerful refactoring capabilities

    • Integrated package management

    • Resource-intensive, may run slowly on less powerful hardware

    • Steeper learning curve, especially for beginners

    • Professional edition is paid

  4. Spyder

    • Optimized for data science workflows

    • Integrates well with scientific Python libraries (SciPy, NumPy, Matplotlib)

    • Includes features like variable explorer and data visualization tools

    • Included in the Anaconda distribution and can be updated through conda

    • More focused on scientific computing, may not be ideal for general Python development

    • Less extensive plugin ecosystem compared to VS Code or PyCharm

Overview of Jupyter Notebooks#

Jupyter Lab/Notebook is an interactive computational environment that combines code execution, rich text, mathematics, plots and rich media. It provides:

  1. Web-based Interface:

    • Accessible through a web browser

    • Notebook interface combining code, output, and documentation

    • File browser for managing notebooks and other files

  2. Code Cells:

    • Interactive code execution in multiple programming languages

    • Support for Python, R, Julia, and many other languages via kernels

    • Cell-by-cell execution with immediate output

  3. Markdown Cells:

    • Rich text editing with Markdown syntax

    • Support for mathematical equations using LaTeX

    • Ability to create formatted documentation alongside code

  4. Output Display:

    • Inline visualization of plots, charts, and graphs

    • Display of tables, images, and interactive widgets

    • HTML and JavaScript rendering capabilities

  5. Kernel Management:

    • Support for multiple programming language kernels

    • Ability to switch between kernels in the same notebook

    • Kernel interruption and restart options

  6. File Handling:

    • Import and export of notebooks in various formats (e.g., PDF, HTML)

    • Support for different file types (e.g., CSV, JSON) within the environment

  7. Terminal Access (in Jupyter Lab):

    • Integrated terminal for command-line operations

    • Access to the underlying file system and shell commands

  8. Version Control:

    • Basic integration with version control systems (e.g., Git)

    • Cell-based history and checkpoints
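
Cell-by-cell execution works because the kernel keeps its variables in memory between cells. The following is a rough simulation of that idea in plain Python, not how Jupyter is actually implemented: the names `kernel_namespace` and `run_cell` are hypothetical, and a dictionary stands in for the kernel's memory.

```python
# One dictionary stands in for the kernel's memory (its namespace).
kernel_namespace = {}


def run_cell(source):
    """Run one 'cell' of source code inside the shared namespace."""
    exec(source, kernel_namespace)


# Run two cells in order, as you would by pressing Shift+Enter twice.
run_cell("mass = 2.0")
run_cell("weight = mass * 9.8")
print("weight:", kernel_namespace["weight"])

# Now edit and re-run only the first cell. The variable 'weight' still
# holds the value computed from the old mass until its cell is re-run.
run_cell("mass = 5.0")
print("mass:", kernel_namespace["mass"])
print("stale weight:", kernel_namespace["weight"])
```

This is why restarting the kernel and running all cells from top to bottom is a useful check: it confirms that the results do not depend on leftover state.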

2. Jupyter vs. Python file types#

Jupyter notebooks (.ipynb) and regular Python files (.py) serve different purposes and have distinct characteristics. Jupyter notebooks are interactive documents that combine code, rich text, visualizations, and other media, and can be run in Jupyter’s ecosystem (notebooks, labs, hubs) or in Google Colab (by simply uploading them to your Google Drive). Python files are plain-text source files that are typically developed in Integrated Development Environments (IDEs) like PyCharm, VS Code, Thonny, or Spyder, and are often run from the command line. There are also versions of Python (such as CircuitPython) that run on microcontrollers and can be used in embedded devices.
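
One way to see the difference is to look inside an .ipynb file: it is a JSON document that holds a list of cells, while a .py file holds only Python source text. The sketch below builds a very small notebook by hand using the version 4 notebook layout. The cell contents are made up for illustration, and a notebook saved by Jupyter would carry more metadata than this.

```python
import json

# A minimal notebook in the version 4 layout, built by hand.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Free fall\n", "Distance fallen after 2 s."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not yet executed
            "metadata": {},
            "outputs": [],            # filled in when the cell is run
            "source": ["d = 0.5 * 9.8 * 2**2\n", "print(d)"],
        },
    ],
}

# This text is what would be stored on disk in the .ipynb file.
text = json.dumps(notebook, indent=1)

# Reading it back shows that each cell keeps its type and its source.
reloaded = json.loads(text)
for cell in reloaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]).splitlines()[0])
```

Because outputs are stored inside the file next to the code, a notebook can be opened and read without re-running anything, which a .py file cannot offer.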

The goal of this course is to introduce students to Python packages that are useful for scientific discovery, and many of the students will have no prior coding experience. So after a lot of thought, I have decided to limit this course to the use of Jupyter Notebooks and not require students to use an IDE. Some features, like Jupyter “Magic Commands”, only work in notebooks (more precisely, under the IPython kernel) and not in plain Python files, so we will do our best to avoid using them in the code we develop. You should therefore be able to port your code to an IDE and run it in other environments.
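
As an example of avoiding a magic command, timing a function is often done in notebooks with `%timeit`, but the standard-library `timeit` module does a similar job and runs in any Python environment. This is a sketch; the function `sum_of_squares` is a made-up example, and the repeat count of 200 is an arbitrary choice.

```python
import timeit


def sum_of_squares(n):
    """Add up the squares of the integers 0, 1, ..., n - 1."""
    return sum(i * i for i in range(n))


# Notebook-only form (an IPython magic, shown here only as a comment):
#   %timeit sum_of_squares(1000)

# Portable form: works in notebooks, IDEs, and from the command line.
repeats = 200
seconds = timeit.timeit(lambda: sum_of_squares(1000), number=repeats)
print(f"{seconds / repeats * 1e6:.1f} microseconds per call")
```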

Converting Between File Types#

Converting between .ipynb and .py files is possible but comes with some challenges:

.ipynb to .py#

In Jupyter Lab, go to File > Export Notebook As > Executable Script (the exact menu wording varies between versions). Keep the following in mind:

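  • Code cells are preserved, markdown cells are typically converted to comments, and cell outputs (plots, tables) are not carried over.

  • Cells are exported in top-to-bottom order, so a notebook that only worked because its cells were run out of order may fail as a script.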

  • Magic commands (e.g., %matplotlib inline) may cause issues in regular Python environments.
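
The points above can be illustrated with a small hand-written converter. This is a simplified sketch and not the code Jupyter uses for its export: the function name `notebook_to_script` is hypothetical, and it treats any code line that begins with `%` or `!` as a notebook-only command, which would be wrong for a continuation line that happens to start with the modulo operator.

```python
def notebook_to_script(notebook):
    """Turn a notebook dictionary (version 4 layout) into script text."""
    lines = []
    for cell in notebook["cells"]:
        kind = cell["cell_type"]
        if kind not in ("markdown", "code"):
            continue  # skip raw cells in this sketch
        source = cell["source"]
        # "source" may be a single string or a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        for line in text.splitlines():
            if kind == "markdown":
                # Markdown becomes comments so the script stays valid.
                lines.append("# " + line if line else "#")
            elif line.lstrip().startswith(("%", "!")):
                # Magic and shell lines are not Python: comment them out.
                lines.append("# (notebook only) " + line)
            else:
                lines.append(line)
        lines.append("")  # blank line between cells
    return "\n".join(lines)


# A made-up two-cell notebook, reduced to the fields the function reads.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["## Plot setup"]},
        {
            "cell_type": "code",
            "source": ["%matplotlib inline\n", "x = [1, 2, 3]\n", "print(sum(x))"],
        },
    ]
}

script = notebook_to_script(example)
print(script)
```

To try this on a real file, the notebook dictionary can be loaded with `json.load` from an opened .ipynb file. Note that the outputs stored in the notebook are simply ignored, and the cells come out in document order regardless of the order in which they were last run.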

.py to .ipynb#

  • Code is typically placed in a single cell, losing the interactive nature of notebooks.

  • Comments may be converted to markdown cells, but formatting might be lost.

  • No automatic conversion of print statements to rich output.

The jupytext package can programmatically convert *.ipynb files to *.py files with the same names.
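
Going from a .py file to a notebook requires some way of deciding where one cell ends and the next begins. A common convention, used by several editors and tools, is a `# %%` comment line to start a code cell and `# %% [markdown]` to start a markdown cell. The sketch below splits script text at those markers. The function name `script_to_notebook` and the sample script are made up for illustration, and real conversion tools handle many more cases than this.

```python
import json


def script_to_notebook(script_text):
    """Split script text at '# %%' markers into notebook cells."""
    # Each chunk is a (cell type, list of lines) pair.
    chunks = [("code", [])]
    for line in script_text.splitlines():
        if line.startswith("# %%"):
            kind = "markdown" if "[markdown]" in line else "code"
            chunks.append((kind, []))
        else:
            chunks[-1][1].append(line)

    cells = []
    for kind, lines in chunks:
        if kind == "markdown":
            # Markdown is stored as comments; remove the comment prefix.
            lines = [
                ln[2:] if ln.startswith("# ") else ln.lstrip("#")
                for ln in lines
            ]
        text = "\n".join(lines).strip("\n")
        if not text:
            continue  # ignore empty chunks
        cell = {"cell_type": kind, "metadata": {}, "source": text}
        if kind == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
        cells.append(cell)
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


script = """# %% [markdown]
# # Ideal gas law
# Compute pressure from n, T, and V.

# %%
n, T, V = 1.0, 300.0, 0.025
R = 8.314

# %%
P = n * R * T / V
print(P)
"""

notebook = script_to_notebook(script)
for cell in notebook["cells"]:
    print(cell["cell_type"], repr(cell["source"]))

# The dictionary can be written to a .ipynb file as JSON text.
notebook_text = json.dumps(notebook, indent=1)
```

Without such markers there is no reliable way to recover cell boundaries, which is why a plain script usually ends up in a single cell.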

3. Jupyter Lab vs Google Colab#

You can actually run Jupyter notebooks in your Google Drive using Google Colab. In this class you will be using both, but you are required to run your own Jupyter Lab server and cannot use Google Colab for all assignments. The reason is that you cannot keep a persistent virtual environment in Google Colab, and unless a package is installed by default, you have to reinstall it each time you open the notebook in Colab.

Jupyter Lab:

  • Runs locally on your machine

  • Requires installation and setup

  • Full control over the environment and packages

  • Files stored locally by default

Google Colab:

  • Cloud-based, runs in a browser

  • No installation required

  • Limited control over the environment

    • The biggest problem is that you cannot set up your own kernel with preconfigured packages, which means you must reinstall any package that Google has not preinstalled every time you open a notebook.
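
One way to cope with this is to put a small check at the top of a notebook that installs a package only when it is missing. The following is a sketch under the assumption that pip is available in the environment; the helper name `ensure_package` is hypothetical, and it avoids magic commands so that the same code also runs outside a notebook.

```python
import importlib.util
import subprocess
import sys


def ensure_package(import_name, pip_name=None):
    """Install a package with pip only if it cannot be imported.

    Returns True if an install was attempted, False if the package
    was already available.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    # Use the interpreter that runs this code so the package lands in
    # the same environment as the kernel.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or import_name]
    )
    return True


# 'json' ships with Python, so this call never triggers an install.
print(ensure_package("json"))
```

On a local Jupyter Lab server with its own environment the check passes immediately after the first install, whereas in Colab the install step would repeat in each new session.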

Acknowledgements#

This content was developed with assistance from Perplexity AI and ChatGPT. Multiple queries were made during Fall 2024 and Spring 2025.