5. Git and GitHub#

Overview#

  1. Introduction to Git and GitHub

  2. Cliff Notes

  3. Installing Git

  4. Quick Overview of Most Common Commands

  5. Create a Git Repository

  6. Your First Commit

  7. Git Commands

  8. Special Files

  9. Create and Upload to GitHub Repository

  10. Cloning and Downloading from a Repository

  11. Updating a Cloned Git Repository

  12. JupyterLab Git Extension

1. Introduction to Git and GitHub#

This section is optional; not all students will use Git. This material is also under development.

Git is a version control system that runs on your computer and tracks changes to your files over time, so you can save different versions of your work, revert to previous states, and collaborate with others. You install Git once on your computer and then initialize it in a project folder, which creates a special hidden .git directory. A directory that has been initialized this way is called a repository, or repo. For science majors new to coding, Git provides a safety net as you experiment with your projects, allowing you to try new ideas without fear of losing your original work. It’s an essential tool for managing the evolution of your research and coding projects, even if you’re just starting out.

GitHub, on the other hand, is a web-based platform that builds upon Git’s functionality by providing a centralized location to store and share your Git repositories online. GitHub offers a way to build a portfolio of your work, learn from others’ code, and potentially collaborate on research projects with students and scientists around the world. It’s a valuable resource for learning coding practices and engaging with the scientific community, even if you’re just beginning your coding journey.

image.png

2. Cliff Notes#

These are the core instructions for these activities, without the nuances. If you are doing this for the first time, go through each activity below to understand what you are doing. This section is a quick reference for when you have done this once or twice and need a refresher.

  1. Install Git (only do this once)

    • sudo apt install git -y

  2. Initialize Repo (this converts the current working directory and all of its subdirectories into a git repo, and only needs to be done once. Skip it if you cloned a repo). Do not create a repo within a repo (that is, do not run git init in a subdirectory of a directory that is already a repo)

    • git init

  3. Local commit

    • git status (optional)

    • git add . (the '.' means everything in the current directory and all subdirectories)

    • git status (optional)

    • git commit -m "mandatory message"

  4. Push to GitHub (this only works after you have connected the local repo to a remote, see section 9)

    • git push -u origin main (first time)

    • git push (subsequent times if you already set upstream to main)

  5. Clone Repo (make sure you are in the local directory where you want the cloned folder to be created)

    • git clone https://github.com/the_repo_you_want

  6. Updating Cloned Repo (if needed, see instructions below to make sure it is correctly set up)

    • git pull

3. Installing Git#

In order to run git commands you first need to install Git. Git is not part of the Conda system, and although you can install it from within a Conda environment, we will first deactivate conda.

  1. Type: which git This tells you where git is located, if it is installed. If it prints nothing, git is not installed. If it is installed you do not need to install it again.

  2. Type: git --version This tells you the version of the installed git (if it is installed).

  3. Type: conda deactivate from the base environment. You should see the (base) in front of the command prompt disappear. When using sudo apt commands you should deactivate conda so there is no (env) in front of the command prompt.

  4. Type: sudo apt update This refreshes the list of available system-level packages (not Python packages).

  5. Type: sudo apt upgrade This installs the updated versions of the packages on that list.

  6. Type: sudo apt install git -y This installs git; the -y automatically answers yes to the installation prompts.

  7. Type: git --version to confirm it is installed.
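The check in steps 1 and 2 can be sketched as a small script that looks for Git before you reach for apt. This is only an illustration; the variable name git_state is an assumption made for this example.

```shell
# Sketch: report whether Git is already installed before installing it.
# 'git_state' is a hypothetical variable name used only in this example.
if command -v git >/dev/null 2>&1; then
    git_state="installed"
    echo "Git found at $(command -v git): $(git --version)"
else
    git_state="missing"
    echo "Git not found. On Ubuntu/WSL you could install it with: sudo apt install git -y"
fi
```

command -v plays the same role as which here, and it is built into the shell.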

4. Quick Overview of Most Common Actions#

(these assume you have already created the remote repository on GitHub)

Create Repo and Connect to Remote#

  • git init - done only once, this converts a local directory to a git repository

  • git remote add origin <url> - done only once, this connects your local repo to a remote

Making a local commit and pushing to GitHub#

  • git add . - places everything you have changed in the working directory into the staging area

  • git status - lets you check what you are doing before you commit it.

  • git commit -m "your required message" - commits the staged material to the repo

  • git push -u origin main - pushes commits from the local repository to the GitHub repo that is linked to it.

5. Create a Git Repository (Local Repo)#

There are several strategies, and we are going to make one of our project folders a repo. In so doing, any subfolder will also be part of the repo. So if you create 4 projects, you will create 4 repos. Note: if you clone my folder it will already be a git repo. The reason we are doing this is so you can commit one project without affecting the others, and upload that project to GitHub without affecting the others. If we look at the projects folder below we see there is one subfolder, called py4sci, and that is my project folder for this class.

image.png

First, we check that our working directory is not part of a git repo or within the tree of one:

Type: git rev-parse --is-inside-work-tree It will return true if you are, and give a fatal error message if you are not.

image.png

Now that we know we are not within a repo tree, we can make the py4sci directory a git repo.

Type: git init This initializes the repo. We now check by rerunning the last command (use the up arrow in the terminal), and you see it returns true. Note, since we knew this was not in the tree of a repo we could have just looked at the hidden folders with ls -a, where the -a means all and shows hidden files and folders. The .git folder is where Git stores the version history for this repo.

Type: ls -a

image.png
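The check, initialize, and recheck sequence above can be tried safely in a throwaway folder. The sketch below assumes a scratch directory made with mktemp and a hypothetical project folder named demo_project; neither is part of the course material.

```shell
# Sketch: confirm a folder is not in a repo, initialize it, then confirm again.
workdir="$(mktemp -d)"                 # scratch location (assumption)
mkdir "$workdir/demo_project"          # hypothetical project folder
cd "$workdir/demo_project"

# rev-parse fails with a "fatal" message outside a repo, so test its exit code.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    before="inside a repo"
else
    before="not in a repo"
fi
echo "Before git init: $before"

git init -q                            # creates the hidden .git directory
after="$(git rev-parse --is-inside-work-tree)"
echo "After git init: $after"

ls -a                                  # .git now appears among the hidden entries
```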

Note, if you look at the image below you will see some files in the repo with a *:Zone.Identifier ending. These were created by Windows, and we do not want to track them with git. So we will create a .gitignore file. Instead of creating one from scratch, we will use a template from a web API and then direct it to the .gitignore file with the following:

Type: curl https://www.toptal.com/developers/gitignore/api/python > .gitignore

Now look at the folder and its hidden files.

Type: ls -a and then open the .gitignore file with the nano editor.

Type: nano .gitignore Note, this is a text editor and the mouse does not work; you need to use the arrow keys to navigate. Ctrl-O writes the file out (press Enter to confirm the file name) and Ctrl-X exits the nano editor. The shortcuts are listed at the bottom of the screen. With the nano editor I added the following lines, which include Thumbs.db, a file that Windows autogenerates to cache thumbnails.
## Windows Specific Files
*:Zone.Identifier
Thumbs.db
image.png
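One way to confirm that ignore rules like these behave as intended is git check-ignore, which reports the rule that matches a path. The sketch below runs in a scratch repo on Linux or WSL; the file name notes.ipynb is hypothetical.

```shell
# Sketch: verify that the Windows-specific ignore rules match the right files.
cd "$(mktemp -d)"                      # scratch repo (assumption)
git init -q

# Write the same three lines that were added with nano.
printf '%s\n' '## Windows Specific Files' '*:Zone.Identifier' 'Thumbs.db' > .gitignore

# Hypothetical files: one real notebook plus two Windows by-products.
touch 'notes.ipynb' 'notes.ipynb:Zone.Identifier' 'Thumbs.db'

# -v prints the .gitignore line that matched each ignored path.
git check-ignore -v 'notes.ipynb:Zone.Identifier' 'Thumbs.db'

# Only .gitignore and notes.ipynb should be listed as untracked.
git status --short
```

Patterns in .gitignore are case-sensitive on Linux, so the capital I in Zone.Identifier matters.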

6. Your First Commit#

There is essentially a three-stage process to a git commit (check, stage, commit), with git status run both before and after staging:

  1. git status - see the status of untracked files

  2. git add filename - this places the file in the staging area, see below for flags (. , -A)

  3. git status - this tells you what is staged

  4. git commit -m "your concise message" - the message is required

Note: you may get a message saying: “Author identity unknown”. In which case you need to type:

  • git config user.email "YOU@example.com"

  • git config user.name "YOUR NAME"

image.png

  1. Stage the files (identify the ones you want to commit). You have two options

    1. git add . Stage all changes

    2. git add filename1 filename2 Stage specific files

*Note: in the image below I used git add . which would normally be fine because I have a .gitignore file that contains *:Zone.Identifier, telling git to ignore those files. But I had added those files before creating the .gitignore file, and so all the files turned green after running the status. I should have run git add 01_aSetUpComputer.ipynb 01_bJyNBGettingStarted.ipynb this one time. The Zone.Identifier files were created by Windows and are not part of my code, and so should not be added to the staging area. I fixed this after the commit (see the gitignore issue note below), and we will discuss .gitignore files later.

  1. Type: git status. Note the files in red have turned green, and these will be committed.

image.png

  1. The actual commit. Now you must write a message for the commit, and there are two options

    • Option 1: Type git commit -m "Your concise commit message here"

    • Option 2: Type git commit This will open your default text editor and you can write a longer message

image.png
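The whole status, add, status, commit cycle can be rehearsed in a throwaway repo before you try it on real work. This is a minimal sketch; the scratch directory and the file hello.py are assumptions made for the example.

```shell
# Sketch: the status -> add -> status -> commit cycle in a scratch repo.
cd "$(mktemp -d)"                          # scratch directory (assumption)
git init -q
git config user.email "YOU@example.com"    # repo-local identity, avoids the
git config user.name "YOUR NAME"           # "Author identity unknown" message

echo "print('hello')" > hello.py           # hypothetical file to track
git status --short                         # '??' marks an untracked file
git add hello.py                           # stage just this file
git status --short                         # 'A' marks a staged new file
git commit -q -m "Add hello.py"            # record the staged snapshot
git log --oneline                          # one line per commit
```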

gitignore issue: If you look at the output above you will see that the two *:Zone.Identifier files were committed to the git repo. I should have staged and committed the .gitignore file before adding other files to the repo, or used the second staging option (git add filename1 filename2) and specified the files I wanted committed. To fix this I ran the following two lines of code:
git rm --cached *:Zone.Identifier
git commit -m "Remove Zone.Identifier files from tracking"

image.png
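The same mistake and its fix can be reproduced in a scratch repo to see that git rm --cached stops tracking a file without deleting it from disk. The file names below are hypothetical, and the example assumes a Linux or WSL file system where a colon is allowed in file names.

```shell
# Sketch: untrack files that were committed before .gitignore existed.
cd "$(mktemp -d)"                          # scratch repo (assumption)
git init -q
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"

touch 'notes.ipynb' 'notes.ipynb:Zone.Identifier'   # hypothetical files
git add .
git commit -q -m "First commit (accidentally includes Zone.Identifier)"

echo '*:Zone.Identifier' > .gitignore
# Quoting the pattern lets git, not the shell, match the tracked files.
git rm -q --cached '*:Zone.Identifier'
git add .gitignore
git commit -q -m "Remove Zone.Identifier files from tracking"

git ls-files                               # tracked files: .gitignore, notes.ipynb
ls                                         # the Zone.Identifier file is still on disk
```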

7. Git Commands#

Table 1: Initializing and Setting Up a Repository#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git init | Initializes a new Git repository in the current directory. | None |
| git clone <url> | Creates a local copy of a remote repository. | --depth <number>: Create a shallow clone with limited commit history. |
| git remote add <name> <url> | Adds a new remote repository with a specified name (e.g., origin). | None |
| git config | Configures user information (e.g., name, email) or repository settings. | --global: Applies to all repositories. |


Table 2: Staging and Committing Changes#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git add <file> | Stages changes for a specific file. | .: Stages all changes in the current directory and its subdirectories. -A: Stages all changes across the entire repository. -p: Interactive staging. |
| git status | Shows the status of the working directory and staging area. | -s: Provides a short, compact status output. |
| git commit | Records staged changes to the repository. | -m "<message>": Adds a commit message inline. --amend: Edits the most recent commit. |
| git log | Displays the commit history. | --oneline: Condenses output to one line per commit. --graph: Displays a visual branch graph. |


Table 3: Working with Branches#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git branch | Lists branches or creates a new branch. | -d <branch>: Deletes a branch. -m <old-name> <new-name>: Renames a branch. |
| git checkout <branch> | Switches to an existing branch. | -b <branch>: Creates and switches to a new branch. |
| git switch | An alternative to checkout for switching branches. | -c <branch>: Creates a new branch and switches to it. |
| git merge <branch> | Merges the specified branch into the current branch. | None |
| git rebase <branch> | Moves the current branch to the tip of the specified branch, replaying commits. | --interactive: Interactively reorder, edit, or squash commits. |
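As an illustration of how the branch commands fit together, here is a minimal sketch that creates a branch, commits on it, and merges it back. The branch name experiment and the file names are hypothetical, and git checkout -b is used rather than git switch -c only so the sketch also runs on older Git versions.

```shell
# Sketch: create a branch, commit on it, and merge it back into main.
cd "$(mktemp -d)"                          # scratch repo (assumption)
git init -q
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"

echo "v1" > analysis.txt                   # hypothetical starting file
git add analysis.txt
git commit -q -m "Initial commit"
git branch -M main                         # make sure the branch is named main

git checkout -q -b experiment              # create and switch to a new branch
echo "new idea" > idea.txt
git add idea.txt
git commit -q -m "Try a new idea"

git checkout -q main                       # back to main; idea.txt disappears
git merge -q experiment                    # bring the experiment into main
git branch -d experiment                   # safe delete: the branch is merged
git log --oneline
```

Because main had no new commits of its own, this merge is a simple fast-forward and cannot conflict.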


Table 4: Dealing with Remotes (Push, Pull, Fetch)#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git push | Uploads local changes to a remote repository. | --force: Overwrites remote history (use cautiously). -u: Sets upstream branch for tracking. |
| git fetch | Downloads objects and refs from a remote repository without merging. | None |
| git pull | Combines fetch and merge to update the local branch with remote changes. | --rebase: Replays local commits on top of remote changes. |
| git remote | Manages remote repositories (e.g., list, add, remove). | None |


Table 5: Resolving Merge Conflicts#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git merge <branch> | Attempts to merge the specified branch into the current branch. | None |
| git status | Displays files with conflicts after a failed merge. | None |
| git diff | Shows differences between the working directory and staging area, or between commits. | None |
| git add <file> | Marks a conflict as resolved for a specific file. | None |
| git merge --abort | Aborts the current merge and returns to the pre-merge state. | None |
| git mergetool | Opens a graphical or CLI tool to resolve merge conflicts interactively. | None |


Table 6: Undoing Changes and Reverting#

| Command | Description | Common Flags/Switches |
|---|---|---|
| git restore <file> | Discards changes in the working directory. | --staged: Unstages changes. --source <commit>: Restores a file from a specific commit. |
| git reset | Moves the HEAD and optionally resets the staging area or working directory. | --soft: Keeps changes staged. --mixed: Unstages changes. --hard: Discards all changes. |
| git revert <commit> | Creates a new commit that undoes the changes in the specified commit. | None |
| git clean | Removes untracked files and directories. | -f: Forces removal. -d: Removes untracked directories. |

  • git add - there are a variety of git add commands

    • git add . - Recursively adds all changes in the current directory to the staging area

    • git add -A - Recursively adds all changes from the git root to the staging area

  • git log - lists each commit with a hash string that identifies it, the author and date of the commit, and the commit message. You would use the hash string to revert back to that commit.

    image.png

  • git diff filename - shows the difference between the file in the working directory and the version in the staging area (use git diff --staged filename to compare the staged version with the last commit)

  • git checkout

    • git checkout -- filename - Discards unstaged changes to a file in the working directory, restoring the staged version (or the last committed version if nothing is staged)

    • git checkout commit-hash - Switches the working directory to the commit identified by its hash in the log (this leaves you in a "detached HEAD" state; git checkout commit-hash -- filename restores just one file from that commit)

    • git checkout branch-name - Switches working directory to specified branch

    • git checkout -b new-branch-name - creates new branch and switches to it

    • git checkout . - discards all unstaged changes in the current directory and its subdirectories

  • git rm -r --cached . - stops tracking all files in the current directory and below (the files stay on disk), be very careful when you use it
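The difference between git diff and git diff --staged is easiest to see by watching what each reports before and after git add. The sketch below uses the --quiet flag, which prints nothing and signals through its exit code whether differences exist; the file data.txt and the variable names are hypothetical.

```shell
# Sketch: what git diff compares before and after staging a change.
cd "$(mktemp -d)"                          # scratch repo (assumption)
git init -q
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"

echo "line 1" > data.txt                   # hypothetical tracked file
git add data.txt
git commit -q -m "Add data.txt"

echo "line 2" >> data.txt                  # edit, but do not stage yet

# Working directory vs staging area: the edit shows up here.
if git diff --quiet -- data.txt; then unstaged="no"; else unstaged="yes"; fi

git add data.txt                           # move the edit to the staging area

# Working directory vs staging area: now identical.
if git diff --quiet -- data.txt; then after_add="no"; else after_add="yes"; fi

# Staging area vs last commit: the edit shows up here instead.
if git diff --staged --quiet -- data.txt; then staged="no"; else staged="yes"; fi

echo "unstaged diff before add: $unstaged"
echo "unstaged diff after add:  $after_add"
echo "staged diff after add:    $staged"
```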

8. Special Files#

8.1 readme.md#

This is a markdown document that will function as the landing page for your github repository and display below your files.

8.2 .gitignore#

The .gitignore file is a list of files you do not want to add to the staging area. These are files that have nothing to do with your code. You add each file or file type to a new line. Types of files you would not want to track are:

  • Operating System Generated files like the ZoneIdentifier files.

  • Passwords, API and SSH keys

  • Data files

  • Utility Files

  • Executable programs

.gitignore rules:

  • # indicates a comment

  • * is a wildcard

.gitignore templates#

GitHub maintains a repository of .gitignore templates for many programming languages at github/gitignore. You can also create a .gitignore file from scratch and add entries to it. The following command will fetch the Python template and generate a .gitignore file in the directory you run it from, but it will overwrite any existing .gitignore file.

curl https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore > .gitignore

Once we have the file we can edit it with nano.

image.png

8.3 environment.yml File#

This is a YAML file that specifies the dependencies for a conda environment. It specifies:

  • Conda packages

  • Pip packages

  • Channels to search for packages

Sample code:

    name: my_project_env
    channels:
      - conda-forge
      - defaults
    dependencies:
      - python=3.10
      - numpy
      - pandas
      - pip
      - pip:
        - some-pip-only-package

Note: the first - pip installs pip, and the second - pip: uses pip to install the packages listed below it.

9. Create and Upload to a GitHub Repo#

1. Create an account on github.com#

2. click on repositories and click NEW#

image.png

3. Provide a Name and Description#

Choose whether it is Public or Private. As this is your first repo we will create an empty one and not add a .gitignore, readme or license. Instead we will create those in your local repo and push them to GitHub. Scroll to the bottom and click “Create repository”, and you will now have a new online repo. Important: if you are new to coding you should not edit or create any files in the remote repo, as you would then have to deal with conflicts between different versions.

image.png

5. Add .gitignore and readme file#

On your local machine go to the git repository you wish to upload to GitHub and add your .gitignore and readme files (see section 8 above). Be sure to have the .gitignore file, or you will upload material that should not be uploaded to GitHub. If you are creating a new repository you will need to initialize it with git init.

6. Connect local repo to remote github repo#

Now we need to tell the local repo that we created a remote repo using the command

git remote add origin <The SSH link on your repo>
git remote -v
git push -u origin main
Let's look at these commands one at a time.

  • git remote add origin <The SSH link on your repo> tells git to connect the current local repository to the GitHub one using the SSH link. You will need to set up an SSH key for authentication (see Git Authentification Issues below).

  • git remote -v Shows you which remote (GitHub) repo you can push to or pull from

  • git push -u origin main In the future you will only need to run git push

    • git push tells git to upload your local commits to a remote

    • -u this stands for upstream, and tells git to remember the default remote and branch so in the future you can just use git push

    • origin This is the name you gave the remote when you ran the git remote add above

    • main This is the branch of the remote you are pushing to, which is the main branch.
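These three commands can be practiced without a GitHub account by letting a local bare repository stand in for the remote. In the sketch below the folder remote.git plays the role of GitHub; that stand-in, the scratch directory, and the file README.md are all assumptions made for the example, and a real remote would use the SSH link instead of a local path.

```shell
# Sketch: connect a local repo to a remote and push, using a local bare repo
# as a stand-in for GitHub (assumption made so the example runs offline).
workdir="$(mktemp -d)"
git init -q --bare "$workdir/remote.git"   # plays the role of the empty GitHub repo

mkdir "$workdir/local"
cd "$workdir/local"
git init -q
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"
echo "# demo" > README.md                  # hypothetical file
git add README.md
git commit -q -m "Add README"
git branch -M main                         # make sure the branch is named main

git remote add origin "$workdir/remote.git"   # on GitHub this would be the SSH link
git remote -v                              # lists the fetch and push addresses
git push -q -u origin main                 # -u records origin/main as the upstream

# Show which remote branch a plain 'git push' will now use.
git rev-parse --abbrev-ref 'main@{upstream}'
```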

Check GitHub#

Now you can go to GitHub and check that your files have been uploaded

NOTE: If you ever need to find the URL of a remote repo, open the repo on GitHub, click Code, and it will show in the dropdown box. You will want to use the HTTPS option if you are cloning someone else’s repo, and the SSH option if you are setting up your own repo that you want to push to.

image.png

Git Authentication Issues#

Any web service that lets you upload material, like GitHub, has to deal with authentication. GitHub no longer accepts account passwords for git operations over HTTPS, so I am now finding it necessary to use SSH. If you set up a repo and it asks for a password, but fails, go and set up an SSH key.

  • Type git remote -v If it gives https addresses like those below, you need to change them to SSH addresses (of the form git@github.com:YOURUSERNAME/REPONAME) so you can use SSH

image.png

  • Type git remote set-url origin git@github.com:YOURUSERNAME/REPONAME note, you can copy these from the clone option, just click on SSH instead of HTTPS

image.png

10. Cloning and Downloading From a Repo#

You have three options for obtaining material from a GitHub repo. You do not need Git on your computer to download a file or the zip, but you do need Git to clone a repo.

  1. Download individual files

    • click on the file and click the download button image.png

  2. Download Zip of Repo

    • click on the repo, then Code, and Download ZIP image.png

  3. Clone Repo

    • copy the https link for the repository

    • In the folder where you wish to place the repository, which does not need to be a git folder, type the following command, and that will create a git folder with the content of the cloned repository

git clone https://github.com/rebelford/Py4Sci.git

11. Updating a Cloned Git Repo#

To pull updates from the original GitHub repository into your local repository, follow these steps:

1. Ensure You Are in the Correct Directory#

Navigate to the cloned repository on your computer:

cd /path/to/your/repository

Replace /path/to/your/repository with the actual path to your cloned repository.


2. Check the Remote Repository#

Verify the remote repository is correctly set up. Run:

git remote -v

This should display something like:

origin  https://github.com/username/repository.git (fetch)
origin  https://github.com/username/repository.git (push)

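If the remote is not set, add it with the command below (if it is set but incorrect, use git remote set-url origin followed by the correct URL instead):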

git remote add origin https://github.com/username/repository.git

3. Pull Updates in One Step#

git pull combines fetch and merge in a single command:

git pull origin main

After you have done this once, you can just type

git pull

This directly fetches and merges changes from the remote main branch.
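To see a pull in action without touching a real project, the sketch below builds a small local repository to act as the original, clones it, adds a commit to the original, and then pulls that commit into the clone. The folder names upstream and clone and the file notes.txt are hypothetical.

```shell
# Sketch: clone a repo, then pull a later update from the original.
workdir="$(mktemp -d)"                     # scratch location (assumption)

# A local repo that stands in for the original GitHub repository.
git init -q "$workdir/upstream"
cd "$workdir/upstream"
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git branch -M main

git clone -q "$workdir/upstream" "$workdir/clone"   # clone sets up 'origin' for you

# The original gets a new commit after the clone was made.
echo "v2" > notes.txt
git commit -q -am "Update notes"

cd "$workdir/clone"
git remote -v                              # origin points back at the original
git pull -q origin main                    # fetch and merge the new commit
cat notes.txt                              # now shows the updated content
```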


5. Steps to Overwrite Local Changes with Remote Files#

  1. Fetch Changes from the Remote: First, fetch the latest changes from the remote repository without merging them:

    git fetch origin
    
  2. Reset Your Local Branch to Match the Remote: Reset your local branch to match the remote branch (e.g., main). This will overwrite all tracked files in your local repository with the versions from the remote:

    git reset --hard origin/main
    
    • This step discards any uncommitted changes in tracked files.

    • Untracked files (files not added to Git) will remain untouched.

  3. Remove Untracked Files (Optional): If you want to clean up untracked files and directories, use:

    git clean -fd
    
    • Use this only if you’re sure you want to delete untracked files and directories.

  4. Pull the Latest Changes: After resetting, you can pull any additional updates from the remote (though this is often unnecessary immediately after a reset):

    git pull
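The effect of fetch followed by a hard reset can be checked in a scratch setup before you rely on it. In this sketch a local repository stands in for the remote, and the clone holds one uncommitted edit and one untracked file; all folder and file names are hypothetical.

```shell
# Sketch: overwrite local edits with the remote version of tracked files.
workdir="$(mktemp -d)"                     # scratch location (assumption)
git init -q "$workdir/upstream"            # stands in for the GitHub repo
cd "$workdir/upstream"
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"
echo "upstream v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git branch -M main

git clone -q "$workdir/upstream" "$workdir/clone"

echo "upstream v2" > notes.txt             # the remote moves ahead
git commit -q -am "Update notes"

cd "$workdir/clone"
echo "local edit" > notes.txt              # uncommitted change to a tracked file
touch scratch.tmp                          # untracked file

git fetch -q origin                        # download, but do not merge
git reset -q --hard origin/main            # discard local edits to tracked files

cat notes.txt                              # the remote version
ls                                         # scratch.tmp is still there
```

The untracked file survives because git reset --hard only touches tracked files; removing it would take the separate git clean step.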
    

6. Instead of step 3 you can Merge the Updates into Your Local Branch#

Assuming you are on the default main branch and have already run git fetch origin:

git merge origin/main

This merges the changes from the remote main branch into your local branch. If your repository uses a different default branch, replace main with the appropriate branch name (e.g., master).


7. Handle Merge Conflicts (if any)#

If there are conflicts, Git will notify you. Resolve conflicts manually by editing the affected files. After resolving:

  1. Mark the conflicts as resolved:

    git add <file>
    
  2. Complete the merge:

    git commit
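A conflict is easier to handle the second time, so it can help to create one on purpose in a scratch repo. The sketch below changes the same line on two branches, lets the merge fail, and then resolves it; the file settings.txt, the branch name feature, and the final value are all hypothetical.

```shell
# Sketch: provoke a merge conflict, then resolve it with add and commit.
cd "$(mktemp -d)"                          # scratch repo (assumption)
git init -q
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"

echo "color = red" > settings.txt          # hypothetical file
git add settings.txt
git commit -q -m "Initial settings"
git branch -M main

git checkout -q -b feature                 # change the line on a branch
echo "color = blue" > settings.txt
git commit -q -am "Use blue"

git checkout -q main                       # change the same line on main
echo "color = green" > settings.txt
git commit -q -am "Use green"

# The merge stops with a conflict, so capture the list of unmerged files.
if git merge feature >/dev/null 2>&1; then
    conflicted=""
else
    conflicted="$(git diff --name-only --diff-filter=U)"
fi
echo "Conflicted files: $conflicted"

# Resolve by editing the file; here we simply write the final content.
echo "color = teal" > settings.txt
git add settings.txt                       # mark the conflict as resolved
git commit -q --no-edit                    # complete the merge with the default message
git log --oneline --graph
```

In real work you would open the file and choose between the sections Git marks with <<<<<<<, =======, and >>>>>>> rather than overwriting it.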
    

Tips#

  • Always check the status of your local repository before pulling updates:

    git status
    
  • If you have local changes, stash them before pulling to avoid conflicts:

    git stash
    git pull origin main
    git stash pop
    
  1. git stash: Temporarily saves your current changes so your working directory is clean.

  2. git pull origin main: Fetches and integrates the latest changes from the remote main branch into your local branch.

  3. git stash pop: Retrieves and reapplies your stashed changes on top of the updated codebase.
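The stash, pull, stash pop sequence can be rehearsed with a local repository standing in for the remote. In this sketch the local edit and the remote update touch different files, so the stash reapplies cleanly; the file names a.txt and b.txt are hypothetical.

```shell
# Sketch: set local edits aside, pull, then reapply them.
workdir="$(mktemp -d)"                     # scratch location (assumption)
git init -q "$workdir/upstream"            # stands in for the GitHub repo
cd "$workdir/upstream"
git config user.email "YOU@example.com"
git config user.name "YOUR NAME"
echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"
git branch -M main

git clone -q "$workdir/upstream" "$workdir/clone"

echo "b2" > b.txt                          # the remote changes b.txt
git commit -q -am "Update b"

cd "$workdir/clone"
git config user.email "YOU@example.com"    # stash needs an identity too
git config user.name "YOUR NAME"
echo "local a" > a.txt                     # uncommitted local change to a.txt

git stash -q                               # working directory is clean again
git pull -q origin main                    # bring in the remote change to b.txt
git stash pop -q                           # reapply the local change to a.txt

cat a.txt b.txt
```

If the stashed change and the pulled change touched the same lines, git stash pop would report a conflict that you resolve like a merge conflict.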

12. JupyterLab Git Extension (optional)#

Do not have a notebook or JupyterLab running while installing the extension.

  1. Shut down JupyterLab

  2. Make sure you are in the base environment

  3. Install the Jupyter Lab Extension

    • conda install -c conda-forge jupyterlab-git

  4. Rebuild Jupyter Lab (if required)

    • jupyter lab build

Acknowledgements#

This content was developed with assistance from Perplexity AI and ChatGPT. Multiple queries were made during Fall 2024 and Spring 2025.