Lesson 27: Version control with Git

This lesson was generated from a Jupyter notebook. You can download the notebook here.

Keeping track of all of the changes in your project over time is good practice. How many times have you edited something you were writing and then wanted to go back and see what you had in the first place? Wouldn't it be great to know what changes you made and when you made them?

A version control system facilitates this process of keeping track of changes over time. Beyond that, it allows multiple people to collaborate and work on parts of the same project simultaneously.

There are many version control systems. The four most prominent, in order of age, are CVS, Subversion, Git, and Mercurial (the first version of Git was released about two weeks before Mercurial, so they are really the same age). Today, Git and Mercurial seem to dominate.

Git was developed by Linus Torvalds, the person who developed the Linux operating system. He named Linux after himself, and he decided to also name Git after himself ("git" is British slang for a stupid person). Try typing

man git

and read what the NAME of the software is.

Using Git as a version control system allows communication with remote repositories hosted on services such as GitHub or Bitbucket. Both services offer university-affiliated people who have a .edu email address perks that include a number of private repositories (GitHub and Bitbucket).

Remote repositories are not only a great way to keep your data safe. They are also an excellent tool for collaboration, since Git allows multiple users to edit the shared files simultaneously and provides a way to merge the changes afterwards. Public repositories can also serve as a vehicle to distribute code (or other files).

Getting started with Git

You can find more information about Git here. It is well documented. These are excellent one-page cheatsheets (here and here).

Let's get started. You should all have a version of Git installed on your computers. Open the terminal and navigate into a directory of your choice. My choice would be ~/bootcamp/git_tutorial.

Setting up Git

Initialize your repository

Start by entering:

git init

This will create the directory .git. You can check this with:

ls -a 
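If you would like to try this without touching any of your real files, here is a minimal sketch that does the same thing in a throwaway directory. The scratch directory from mktemp is an assumption made for this example; in the lesson you would be working in ~/bootcamp/git_tutorial.

```shell
# Work in a throwaway directory (an assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"

# Create an empty repository; -q suppresses the informational message.
git init -q

# The repository data lives in the hidden .git directory.
ls -a

# Ask Git whether we are inside a working tree.
inside=$(git rev-parse --is-inside-work-tree)
echo "$inside"
```

Deleting the .git directory removes the whole history, so it is best left alone.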

Configure Git

Since this is your first time, you should configure Git:

git config --global user.name "YOUR NAME"
git config --global user.email "YOUR EMAIL ADDRESS"


Git requires a text editor. My personal preference is vi, but some people prefer emacs. (Both have similar capabilities and are extremely powerful. It would be a good idea to spend some time getting to know one of them.)

git config --global core.editor vi

Both of these text editors are almost entirely keyboard-based and have some esoteric combinations of keystrokes to get things done. So, depending on your system:

Mac users: nano is a simple (but not at all powerful) text editor. You might want to choose this as your core.editor.

Windows users: If you are using Git Bash, vi is the only native editor. It has a notoriously steep learning curve, so if you find yourself trapped in vi in your terminal, ask a TA to help you.

Now, we can check to see the current configuration with:

git config --list
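The full list can get long. To look up a single setting, you can pass its key to git config --get. The sketch below uses repository-local settings (no --global) in a scratch repository so that your real configuration is left alone; the name, email address, and editor are placeholders, not recommendations.

```shell
# Scratch repository (an assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, a setting applies to this repository only.
git config user.name "Example User"
git config user.email "user@example.edu"
git config core.editor nano

# Query individual keys instead of listing everything.
name=$(git config --get user.name)
editor=$(git config --get core.editor)
echo "$name uses $editor"
```

Repository-local values take precedence over global ones, which is handy if you use a different email address for one project.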



Git is very well documented and help is easily available. If you need to know more about config, for example, just enter:

git help config

File tracking

First, let's create some files that we can track:

echo "Hello World" > test.txt
echo "bla bla" > bla.txt

The process involves two steps.

  1. Add the files we want to track.
  2. Commit them to the repository.
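To see the two steps in action, it helps to ask Git for the status of the working directory along the way. This is a minimal sketch in a scratch repository; the placeholder identity is an assumption, needed only so the commit succeeds on a machine that has not been configured yet.

```shell
# Scratch repository with a placeholder identity (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"

echo "Hello World" > test.txt

# Untracked files are marked with "??".
before=$(git status --short)
echo "$before"

# Step 1: add the file. Newly added files are marked with "A".
git add test.txt
staged=$(git status --short)
echo "$staged"

# Step 2: commit. With nothing left to report, the status is empty.
git commit -q -m "first commit"
after=$(git status --short)
echo "$after"
```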

Adding a file

There are different ways to add files. You can find out more with

git help add

But let's look at some of the options:

git add *

will add everything (other than dotfiles) in this directory and its subdirectories to be tracked. To add all files ending in .txt, we can do

git add *.txt

Finally,

git add test.txt

does what it says it does. Note that the first two options may add more than we want, but they are convenient. If we want to be able to add a whole bunch of files, but automatically ignore certain types of files, we can create a

.gitignore

file that contains a list of exceptions. Note that .gitignore only affects files that Git is not already tracking. For example:

echo "bla.txt" > .gitignore
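Here is a short sketch of the effect of .gitignore, again in a scratch repository. It uses git add . (add the current directory), which is a variation on the commands above, together with git ls-files and git check-ignore to inspect the result.

```shell
# Scratch repository (an assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "Hello World" > test.txt
echo "bla bla" > bla.txt
echo "bla.txt" > .gitignore

# "git add ." adds the current directory but skips ignored files.
git add .

# List the files Git now knows about; bla.txt should be absent.
staged=$(git ls-files)
echo "$staged"

# check-ignore exits with status 0 when the path is ignored.
git check-ignore -q bla.txt && ignored=yes || ignored=no
echo "bla.txt ignored: $ignored"
```

The .gitignore file itself is usually committed along with everything else, so that collaborators ignore the same files.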

Committing a file

Now we are ready to commit:

git commit -m "first commit"

The -m flag allows you to write a commit message that says what happened in this commit. Because Git tracks all commits, these messages are very useful for seeing what changed from commit to commit, and later for backtracking changes.

Committing modifications

Now we can modify our file:

echo "some changes" >> test.txt
git add test.txt
git commit -m "some changes added"
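Before committing a modification, you may want to check what actually changed. The sketch below (scratch repository and placeholder identity are assumptions) compares git diff, which shows changes that have not been added yet, with git diff --staged, which shows changes that have been added but not committed.

```shell
# Scratch repository with one commit (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"

# Modify the tracked file, then summarize the change that is not yet added.
echo "some changes" >> test.txt
unstaged=$(git diff --stat)
echo "$unstaged"

# After adding, plain "git diff" is empty; --staged shows the change.
git add test.txt
after_add=$(git diff --stat)
staged=$(git diff --staged --stat)
echo "$staged"

git commit -q -m "some changes added"
```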

Logging commits

Git keeps track of all the commits. We can have a look at the log:

git log

Here we see the commits with their messages. It makes your life much easier when you use descriptive commit messages.
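The full log can be verbose. As a sketch, here are a few more compact ways of looking at the history, using a scratch repository with two commits (the repository and the placeholder identity are assumptions for this example).

```shell
# Scratch repository with two commits (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

# One line per commit: abbreviated hash and message, newest first.
git log --oneline

# Count the commits in the history of the current commit.
count=$(git rev-list --count HEAD)

# Message (subject line) of the most recent commit.
latest=$(git log -1 --format=%s)
echo "$count commits, latest: $latest"
```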

Now we are able to see the differences between two commits. Here, commit1 and commit2 are placeholders for the commit hashes shown by git log:

git diff commit1 commit2
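Here is a sketch of that command with actual hashes filled in. Rather than copying them from the log by hand, it looks them up with git rev-parse, where HEAD means the current commit and HEAD~1 the one before it. The scratch repository and placeholder identity are assumptions for this example.

```shell
# Scratch repository with two commits (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

# Look up the full hashes of the previous and the current commit.
old=$(git rev-parse HEAD~1)
new=$(git rev-parse HEAD)

# Lines starting with "+" were added between the two commits.
changes=$(git diff "$old" "$new")
echo "$changes"
```

You do not need to type a full hash; the first several characters are enough as long as they are unambiguous.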

Going back

If we break our code, we can just go back to the last commit where it was still working using git checkout (commit1 stands for the hash of that commit, as listed by git log).

git checkout commit1
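It is worth knowing how to get back again after looking at an old commit. The sketch below checks out the first commit of a scratch repository and then returns with git checkout -, which switches back to wherever we were before. The repository and placeholder identity are assumptions for this example.

```shell
# Scratch repository with two commits (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

first=$(git rev-parse HEAD~1)

# Check out the earlier commit. This leaves us in a "detached HEAD" state:
# we are looking at an old snapshot rather than at the tip of a branch.
git checkout -q "$first"
old_contents=$(cat test.txt)
echo "$old_contents"

# "git checkout -" returns to the branch we were on before.
git checkout -q -
new_contents=$(cat test.txt)
echo "$new_contents"
```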

It's a good idea to commit early and often, but don't commit half-done work (you can use Git's stash feature to set these changes aside)! When working on an implementation, split features into logical chunks, e.g. different functions, test them individually, and commit in stages with descriptive messages. This will help during the debugging process!
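As a sketch of the stash feature mentioned above: git stash shelves uncommitted changes and restores the last commit, and git stash pop brings them back. The scratch repository, placeholder identity, and file contents are assumptions for this example.

```shell
# Scratch repository with one commit (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"

# Start some work that is not ready to be committed.
echo "half-done idea" >> test.txt

# Shelve the change; the file goes back to its last committed state.
git stash
clean=$(git status --short)

# Bring the shelved change back when you are ready to continue.
git stash pop
restored=$(git status --short)
echo "$restored"
```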

Cloning Repositories

On GitHub, you can find all kinds of public repositories. In this section, we will clone a simple package that will hurl Shakespearean insults at you. We can clone repositories using this syntax:

git clone some_repository target_directory


The target directory shouldn't exist prior to issuing this command (Git will only clone into an existing directory if it is empty). Let's clone the insulter, which is hosted on GitHub.

git clone https://gist.github.com/3165396.git insulter

Now, cd to insulter and you can start using it, thou wayward tickle-brained flap-dragon!

python insulter.py
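If you want to experiment with git clone without a network connection, the same syntax works on a repository that lives on your own disk. In this sketch, source_repo and my_copy are hypothetical names, and the small local repository merely stands in for a remote one such as those on GitHub.

```shell
# Scratch area (an assumption for this sketch).
work=$(mktemp -d)
cd "$work"

# Build a small source repository to stand in for a remote one.
git init -q source_repo
cd source_repo
git config user.name "Example User"
git config user.email "user@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
cd "$work"

# Same syntax as above: the repository first, then the target directory.
git clone -q source_repo my_copy

# The clone remembers where it came from as the remote named "origin".
cd my_copy
remote=$(git remote)
git remote -v
```

The clone contains the full history of the source repository, not just the latest version of the files.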

Another fun repo to clone

Now, we'll clone a repo that has a biology-inspired PacMan game written in Python.

git clone https://github.com/HussainAther/dnapacman pacman

Let's have a look:

cd pacman

Here we find some instructions:

less README.md

To run the program, we require a Python 2 environment. In Exercise 3, you set up a Python 2 environment. Activate it:

On Linux or Mac run:

source activate py2 

On Windows:

activate py2

And now, play some Bio-PacMan!

python pacman.py