This lesson was generated from a Jupyter notebook. You can download the notebook here.
Keeping track of all of the changes in your project over time is good practice. How many times have you edited something you were writing and then wanted to go back and see what you had in the first place? Wouldn't it be great to know what changes you made and when you made them?
A version control system facilitates this process of keeping track of changes over time. Beyond that, it allows multiple people to collaborate and work on parts of the same project simultaneously.
There are many version control systems. The four most prominent, in order of age, are CVS, Subversion, Git, and Mercurial (the first version of Git was released about two weeks before Mercurial, so they are really the same age). Today, Git and Mercurial seem to dominate.
Git was developed by Linus Torvalds, the person who developed the Linux operating system. He named Linux after himself, and he decided to also name Git after himself ("git" is British slang for a stupid person). Try typing
man git
and read what the NAME of the software is.
Using Git as a version control system allows communication with remote repositories such as GitHub or Bitbucket. Both services offer university-affiliated people who have a .edu email address perks that include a number of private repositories.
Remote repositories are not only a great way to keep your data safe. They are also an excellent tool for collaboration, since Git allows multiple users to edit the shared files simultaneously and provides a way to merge their changes afterwards. Public repositories can also serve as a vehicle to distribute code (or other files).
You can find more information about Git here. It is well documented. These are excellent one-page cheatsheets (here and here).
Let's get started. You should all have a version of Git installed on your computers. Open the terminal and navigate into a directory of your choice. My choice would be ~/bootcamp/git_tutorial.
Since this is your first time, you should configure Git:
git config --global user.name "YOUR NAME"
git config --global user.email "YOUR EMAIL ADDRESS"
Git requires a text editor. My personal preference is vi, but some people prefer emacs. (Both have similar capabilities and are extremely powerful. It would be a good idea to spend some time getting to know one of them.)
git config --global core.editor vi
Both of these text editors are almost entirely keyboard-based and have some esoteric combinations of keystrokes to get things done. So, a few notes:
Mac users: A simple (but not at all powerful) text editor is nano. You might want to choose this as your core.editor.
Windows users: If you are using Git Bash, vi is the only native editor. It has a notoriously steep learning curve, so if you find yourself trapped in vi in your terminal, ask a TA to help you.
Now, we can check to see the current configuration with:
git config --list
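As a rough sketch of how individual settings can be inspected, the snippet below sets values for a single scratch repository and reads them back with git config --get. Dropping --global stores the values in that repository's own .git/config. The name, email address, and temporary directory are placeholders chosen for illustration, not part of the lesson.

```shell
# Work in a throwaway repository so your real settings stay untouched
scratch="$(mktemp -d)"
cd "$scratch"
git init -q

# Without --global, the values apply to this repository only
git config user.name "Example Student"        # placeholder name
git config user.email "student@example.edu"   # placeholder address

# Read single keys back instead of scanning the full --list output
configured_name="$(git config --get user.name)"
configured_email="$(git config --get user.email)"
echo "$configured_name <$configured_email>"
```

Repository-level values take precedence over the global ones, which can be handy if one project needs a different email address.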
Git is very well documented and help is easily available. If you need to know more about config, for example, just enter:
git help config
First, let's create some files that we can track (if this directory is not yet a Git repository, run git init in it first):
echo "Hello World" > test.txt
echo "bla bla" > bla.txt
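Recording changes in Git involves two steps: first add the files you want to record to the staging area with git add, then commit them with git commit.
There are different ways to add files; you can find out more with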
git help add
But let's look at some of the options:
git add *
will add everything (other than dotfiles) in this directory and its subdirectories to be tracked. To add all files ending in .txt, we can do
git add *.txt
Finally,
git add test.txt
does what it says it does. Note that the first two options may add more than we want, but they are convenient. If we want to add a whole bunch of files but automatically ignore certain types of files, we can create a .gitignore file that contains a list of exceptions. For example:
echo "bla.txt" > .gitignore
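To see the effect of the ignore list, here is a minimal sketch run in a scratch repository (the temporary directory is an assumption for illustration). It uses git add . rather than git add *, because the shell would expand * to include bla.txt by name, and Git complains when an ignored file is named explicitly.

```shell
# Throwaway repository; the path exists only for this sketch
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "Hello World" > test.txt
echo "bla bla" > bla.txt
echo "bla.txt" > .gitignore

# "git add ." stages everything in the directory (dotfiles included)
# except paths matched by .gitignore
git add .

# List the files that are now staged for the next commit
tracked_files="$(git ls-files)"
echo "$tracked_files"
```

The listing should contain test.txt and .gitignore itself, but not bla.txt.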
Now we are ready to commit:
git commit -m "first commit"
The -m flag allows you to write a commit message that says what happened in this commit. Because Git tracks all commits, these messages are very useful for seeing what changed from commit to commit, and later for backtracking changes.
Now we can modify our file:
echo "some changes" >> test.txt
git add test.txt
git commit -m "some changes added"
Git keeps track of all the commits. We can have a look at the log:
git log
Here we see the commits with their messages. It makes your life much easier when you use descriptive commit messages.
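For a more compact view, the sketch below replays the two commits from this lesson in a scratch repository and prints one line per commit with git log --oneline. The temporary directory and the identity set with git config are placeholders for illustration.

```shell
# Scratch repository with a placeholder identity for committing
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.edu"

echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"

echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

# One line per commit, newest first: abbreviated hash plus message
git log --oneline

# Count the commits and fetch the subject line of the newest one
commit_count="$(git rev-list --count HEAD)"
newest_message="$(git log -1 --format=%s)"
```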
Now we are able to see the differences between two commits, where commit1 and commit2 stand for commit hashes shown by git log:
git diff commit1 commit2
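As a small sketch of what the comparison looks like, the snippet below uses HEAD~1 and HEAD (the parent of the newest commit and the newest commit itself) as stand-ins for commit1 and commit2. The scratch repository and identity are assumptions for illustration.

```shell
# Scratch repository with two commits to compare
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

# Lines starting with "+" were added between the two commits,
# lines starting with "-" were removed
changes="$(git diff HEAD~1 HEAD)"
echo "$changes"
```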
If we break our code, we can just go back to the last commit where it was still working using git checkout.
git checkout commit1
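The sketch below, under the same scratch-repository assumptions as before, checks out the older commit, confirms that the file has reverted, and then returns to the branch. Checking out a commit directly puts the repository in what Git calls a "detached HEAD" state, so it helps to remember the branch name first.

```shell
# Scratch repository with two commits
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
echo "some changes" >> test.txt
git add test.txt
git commit -q -m "some changes added"

# Remember the current branch so we can come back to it
branch="$(git rev-parse --abbrev-ref HEAD)"

# Check out the parent commit; files revert to that snapshot
git checkout -q HEAD~1
old_contents="$(cat test.txt)"

# Return to the newest commit on the original branch
git checkout -q "$branch"
new_line_count="$(wc -l < test.txt | tr -d ' ')"
```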
It's a good idea to commit early and often, but don't commit half-done work (you can use Git's stash feature to save these changes)! When working on an implementation, split features into multiple logical chunks (e.g., different functions), test them individually, and commit them in stages with descriptive messages. This will help during the debugging process!
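A minimal sketch of the stash workflow, again in a scratch repository with a placeholder identity: an unfinished edit is shelved with git stash, which returns the file to its committed state, and is brought back later with git stash pop.

```shell
# Scratch repository with one commit
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"

# Half-done edit that is not ready to be committed
echo "work in progress" >> test.txt

# Shelve the edit; the working tree returns to the last commit
git stash
clean_contents="$(cat test.txt)"

# Bring the shelved edit back when you are ready to continue
git stash pop
restored_last_line="$(tail -n 1 test.txt)"
```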
On GitHub, you can find all kinds of public repositories. In this section, we will clone a simple package that will hurl Shakespearean insults at you. We can clone repositories using this syntax:
git clone some_repository target_directory
The target directory shouldn't exist prior to issuing this command. Let's clone the insulter, which is hosted at GitHub.
git clone https://gist.github.com/3165396.git insulter
Now, cd to insulter and you can start using it, thou wayward tickle-brained flap-dragon!
python insulter.py
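The clone syntax above also accepts a local path in place of a URL, which makes it easy to try without a network connection. The sketch below builds a small repository to stand in for some_repository and clones it; the directory names source_repo and cloned_copy are hypothetical.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Build a small repository to stand in for "some_repository"
git init -q source_repo
cd source_repo
git config user.name "Example Student"
git config user.email "student@example.edu"
echo "Hello World" > test.txt
git add test.txt
git commit -q -m "first commit"
cd ..

# Same syntax as for a URL: repository first, then a target directory
# that does not exist yet
git clone -q source_repo cloned_copy

# The clone has the committed files and remembers where it came from
cloned_contents="$(cat cloned_copy/test.txt)"
origin_url="$(git -C cloned_copy config --get remote.origin.url)"
echo "cloned from: $origin_url"
```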
Now, we'll clone a repo that has a biology-inspired PacMan game written in Python.
git clone https://github.com/HussainAther/dnapacman pacman
Let's have a look:
cd pacman
Here we find some instructions:
less README.md
To run the program, we require the Python 2 environment you set up in Exercise 3. Activate it:
On Linux or Mac run:
source activate py2
On Windows:
activate py2
And now, play some Bio-PacMan!
python pacman.py