Lesson 18: More about the command line

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In this lesson, we will continue introducing command line skills. As we have said before, you will feel empowered controlling your computer as you master the command line. We will cover manual pages, pipes and redirects, searching with grep, process control, file permissions, rc files, environment variables, and conda.

To start, let's navigate into the folder we created in our first command line lesson. We left it at ~/bootcamp/command_line_tutorial.

man

Most commands have a manual that can be accessed right from the terminal itself. Last time we explored the command more. Now, try this:

man more

The manual usually has a description of the command, a synopsis which informs you about the syntax, and a list of options.

Windows users: Git Bash does not have man. You can instead look up man pages on the internet, e.g., here.

tee

man tee will tell you:

tee - read from standard input and write to standard output and files.

You will also learn about the authors. If you have some time, google Richard M. Stallman. He's an interesting guy, and a very important person in the past and present of computational tools. The last time I remember him giving a talk at Caltech, you couldn't get into the lecture hall unless you were a half hour early.

Now, back to tee. Try this:

tee testing_tee.txt

Start typing some text and press return. Repeat the process if you like. When you are done, press ctrl-c. Have a look at what you just created (use cat, less, or more).

Pipes (|) and redirects (>)

Shells are very good at stringing commands together. Let's look at an example:

ls | tee contents.txt

ls lists all the files and folders of the current directory and prints the information to the standard output. By adding the pipe character (|) we tell the shell to feed this information into the next command instead.

Compare this to

ls > contents_again.txt

The > character redirects the standard output to a file. With tee, the information is written to the file and is also still passed on to the standard output. With a redirect, the information goes only to the file. Note that redirecting with ">" to a file will overwrite the file's original content.

ls >> contents_again.txt 

Using ">>" will append the output to the file.
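The difference between >, >> and tee can be checked in a throwaway directory. The following is only a minimal sketch: the scratch directory made with mktemp and the file names out.txt and copy.txt are made up for illustration and are not part of the tutorial folder.

```shell
# Work in a scratch directory so that no real files are overwritten.
scratch=$(mktemp -d)
cd "$scratch"

echo "first" > out.txt      # > creates the file (or overwrites it)
echo "second" > out.txt     # overwrites: out.txt now holds only "second"
echo "third" >> out.txt     # >> appends: out.txt now holds two lines

# tee writes its input to copy.txt AND passes it on to standard output,
# so the next command in the pipeline (wc -l) still receives it.
line_count=$(cat out.txt | tee copy.txt | wc -l)

cat out.txt
echo "lines seen after tee: $((line_count))"
```

Afterwards, out.txt and copy.txt should have the same content, and the word "first" should be gone because the second echo overwrote it.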

echo

echo simply prints a line of text to the standard output:

echo "Thanks for all the fish!"

It can be quite useful in combination with redirects:

echo "Thanks for all the fish" > hitchhiker_quotes.txt

grep

We briefly encountered grep in the regex lesson. As a reminder, grep searches the input for lines containing a match of a given expression.

Try:

grep ">" sequences/1z98.fasta

(Why is the > symbol in quotes?) Now try this:

grep -v ">" sequences/1z98.fasta

and this:

grep "Sequence" sequences/1z98.fasta

and this:

grep -i "Sequence" sequences/1z98.fasta

A very useful option is -l, combined here with -i (case-insensitive matching) as -li. Try:

grep -li "ATOM" *

This prints the names of all files that contain a matching line, instead of the matching lines themselves. Compare the output to this:

grep -li "ATOM" */*

grep is extremely useful when combined with other commands. Try this one:

cat sequences/*.fasta | grep ">"

The word count command (wc) works particularly well with grep. Try:

cat sequences/*.fasta | grep ">" | wc -l

(man wc will tell you more about this useful little command.) Let's look at the command we just ran in more detail. First, cat sequences/*.fasta outputs the entire text of all files in the directory sequences that have the .fasta suffix. That is piped to grep, meaning that the output of the cat command does not go to the screen, but to grep. So, we now take all that text from those files and use grep to give all lines that contain >. In a FASTA file, these are the description lines, one per sequence. Those lines are then piped into wc, which, with the -l flag, gives the number of lines. Thus, we get a count of the total number of sequences in our FASTA files. Pretty slick!
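To see the counting pipeline at work on files whose content is known in advance, here is a small sketch. The two FASTA files and their three records are invented for this example (they are not the files in the tutorial's sequences directory). The last command shows an alternative, grep -c, which counts matching lines without needing wc.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/sequences"

# Two made-up FASTA files: one with a single record, one with two.
printf '>seq1 example\nMKT\nAAL\n' > "$scratch/sequences/a.fasta"
printf '>seq2 example\nGGS\n>seq3 example\nLLV\n' > "$scratch/sequences/b.fasta"

# Count the description lines, i.e., the number of sequences.
n_piped=$(cat "$scratch"/sequences/*.fasta | grep ">" | wc -l)

# grep -c counts matching lines itself; ^ anchors the match to the line start.
n_direct=$(cat "$scratch"/sequences/*.fasta | grep -c "^>")

echo "pipeline count: $((n_piped)), grep -c count: $n_direct"
```

Both counts should agree, since every description line in these files starts with >.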

^C, ^Z, bg, fg, ps, top, kill

There is a tiny Python script called fibonacci.py. Have a look at it. It features an endless loop, which is perfect for illustrating a number of commands. Start the program by typing:

python fibonacci.py

This will print Fibonacci numbers to the screen forever. Once you have had enough, you can terminate the script with:

^C

Let's rerun the script and redirect the output to a file:

python fibonacci.py > fibs

This too runs forever. Instead of terminating the script, we can suspend it. To do this, type:

^Z

Next, typing

bg

(for "background") allows the process to be resumed in the background. To bring it back to the foreground, type

fg

and now we can terminate it again with

^C

Another way to kill a program is the kill command. For this we need to find out the process ID. First, let's start it up and put it in the background.

python fibonacci.py > fibs
^Z
bg

Now that it's running in the background, we would like to know what process it is. Actually, we can find out all processes that are running. One way to do this is using the ps command.

ps 

or:

ps -uaxel

Another useful command that shows you what is going on is:

top

Either command will reveal the process ID (PID). Once you know the process ID, you can terminate the process:

kill process_id

where process_id is the number you found. Now, it might be a good idea to delete the fibs file. If you like, you can have a look at it first.
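The whole background-and-kill cycle can also be scripted. The sketch below makes two assumptions that are not part of the lesson: it uses sleep as a harmless stand-in for fibonacci.py, and it starts the job with a trailing & instead of ^Z followed by bg. The special variable $! then gives the process ID directly, so there is no need to search the ps output by eye.

```shell
# sleep stands in for a long-running program such as fibonacci.py.
sleep 300 &                        # the trailing & starts the job in the background
pid=$!                             # $! holds the process ID of that background job

ps -p "$pid"                       # show only that process

kill "$pid"                        # send the default signal (TERM) to the process
wait "$pid" 2>/dev/null || true    # wait until it has actually exited

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    status="still running"
else
    status="terminated"
fi
echo "process $pid is $status"
```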

chmod

In order to run fibonacci.py we prefaced it with python. What happens if we just invoke it by itself?

./fibonacci.py

It tells us that we lack the permission to execute the command. (Don't worry, we will talk about the ./ in a bit.)

ls -l

On the left we see the read (r), write (w), and execute (x) permissions for the owner, the group, and all other users.

Enter:

chmod u+x fibonacci.py

and run

ls -l 

again. What changed? Have a look at man chmod and check out chmod 777.

Now run:

./fibonacci.py
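The same steps can be tried on a throwaway script. This is a sketch under assumptions: hello.sh is a made-up file, and it begins with a #!/bin/sh line that tells the system which interpreter to use when the file is run directly.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# A tiny script; its first line (the "shebang") names the interpreter to use.
printf '#!/bin/sh\necho "hello from hello.sh"\n' > hello.sh

chmod u-x hello.sh      # make sure the execute bit is off to begin with
ls -l hello.sh          # the owner's permissions show no x yet

chmod u+x hello.sh      # give the owner (u) execute (x) permission
ls -l hello.sh          # the owner's permissions now include x

output=$(./hello.sh)    # the script can now be invoked directly
echo "$output"
```

Comparing the two ls -l lines shows that only the owner's execute permission changed; group and other users are unaffected by u+x.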

rc files

rc files contain commands that the corresponding application (or even the operating system) should run at startup. For example, .YOUR_SHELLrc (in our case that would be .bashrc) can contain all kinds of commands that make your life in the terminal easier. The suffix rc goes back to the early days and stands for "runcom", which is an abbreviation for "run commands".

rc files are dot-files (files whose names start with a dot, typically used for configuration) that are usually located in the home directory. The ls command ignores them by default. We need to use "ls -a" to list them.

.bashrc

Go to your home directory and execute:

ls -a

Do you see a .bashrc file? If yes, have a look at what's in there.

This file can be used to customize your terminal in any conceivable way. In this tutorial we just look at one aspect of it.

(I recommend you check out zsh as an alternative to bash. Together with Oh My ZSH (a community driven framework for managing ZSH configuration) it guarantees a fantastic terminal experience. JB and Axel both use it.)

Environment variables

As Wikipedia says better than I can, "Environment variables are a set of dynamic values that can affect the way running processes will behave on a computer."

The command:

env

will show you which environment variables have been set. One particularly important environment variable is PATH.
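As a brief aside, how a variable ends up in the output of env can be sketched as follows. The variable name BOOTCAMP_GREETING is invented for this example. The point is that a plain shell variable stays inside the current shell, while export makes it part of the environment that child processes inherit.

```shell
# A plain shell variable is visible only inside the current shell.
BOOTCAMP_GREETING="hello"
in_child_before=$(sh -c 'echo "${BOOTCAMP_GREETING:-unset}"')

# export turns it into an environment variable, which child processes inherit.
export BOOTCAMP_GREETING
in_child_after=$(sh -c 'echo "${BOOTCAMP_GREETING:-unset}"')

echo "before export: $in_child_before"
echo "after export:  $in_child_after"

# env lists all environment variables; grep picks out the one just exported.
env | grep "^BOOTCAMP_GREETING="
```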

PATH

The PATH environment variable tells the shell which directories to search for executable files. This is called the search path. It's a list of absolute paths separated by colons. (Please note, PATH is different from a path.)

To check what has been already added to PATH we can issue the following command:

echo "$PATH"

To see how this works, make sure you are in the ~/bootcamp/command_line_tutorial directory. There is a little shell script there called remind_me.sh. To run it, do the following.

./remind_me.sh

Now, try running it without the ./ at the beginning.

remind_me.sh

The second one does not work. This is because whenever you ask the shell to execute something, it searches the directories in PATH to find something with that name, unless you give a path to the executable when invoking it. Note that ./remind_me.sh is such a path: it says that remind_me.sh is in the current directory (.).

Now, we sometimes want the shell to find items in specific directories, so we can change the PATH environment variable.

One way to do this is to execute

export PATH=$PATH:/complete/path/to/be/included

where /complete/path/to/be/included is the name of the directory you want to be added to the search path. This will be used for the remainder of the session. Once you close your terminal, it's gone. To make the change permanent, we need to add this line to the end of the .bashrc file. Luckily we have all the tools to do this easily:

echo 'export PATH=$PATH:your/path/to/the/file' >> ~/.bashrc

Once the .bashrc file is modified we need to source it (or open a new terminal) so that the changes can be applied.

source ~/.bashrc

Let's also have a look at the end of the .bashrc file:

tail ~/.bashrc
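The effect of PATH on command lookup can also be checked without touching .bashrc. In this sketch, the script say_found.sh and the scratch directory are made up for illustration. The change to PATH lasts only for the current session.

```shell
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "found me"\n' > "$scratch/say_found.sh"
chmod u+x "$scratch/say_found.sh"

# The directory is not in PATH yet, so the bare name is not found.
if command -v say_found.sh > /dev/null; then
    before="found"
else
    before="not found"
fi

# Append the directory to PATH for the rest of this session only.
export PATH="$PATH:$scratch"

# Now the shell finds the script by its bare name, without ./ or a full path.
after=$(say_found.sh)

echo "before: $before"
echo "after:  $after"
```

Here command -v is used only to ask the shell whether it can find a command by that name; it does not run the command.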

PYTHONPATH

While PATH tells the shell where to look for executables, PYTHONPATH tells Python where to look for modules. It works the same way PATH works, and it is also set by appending a line to the .bashrc file:

export PYTHONPATH=${PYTHONPATH}:/path/to/module/directory

For example, if you are storing your .py files in the directory ~/bootcamp/python_files, you would do

export PYTHONPATH=${PYTHONPATH}:$HOME/bootcamp/python_files

Then, the Python interpreter will always know where to look for your files (when it is launched from bash).

conda

Conda is the package and environment manager that comes with your Anaconda installation. It allows you to install packages, makes sure that all dependencies are installed as well, and helps you keep your installed packages up to date.

Have a look at this cheat sheet.

conda can also create environments in which specified packages are active. This is important because the requirements of one package may conflict with those of another. For example: We are using Python 3 here, but a lot of packages only work with Python 2. So, by creating a Python 2 environment we make sure that there are no clashes.

Let's create a Python 2 environment:

conda create --name py2 python=2.7 anaconda

Once everything is installed you can have a look at all the environments at your disposal:

conda info -e

To activate the py2 environment, execute the following.

On Linux and Mac:

source activate py2

On Windows:

activate py2