Lesson 33: More about the command line

This lesson was prepared in collaboration with Axel Müller and Shyam Saladi.


In this lesson, we will continue introducing techniques for working at the command line. As we have said before, you will feel empowered controlling your computer as you master command line skills. We will go through a set of commands and skills.

To start, let’s navigate into the folder we created in our first command line lesson. Fire up a terminal and cd into the directory ~/bootcamp/command_line_tutorial and let’s get going!

(Reminder: There may be differences between Windows PowerShell and what we present here, which is for Linux and macOS. We will try to make note of any differences.)

man

Windows users: PowerShell does not have the Unix man pages. You can instead use Get-Help.

Most commands have a manual that can be accessed right from the terminal itself. Last time, we explored the more command. Now, try this:

man more

The manual usually has a description of the command, a synopsis which informs you about the syntax, and a list of options.

tee

Windows users: In PowerShell, you can use Tee-Object.

man tee will tell you:

tee - read from standard input and write to standard output and files.

Try this:

tee testing_tee.txt

Start typing some text and press return. Repeat the process if you like. When you are done, press Ctrl-C. Have a look at what you just created (use cat, less, or more).

Pipes (|) and redirects (>)

Shells are very good at stringing commands together. Let’s look at an example:

ls | tee contents.txt

ls lists all the files and folders of the current directory and prints the information to the standard output. By adding the pipe character (|) we tell the shell to feed this information into the next command instead.

Compare this to

ls > contents_again.txt

The > character redirects the standard output to a file. With tee, the same happens, but the information is also passed on to the standard output (meaning that it is displayed on your screen). With a redirect, the information goes only to the file.

Note that redirecting with > to a file will overwrite the file’s original content. Using >> instead will append the output to the file.

ls >> contents_again.txt
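To make the difference between tee, > and >> concrete, here is a minimal sketch. It works in a throwaway directory created with mktemp, and the file names via_tee.txt and via_redirect.txt are made up for illustration.

```shell
# Work in a scratch directory so nothing in the tutorial folder is touched.
scratch=$(mktemp -d)

# Pipe into tee: the text appears on the screen AND is written to the file.
echo "first line" | tee "$scratch/via_tee.txt"

# Redirect with >: nothing is printed; the text goes only to the file.
echo "first line" > "$scratch/via_redirect.txt"

# A second > overwrites the file, while >> appends to it.
echo "second line" > "$scratch/via_redirect.txt"
echo "third line" >> "$scratch/via_redirect.txt"

# The file now holds "second line" and "third line"; "first line" is gone.
cat "$scratch/via_redirect.txt"
```

Only the tee line and the final cat print anything to the screen; the three redirects are silent.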

echo

echo simply prints a line of text to the standard output:

echo "Thanks for all the fish!"

It can be quite useful in combination with redirects:

echo "Thanks for all the fish" > hitchhiker_quotes.txt

grep

Windows users: The PowerShell version of grep is Select-String.

grep searches the input for lines containing a match of a given expression. For example, to find the descriptor line in a FASTA file, you can do this.

grep ">" sequences/1z98.fasta

The > symbol is in quotes so that the shell interprets it literally, rather than as a redirect.

If you want to find lines that do not have a >, use the -v flag with grep.

grep -v ">" sequences/1z98.fasta

If you want to ignore case, use the -i flag. For example,

grep "sequence" sequences/1z98.fasta

will not have any hits, but

grep -i "sequence" sequences/1z98.fasta

will.

If you provide a wildcard character * for the file name, grep will search all files in a directory. Remember that we had some PDB files in the ~/git/bootcamp/data/ directory. We could find them (assuming we are in the ~/git/bootcamp/command_line_tutorial/ directory) by using

ls ../data/*.pdb

But if we are concerned that we might not have the right suffixes on all of our file names, we could use grep to get the name of all files that contain a string common to PDB files, like ATOM. To do this, we use the -l flag. We can combine that with the case-insensitivity (-i) flag.

grep -li "ATOM" ../data/*
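The flags above can be tried without the tutorial data. The following is a small sketch under stated assumptions: the two files are invented for illustration (a FASTA-like file and a PDB-like file with a misleading .txt suffix), and they live in a scratch directory.

```shell
scratch=$(mktemp -d)

# Two tiny made-up files: one FASTA-like, one PDB-like with the wrong suffix.
printf '>1Z98 example record\nMAKDVEA\n' > "$scratch/example.fasta"
printf 'HEADER    made-up entry\nATOM      1  N   MET A   1\n' > "$scratch/structure.txt"

# Lines that contain ">" (here, the descriptor line).
grep ">" "$scratch/example.fasta"

# -v inverts the match: lines WITHOUT ">" (here, the sequence itself).
sequence_only=$(grep -v ">" "$scratch/example.fasta")
echo "$sequence_only"

# -i ignores case, so the pattern "RECORD" also matches "record".
grep -i "RECORD" "$scratch/example.fasta"

# -l prints only the names of files that contain at least one match.
pdb_like=$(grep -li "atom" "$scratch"/*)
echo "$pdb_like"
```

The last command reports only structure.txt, even though its suffix gives no hint that it holds PDB-style records.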

grep is also very useful when combined with other commands. Try this one:

cat sequences/*.fasta | grep ">"

The word count command (wc) works particularly well with grep. Try:

cat sequences/*.fasta | grep ">" | wc -l

(man wc will tell you more about this useful little command.) Let’s look at the command we just ran in more detail. First, cat sequences/*.fasta outputs the entire text of all files in the directory sequences that have the .fasta suffix. That is piped to grep, meaning that the output of the cat command does not go to the screen, but to grep. So, we take all the text from those files and use grep to keep only the lines that contain >, which in a FASTA file are the descriptor lines. Those lines are then piped into wc, which, with the -l flag, gives the number of lines. Thus, we get a count of the total number of sequences in our FASTA files. Pretty slick!
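Here is the same counting pipeline as a self-contained sketch. The FASTA files are invented for illustration (three records spread over two files) so that the expected count is known in advance. The second variant, which uses grep's -c flag to count matching lines directly, is an alternative that the lesson itself does not use.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/sequences"

# Made-up FASTA files: two records in the first file, one in the second.
printf '>seq_a\nMKV\n>seq_b\nGHT\n' > "$scratch/sequences/one.fasta"
printf '>seq_c\nPLW\n' > "$scratch/sequences/two.fasta"

# The pipeline from the lesson: concatenate, keep descriptor lines, count them.
n_records=$(cat "$scratch"/sequences/*.fasta | grep ">" | wc -l)

# Alternative: -c makes grep count the matching lines itself.
n_records_alt=$(cat "$scratch"/sequences/*.fasta | grep -c ">")

echo "records counted with wc -l:   $n_records"
echo "records counted with grep -c: $n_records_alt"
```

Both variants count three records. To match only lines that begin with >, the pattern "^>" could be used instead.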

^C, ^Z, bg, fg, ps, top, kill

There is a tiny Python script called fibonacci.py in the command_line_tutorial directory. Have a look at it. It features an endless loop, which is perfect for illustrating a number of commands. Start the program by typing:

python fibonacci.py

This will print Fibonacci numbers to the screen forever. Once you have seen enough, you can terminate the script by pressing Ctrl-C, written here as:

^C

Let’s rerun the script and redirect the output to a file:

python fibonacci.py > fibs

This too runs forever. Instead of terminating the script, we can suspend it. To do this, type:

^Z

Next, typing

bg

(for “background”) allows the process to be resumed in the background. To bring it back to the foreground, type

fg

and now we can terminate it again with

^C

Another way to kill a program is the kill command. For this, we need to find out the process ID. First, let’s start it up and put it in the background.

python fibonacci.py > fibs
^Z
bg

Now that it’s running in the background, we would like to know what process it is. Actually, we can find out all processes that are running. One way to do this is using the ps command.

ps

or:

ps -ax

Another useful command that shows you what is going on is:

top

Either command will reveal the process ID. Once you know the process ID,

kill <process ID>

(where you substitute <process ID> with the number you got from ps or top) will stop the process. Now, it might be a good idea to delete the fibs file.
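The find-the-ID-then-kill workflow can also be scripted. The sketch below makes two substitutions that are not part of the lesson: it uses sleep 300 as a harmless stand-in for fibonacci.py, and it starts the command directly in the background with a trailing & instead of ^Z followed by bg.

```shell
# Start a harmless long-running command in the background with a trailing &.
sleep 300 &

# $! holds the process ID of the most recently started background command.
pid=$!
echo "started process $pid"

# ps -p restricts the listing to that one process, confirming it is running.
ps -p "$pid"

# kill sends the default TERM signal, which asks the process to stop.
kill "$pid"

# wait collects the terminated process; || true ignores its nonzero exit status.
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal at all; it only checks whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    state="still running"
else
    state="stopped"
fi
echo "process $pid is $state"
```

Capturing $! right after launching the command avoids having to search the ps or top output for the right line.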

Environment variables

An environment variable is a dynamic value that can affect the way running processes behave.

The command:

env

(or ls env: for PowerShell users) will show you which environment variables have been set. One particularly important environment variable is PATH.
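To see how an environment variable reaches other processes, consider this minimal sketch. The variable names greeting and BOOTCAMP_DEMO are hypothetical and chosen only for this illustration; sh -c is used to start a child process.

```shell
# A plain shell variable is visible only in the current shell.
greeting="hello"

# export turns a variable into an environment variable, inherited by child processes.
export BOOTCAMP_DEMO="visible to children"

# sh -c starts a child process; single quotes delay expansion until the child runs.
# ${greeting:-unset} prints "unset" when the variable is empty or not defined.
child_sees_plain=$(sh -c 'echo "${greeting:-unset}"')
child_sees_exported=$(sh -c 'echo "$BOOTCAMP_DEMO"')

echo "plain variable in child:    $child_sees_plain"
echo "exported variable in child: $child_sees_exported"

# env lists every exported variable; grep narrows the list to ours.
env | grep "BOOTCAMP_DEMO"
```

The child process sees the exported variable but not the plain one, which is why settings meant for other programs need export.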

PATH

The PATH environment variable tells the system which directories the shell has to search for executable files. This is called the search path. It’s a list of absolute paths separated by colons.

To check what has been already added to PATH we can issue the following command:

echo "$PATH"

(Windows users use $env:path.) To see how this works, make sure you are in the ~/bootcamp/command_line_tutorial/ directory. There is a little shell script there called remind_me.sh. To run it, do the following.

./remind_me.sh

Now, try running it without the ./ at the beginning.

remind_me.sh

The second one does not work. This is because whenever you ask the shell to execute something by name alone, it searches the directories in PATH to find something with that name. The search is skipped when you give a path to the executable. Note that ./remind_me.sh is such a path because . refers to the working directory.

Now, we sometimes want the shell to find items in specific directories, so we can change the PATH environment variable.

One way to do this is to execute

export PATH=$PATH:/complete/path/to/be/included

where /complete/path/to/be/included is the directory you want to add to the search path. The change lasts for the remainder of the session; once you close your terminal, it is gone. To make the change permanent, add the export line to the end of the relevant rc file (.bashrc for Bash and .zshrc for Zsh).
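The effect of extending PATH can be checked with a small experiment. This is a sketch under stated assumptions: hello_bootcamp.sh is a made-up stand-in for remind_me.sh, and it lives in a scratch directory that is not on the search path to begin with.

```shell
scratch=$(mktemp -d)

# A tiny stand-in script; hello_bootcamp.sh is a made-up name.
printf '#!/bin/sh\necho "found via PATH"\n' > "$scratch/hello_bootcamp.sh"
chmod +x "$scratch/hello_bootcamp.sh"

# Before: the scratch directory is not on PATH, so the bare name is not found.
# command -v reports where the shell would find a command, if anywhere.
if command -v hello_bootcamp.sh > /dev/null; then
    before="found"
else
    before="not found"
fi

# Append the directory to the search path for this session only.
export PATH="$PATH:$scratch"

# After: the shell can locate and run the script by name alone.
after=$(hello_bootcamp.sh)

echo "before: $before"
echo "after:  $after"
```

Appending to the end of PATH means that directories listed earlier take precedence if they contain a command with the same name.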

conda

Conda is the package and environment manager that comes with your Anaconda installation. It allows you to install packages, makes sure that all dependencies are installed as well, and helps you keep your installed packages up to date.

Have a look at this cheat sheet.

conda can also create environments in which specified packages are active. This is important because the requirements of one package may conflict with those of another. For example, say we want to create an environment for installations related to Stan, a sophisticated package used in statistical applications.

conda create --name stan anaconda

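Now, we want to switch to this environment and install PyStan, for example. To activate the environment stan, execute the following.

On all platforms (conda 4.4 and later):

conda activate stan

Older versions of conda used source activate stan on Linux and macOS, and activate stan on Windows.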

Then, we can make our installations that are specific to this environment.

conda install pystan

If we want to switch back to our default environment, we do

conda deactivate

Conclusions

There is still much more to learn about using the command line effectively. However, given your basic command line knowledge and Python programming skills, you are already well on your way to being empowered to use your computer effectively as a research tool.