In this lesson, we will continue introducing command line skills. Like we have said before, you will feel empowered controlling your computer as you master command line skills. We will go through a set of commands and skills.
To start, let's navigate into the folder we created in our first command line lesson. We left it at
Most commands have a manual that can be accessed right from the terminal itself. Last time we explored the command
more. Note, try this:
The manual usually has a description of the command, a synopsis which informs you about the syntax, and a list of options.
Windows users: Git Bash does not have
man. You can also look up
man pages on the internet, e.g., here.
man tee will tell you:
tee - read from standard input and write to standard output and files.
You will learn also about the authors, if you have some time google Richard M. Stallman. He's an interesting guy, and a very important person in the past and present of computational tools. The last time I remember him giving a talk at Caltech, you couldn't get into the lecture hall unless you were a half hour early.
Now, back to
tee. Try this:
start typing some text and press return. Repeat the process if you like. When you are done press
ctrl-c. Have a look at what you just created (use,
Shells are very good at stringing commands together. Let's look at an example:
ls | tee contents.txt
ls lists all the files and folders of the current directory and prints the information to the standard output. By adding the pipe character (
|) we tell the shell to feed this information into the next command instead.
Compare this to
ls > contents_again.txt
> character redirects from the standard output to a file. When using the tee command the same happens but the information is still passed to the standard output. When redirecting the information is just passed on to the file.
Note that redirecting with "
>" to a file will overwrite the file's original content.
ls >> contents_again.txt
>>" will append the output to the file.
echo simply prints a line of text to the standard output:
echo "Thanks for all the fish!"
It can be quite useful in combination with redirects
echo "Thanks for all the fish" > hitchhiker_quotes.txt
We briefly encountered
grep in the regex lesson. As a reminder,
grep searches the input for lines containing a match of a given expression.
grep ">" sequences/1z98.fasta
(Why is the
> symbol in quotes?) Now try this:
grep -v ">" sequences/1z98.fasta
grep "Sequence" sequences/1z98.fasta
grep -i "Sequence" sequences/1z98.fasta
A very useful option is
grep -li "ATOM" *
This prints all the filenames that have a matching line. Compare the output to this:
grep -li "ATOM" */*
grep is extremely useful when combined with other commands. Try this one:
cat sequences/*.fasta | grep ">"
The word count command (
wc) works particularily well with
cat sequences/*.fasta | grep ">" | wc -l
man wc will tell you more about this useful little command.) Let's looks at that command we just did in more detail. First,
cat sequences/*.fasta outputs the entire text of all files in the directory
sequences that have the
.fasta suffix. That is piped to
grep, meaning that the output of the
cat command does not go to the screen, but to
grep. So, we now take all that text from those files and use
grep to give all lines that start with
>. Those lines are then piped into
wc, which, with the
-l flag, gives the number of lines. Thus, we get a count of the total number of sequences in our FASTA files. Pretty slick!
There is a tiny python script called
fibonacci.py Have a look at it.
It features an endless loop which is perfect for illustrating a number of commands.
Start the program by typing:
This will print Fibonacci numbers to the screen forever. Once you have enough you can terminate the script with:
Let's rerun the script and pipe the output to a file:
python fibonacci.py > fibs
This too runs forever. Instead of terminating the script, we can suspect it. To do this, type:
(for "background") allows the process to be resumed in the background. To bring it back to the foreground, type
and now we can terminate it again with
Another way to kill a program is the kill command. For this we need to find out the process id. First, let's start it up and put it in the background.
python fibonacci.py > fibs
Now that it's running in the background, we would like to know what process it is. Actually, we can find out all processes that are running. One way to do this is using the
Another useful command that shows you what is going on is:
Either command will reveal the process ID. Once you know the process ID (it will
will do exactly that. Now, it might be a good idea to delete the fib file. If you like, you can have a look at it first.
In order to run fibonacci.py we prefaced it with python. What happens if we just invoke it by itself?
It tells us that we lack the permisson to execute the command. (Don't worry we talk about the ./ in a bit.)
On the left we see the read(r), write(w), and execute(x) permissions for owner, group, and all users.
again. What changed? Have a look at
man chmod and checkout
rc files contain commands that the corresponding application (or even the operating system) should run at startup, for example
.YOUR_SHELLrc (in our case that would be
.bashrc) can contain all kind of commands that make your life in the terminal easier. The suffix
rc goes back to the early days and stands for "runcom" which is an abbreviation for "run commands".
rc-files are dot-files (configuration files) that are usually located in the home directory. The
ls command usually ignores them. We need to use "
ls -a" to list them.
Go to your home directory and execute:
Do you see a
.bashrc file? If yes, have a look what's in there.
This file can be used to customize your terminal in any conceivable way. In this tutorial we just look at one aspect of it.
(I recommend you checkout
zsh as an alternative to
bash. Together with Oh My ZSH (a community driven framework for managing ZSH configuration) it guarantees a fantastic terminal experience. JB and Axel both use it.
As Wikipedia says better than I can, "Environment variables are a set of dynamic values that can affect the way running processes will behave on a computer."
will show you which environment variables have been set. One particularly important environment variable is
PATH environment variable tells the system which directories the shell has to search for executable files. This is called the search path. It's a list of absolute paths separated by colons. (Please note,
PATH is different form a path
To check what has been already added to PATH we can issue the following command:
To see how this works, make sure you are in the
~/bootcamp/command_line_tutorial directory. There is a little shell script there called
remind_me.sh. To run it, do the following.
Now, try running it without the
./ at the beginning.
The second one does not work. This is because whenever you ask the shell to execute something, it searches the directories in
PATH to find something with that name, unless you give the full path when invoking the executable. Note that
./remind_me.sh is the full path.
Now, we sometimes want the shell to find items in specific directories, so we can change the
PATH environement variable.
One way to do this is to execute
/complete/path/to/be/included is the name of the directory you want to be added to the search path. This will be used for the remainder of the session. Once you close your terminal it's gone. To make changes permantely we need to add this to the end of the
.bashrc file. Luckily we have all the tools to do this easily:
echo \`export PATH=$PATH:your/path/to/the/file\` >> ~/.bashrc
.bashrc file is modified we need to source it (or open a new terminal) so that the changes can be applied.
Let's also have a look at the bashrc file:
PATH tells the shell where to look for executables
PYTHONPATH tells Python where to look for modules. It works the same way
PATH works and it's also appended to the
For example, if you are storing your
.py files in the directory
~/bootcamp/python_files, you would do
Then, the Python interpreter will always know where to look for your files (when it it launched from bash).
Conda is the package and environment manager that comes with your Anaconda installation. It allows you to install packages and makes sure that all dependencies are installed as well and it helps you keeping your installed packages up to date.
Have a look at this cheat sheet.
conda can also create environments in which specified packages are active. This is important because the requirements of one package may conflict with those of another. For example: We are using Python 3 here, but a lot of packages only work with Python 2. So, by creating a Python 2 environment we make sure that there are no clashes.
Let's create a Python 2 environment:
conda create --name py2 python=2.7 anaconda
Once everything is installed you can have a look at all the environments at your disposal:
conda info -e
To activate the environment bootcamp execute the following.
On Linux and Mac:
source activate py2