Lesson 0: Configuring your computer


In this lesson, you will set up a Python computing environment for scientific computing. There are two main ways people set up Python for scientific computing on their own machine.

  1. By downloading and installing package by package with tools like apt-get, pip, etc.

  2. By downloading and installing a Python distribution that contains binaries of many of the scientific packages needed. One widely used distribution is from Anaconda.

In this class, we will use Anaconda, with its associated package manager, conda. It has become the de facto package manager/distribution for scientific use.

macOS users: Install Command Line Tools

If you are using macOS, you should install Command Line Tools, a set of utilities that includes compilers and (most importantly for this workshop) Git. You could install it along with Xcode, but Xcode is very large (≈35 GB of disk space), so we will install Command Line Tools on its own.

To install Command Line Tools, open the Terminal application. It is typically in the /Applications/Utilities folder. Alternatively, press ⌘ + space bar, type terminal in the search box, and select the Terminal application. Once you have a prompt in the Terminal application, type

xcode-select --install

and hit enter. You will be prompted for the installation, and you should go ahead with it. It may take several minutes for the installation to complete.

Windows users: Install Git and Chrome or Firefox

We will be using JupyterLab in the workshop. It is browser-based, and Chrome, Firefox, and Safari are supported. Microsoft Edge is not. Therefore, if you are a Windows user, you need to be sure you have either Chrome or Firefox installed.

On Macs, Git is installed along with Command Line Tools, as described above. Windows users need to install Git separately. You can do this by following the instructions here.

Downloading and installing Anaconda

If you already have Anaconda installed on your machine, you can skip this step.

Downloading and installing Anaconda is simple.

  1. Go to Anaconda’s download page and click the Download button.

  2. Follow the on-screen instructions for installation. While doing so, be sure that Anaconda is installed in your home directory (which is the default), not in root.

That’s it! After you do that, you will have a functioning Python distribution.
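If you would like to confirm where the distribution was installed, you can ask Python itself. The following is a minimal sketch, assuming you run it with the Python that Anaconda installed; the helper name is_under_home is hypothetical and not part of Anaconda.

```python
import sys
from pathlib import Path


def is_under_home(prefix, home=None):
    """Return True if the directory `prefix` lies inside the home directory."""
    home = Path.home() if home is None else Path(home)
    try:
        # relative_to raises ValueError when prefix is not inside home
        Path(prefix).resolve().relative_to(home.resolve())
    except ValueError:
        return False
    return True


# sys.prefix is the root directory of the Python installation currently running
print("Python installation:", sys.prefix)
print("Inside home directory:", is_under_home(sys.prefix))
```

If the second line reports False, the installer may have placed Anaconda somewhere other than your home directory.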

Installing node.js

node.js is a platform that enables you to run JavaScript outside of the browser. We will not use it directly, but it needs to be installed for some of the more sophisticated JupyterLab functionality. Install node.js by following the instructions here.
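Once Git and node.js are installed, a quick way to check that both are reachable is to look for them on your PATH. This is only a sketch: the helper tool_version is a hypothetical name, and it assumes each tool responds to the --version flag (both git and node do).

```python
import shutil
import subprocess


def tool_version(name):
    """Return the output of `name --version`, or None if `name` is not on the PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Most tools print the version to stdout; fall back to stderr just in case
    return result.stdout.strip() or result.stderr.strip()


for tool in ("git", "node"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

If a tool is reported as not found right after installation, opening a fresh terminal often helps, since the PATH is read when the terminal starts.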

Setting up a conda environment

I have created a conda environment for use in this workshop. You can download the YML specification for the environment, pol_stats.yml.

You can set up and activate the environment on the command line or by using the Anaconda Navigator, which should be installed with Anaconda. You can do either of the two options, (a) or (b), below.

a) Activating from the command line

To set up your conda environment from the command line, navigate to the directory where you saved the pol_stats.yml file. Then, on the command line, enter

conda env create -f pol_stats.yml

This should build the environment for you (it may take several minutes). To then activate the environment, enter

conda activate pol_stats

on the command line.

b) Activating using the Anaconda Navigator

If you are using macOS, Anaconda Navigator will be available in your Applications menu. If you are using Windows, you can launch Anaconda Navigator from the Start menu.

When the Navigator window opens, select Environments on the left menu pane. Upon selecting Environments, you will see a pane immediately to the right of the Home/Environments/Learning/Community pane with a Search Environments window at the top. At the bottom of that pane, click Import. In the window that pops up, click on the folder icon under Local drive. Find the pol_stats.yml file you just downloaded. Click Import. It may take some time for the environment to be imported and built.

Optional: Install CmdStan

While we do not plan on using Stan in this workshop, we may explore some Bayesian modeling later in the week. To install CmdStan, the command-line interface to Stan, enter the following on the command line.

conda activate pol_stats
python -c "import cmdstanpy; cmdstanpy.install_cmdstan()"

This will take several minutes to run.
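To see whether the installation succeeded, you can ask CmdStanPy where it found CmdStan. The sketch below wraps cmdstanpy.cmdstan_path() in a hypothetical helper, find_cmdstan, and assumes that function raises a ValueError when no installation is present.

```python
def find_cmdstan():
    """Return the path to the CmdStan installation, or None if it cannot be found."""
    try:
        import cmdstanpy
    except ImportError:
        # cmdstanpy is not installed in the active environment
        return None
    try:
        return cmdstanpy.cmdstan_path()
    except ValueError:
        # Assumed behavior: raised when no CmdStan installation is found
        return None


location = find_cmdstan()
print("CmdStan location:", location if location is not None else "not found")
```

Run this with the pol_stats environment active, since that is the environment in which cmdstanpy is expected to be installed.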

Launching JupyterLab and a terminal

We will be using JupyterLab throughout the workshop. You can launch JupyterLab either via the Anaconda Navigator or via your operating system’s terminal program (Terminal on macOS and PowerShell on Windows). If you wish to use the latter, skip to the section on launching JupyterLab from the command line below.

In the Anaconda Navigator, click Home on the left pane. To the right, you will have a pane from which you can launch JupyterLab. On the top of the right pane, you will see two pulldown menus separated by the word “on.” Be sure you select pol_stats on the right pulldown menu. This ensures that you are using the workshop environment you just set up.

You need to make sure you are using the pol_stats environment whenever you launch JupyterLab during the workshop.

You should see a card for JupyterLab. Do not confuse this with Notebook; you want to launch JupyterLab. Click Launch on the JupyterLab card. This will launch JupyterLab in your default browser.

Within the JupyterLab window in your browser, you will have the option to launch a notebook, a console, a terminal, or a text editor. We will use all of these during the workshop. For the updating and installation of necessary packages, click on Terminal to launch a terminal. You will get a terminal window (probably black) with a bash prompt. We refer to this text interface in the terminal as the command line. You can alternatively use Terminal for macOS or PowerShell for Windows for interfacing with the command line, but many students find it convenient to do so via JupyterLab.

Launching JupyterLab from the command line

While launching JupyterLab from the Anaconda Navigator is fine, I generally prefer to launch it from the command line on my own machine. If you are on a Mac, open the Terminal program. You can do this by hitting Command + space bar and searching for “terminal.” On Windows, you should launch PowerShell. You can do this by hitting Windows + R and typing powershell in the text box.

Once you have a terminal or PowerShell window open, you will have a prompt. At the prompt, type

conda activate pol_stats

This will ensure you are using the pol_stats environment you just created.

You need to make sure you are using the pol_stats environment whenever you launch JupyterLab during the workshop, so you should run conda activate pol_stats each time you open a terminal.

Now that you have activated the pol_stats environment, you can launch JupyterLab by typing

jupyter lab

on the command line. You will have an instance of JupyterLab running in your default browser. If you want to specify the browser, you can, for example, type

jupyter lab --browser=firefox

on the command line.

It is up to you if you want to launch JupyterLab from the Anaconda Navigator or command line.
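Whichever way you launch JupyterLab, you can confirm from a notebook or console which environment is active. This is a minimal sketch: it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, falls back to the last component of sys.prefix, and uses a hypothetical helper name, env_name_from_prefix.

```python
import os
import sys
from pathlib import Path


def env_name_from_prefix(prefix):
    """Return the last path component of an environment prefix."""
    return Path(prefix).name


# conda sets CONDA_DEFAULT_ENV on activation; otherwise inspect the interpreter's prefix
active = os.environ.get("CONDA_DEFAULT_ENV") or env_name_from_prefix(sys.prefix)
print("Active environment:", active)
if active != "pol_stats":
    print("Warning: expected the pol_stats environment")
```

If the warning appears, close JupyterLab, activate pol_stats (or select it in the Navigator), and launch again.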

Data sets

You should make sure you have all of the data sets we will use in the workshop downloaded to your machine. You can download all of the data sets from this link. To match the notes for the workshop, I advise the following directory structure.

pol-stats-workshop/
    data/
    lessons/
    exercises/

That way, when you are working on a lesson or exercise, the path to the data directory is always ../data/.
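If you prefer to create this layout programmatically, something like the following sketch would work. The function make_workshop_dirs is a hypothetical helper, and the demonstration runs in a temporary directory so that nothing is written to your machine; for real use you would pass a location of your choosing.

```python
import tempfile
from pathlib import Path


def make_workshop_dirs(parent):
    """Create the suggested workshop directory layout under `parent`."""
    root = Path(parent) / "pol-stats-workshop"
    for sub in ("data", "lessons", "exercises"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Demonstrate in a throwaway directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    root = make_workshop_dirs(tmp)
    created = sorted(p.name for p in root.iterdir())
    print(created)  # ['data', 'exercises', 'lessons']

    # From inside lessons/, the relative path ../data/ reaches the data directory
    same_dir = (root / "lessons" / ".." / "data").resolve() == (root / "data").resolve()
    print("../data/ resolves to data/:", same_dir)
```

For example, make_workshop_dirs(Path.home()) would create pol-stats-workshop/ in your home directory; you would then move the downloaded data sets into its data/ subdirectory.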

Checking your distribution

We’ll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use in the workshop.

Use the JupyterLab launcher (you can get a new launcher by clicking on the + icon on the left pane of your JupyterLab window) to launch a notebook. In the first cell (the box next to the [ ]: prompt), paste the code below. To run the code, press Shift+Enter while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!

[1]:
import numpy as np
import bokeh.io
import bokeh.models
import bokeh.plotting

# Render Bokeh plots inline in the notebook
bokeh.io.output_notebook()

# Generate plotting values for a parametric, heart-shaped curve
t = np.linspace(0, 2 * np.pi, 200)
x = 16 * np.sin(t) ** 3
y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)

# Draw the curve and add a centered text label
p = bokeh.plotting.figure(height=250, width=275)
p.line(x, y, color="red", line_width=3)
text = bokeh.models.Label(x=0, y=0, text="Physics of Life", text_align="center")
p.add_layout(text)

bokeh.io.show(p)
[Plot: a red heart-shaped curve with the label "Physics of Life" at its center]

Computing environment

[2]:
%load_ext watermark
%watermark -v -p numpy,bokeh,jupyterlab
Python implementation: CPython
Python version       : 3.11.4
IPython version      : 8.12.2

numpy     : 1.24.3
bokeh     : 3.2.1
jupyterlab: 4.0.5
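If the watermark extension is not available in your environment, a similar report can be produced with the standard library alone. This is a sketch rather than a replacement for watermark; package_versions is a hypothetical helper name, and it relies on importlib.metadata, which requires Python 3.8 or later.

```python
import sys
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python version:", sys.version.split()[0])
for name, version in package_versions(["numpy", "bokeh", "jupyterlab"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Your version numbers do not need to match the ones listed above exactly, but a package reported as not installed suggests the pol_stats environment is not the one that is active.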