Lesson 44: Introduction to image processing with scikit-image


[1]:
import numpy as np
import pandas as pd

# Our image processing tools
import skimage.filters
import skimage.io
import skimage.measure
import skimage.morphology

import colorcet

import iqplot
import bootcamp_utils

import holoviews as hv
hv.extension('bokeh')

import bokeh.io
bokeh.io.output_notebook()
Loading BokehJS ...

In this tutorial, we will learn some basic techniques for image processing using `scikit-image <http://scikit-image.org>`__ with Python.

Image processing tools for Python

There are many image processing tools available for Python. Some of them, such as ITK and OpenCV, are mature image processing packages with Python bindings, allowing easy use of their functionality. Others were developed specifically for Python. Some of the many packages are

- scikit-image
- scipy.ndimage
- OpenCV
- ITK
- Fiji

The first two packages are standard with Anaconda. They provide a set of basic image processing tools, with more sophisticated packages such as ITK and Fiji supplying many more bells and whistles. If in the future you have more demanding image processing requirements, the other packages can prove very useful.

These days, there are lots of machine learning based packages for image segmentation, but few of these are mature packages at the moment. In future editions of the bootcamp, as these techniques and packages mature, we may use them.

We will almost exclusively use scikit-image along with the standard tools from NumPy. The package scipy.ndimage is quite useful, but we will use scikit-image, since it has expanded functionality. A potential annoyance with skimage is that the main package has minimal functionality, and you must import subpackages as needed. For example, to load and view images, you will need to import skimage.io. Importantly, skimage is well-documented, and you can access the documentation at http://scikit-image.org/.

We will explore skimage’s capabilities and some basic image processing techniques through example. In this lesson, we will take a brightfield and a fluorescent image of bacteria and perform segmentation, that is, the identification of each pixel in the image as being bacterial or background.

Loading and viewing images

We will now load and view the test images we will use for segmentation. We load the image using skimage.io.imread(). The image is stored as a NumPy array. Each entry in the array is a pixel value. This is an important point: a digital image is data! It is a set of numbers with spatial positions.

Today, we’ll be looking at some images of Bacillus subtilis, a Gram-positive bacterium famous for its ability to enter a form of “suspended animation” known as sporulation when environmental conditions get rough. In these images, all cells have been engineered to express Cyan Fluorescent Protein (CFP) once they enter a particular genetic state known as competence. These cells have been imaged under phase contrast (bsub_100x_phase.tif) and epifluorescence (bsub_100x_cfp.tif) microscopy. These images were acquired by former Caltech graduate student (and 2016 bootcamp TA) Griffin Chure.

Let’s go ahead and load an image.

[2]:
# Load the phase contrast image.
im_phase = skimage.io.imread('data/bsub_100x_phase.tif')

# Take a look
im_phase
[2]:
array([[398, 403, 418, ..., 381, 377, 373],
       [410, 400, 398, ..., 385, 372, 395],
       [394, 407, 421, ..., 376, 377, 378],
       ...,
       [371, 382, 380, ..., 389, 380, 370],
       [362, 368, 356, ..., 397, 383, 382],
       [372, 364, 372, ..., 385, 371, 378]], dtype=uint16)

We indeed have a NumPy array of integer values. To properly display images, we also need to specify the interpixel distance, the physical distance corresponding to neighboring pixels in an image. Interpixel distances are calibrated for an optical setup by various means. For this particular setup, the interpixel distance was 62.6 nm.

[3]:
# Store the interpixel distance in units of microns
ip_distance = 0.0626

Now that we have the image loaded and know the interpixel distance, we would like to view it. Really, I should say “plot it” because an image is data.

Downsampling the image

In almost any scientific application, we would proceed using the loaded image. However, to reduce the file size of this notebook for display on the internet, it is useful (in fact necessary, given GitHub’s file size limits) to downsample the image. If you want to work through the notebook on your machine with full-sized images, adjust downsample to False in the code cell below.

[4]:
downsample = True

if downsample:
    im_phase = skimage.measure.block_reduce(im_phase, (2, 2), np.mean).astype(
        im_phase.dtype
    )
    ip_distance /= 2
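To see concretely what block_reduce with np.mean does, here is an equivalent pure-NumPy sketch on a hypothetical 4×4 array (this reshape trick is valid when the image dimensions are divisible by the block size; block_reduce also handles the general case by padding):

```python
import numpy as np

# Hypothetical 4x4 "image" with pixel values 0..15
im = np.arange(16).reshape(4, 4)

# Average non-overlapping 2x2 blocks: reshape so that each block gets its
# own pair of axes, then average over those axes
im_small = im.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print(im_small)
```

Each new pixel is the mean of a 2×2 block of old pixels, so it spans twice the physical distance; this is why we also halve ip_distance above.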

Viewing images with Bokeh

To view images in Bokeh, we use the p.image() method. It expects a list of 2D arrays. The images in the list are overlaid. In practice, we usually only include a single image in the list.

Importantly, we can set the physical ranges of the x- and y-axes in the image display using the dw and dh kwargs, which respectively specify the width and height “data units” of the displayed image. For example, if the image is 50 by 100 microns, we can set dw = 100 and dh = 50.

Finally, there is a color_mapper kwarg, which sets how the image is colored. We will discuss this in more detail soon, but for now, we set the color mapper to be gray scale.

With this in mind, let’s show an image!

[5]:
# Get the physical scale of the image in terms of interpixel distance
dw = im_phase.shape[1] * ip_distance
dh = im_phase.shape[0] * ip_distance

# Set up figure, making sure x_range and y_range exactly capture image
p = bokeh.plotting.figure(
    frame_width=int(im_phase.shape[1] / 2),
    frame_height=int(im_phase.shape[0] / 2),
    x_axis_label="µm",
    y_axis_label="µm",
    x_range=[0, dw],
    y_range=[0, dh],
)

# Put in the image glyph; x and y set to origin
p.image(
    image=[im_phase],
    x=0,
    y=0,
    dw=dw,
    dh=dh,
    color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.gray(256)),
)

bokeh.io.show(p)

Display of images like this, though fairly straightforward, is automated in the bootcamp_utils.imshow() function. It has other bells and whistles as well, including support for showing multichannel images and for showing colorbars indicating pixel intensities.

[6]:
p = bootcamp_utils.imshow(
    im_phase,
    interpixel_distance=ip_distance,
    length_units="µm",
    color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.grey(256)),
    colorbar=True,
    frame_height=300,
)

bokeh.io.show(p)

Conveniently, images rendered using Bokeh allow for interactivity with zooming, etc. They also retain the axis ticks and labels, rendering scale bars unnecessary.

Viewing images with HoloViews

The Image element of HoloViews enables easy viewing of images.

[7]:
hv.Image(
    data=im_phase
)
[7]:

The image is displayed with the HoloViews default colormap (more on that in a moment) and is roughly square by default. The axes are also scaled to be of unit length. We would rather have the axes marked in units of microns so we know the physical distances. We can specify that with the bounds kwarg of hv.Image, which consists of a list of numerical values to use for the [left, bottom, right, top] of the image. We can construct that for this image using our value of the interpixel distance.

[8]:
height_pixels, width_pixels = im_phase.shape

bounds = [0, 0, width_pixels*ip_distance, height_pixels*ip_distance]

The representations of the pixels are also not square (they need not be; an image is data!), but aesthetically we prefer square pixels. To enforce that, we need to set the width and height of the display. We can set the height and then compute the width from the height and the image’s aspect ratio to give approximately square pixels. We will specify a rather small image, since later on we will compare images side by side and we want room for that.

[9]:
frame_height = 200
frame_width = im_phase.shape[1] * frame_height // im_phase.shape[0]

Finally, we might want to look at the image with a grayscale colormap.

[10]:
hv.Image(
    data=im_phase,
    bounds=bounds,
).opts(
    frame_height=frame_height,
    frame_width=frame_width,
    cmap='gray',
    xlabel='µm',
    ylabel='µm',
)
[10]:

Lookup tables

In the above image representations, we used a gray colormap. Following are a few different colormaps we could use instead. As I discuss momentarily, you will almost always want to use Viridis.

[11]:
# Colormaps to check out, including a couple from colorcet
cmaps = [
    bokeh.palettes.gray(256),
    bokeh.palettes.viridis(256),
    colorcet.fire,
    colorcet.coolwarm,
]
cmap_names = ["gray", "Viridis", "fire", "coolwarm"]

# Build plots
plots = [
    bootcamp_utils.imshow(
        im_phase,
        interpixel_distance=ip_distance,
        length_units="µm",
        color_mapper=bokeh.models.LinearColorMapper(cmap),
        colorbar=True,
        frame_height=200,
        title=cmap_name,
    )
    for cmap_name, cmap in zip(cmap_names, cmaps)
]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

In image processing, a colormap is called a lookup table (LUT). A LUT is a mapping of pixel values to a color. This sometimes helps visualize images, especially when we use false coloring. Remember, a digital image is data, and false coloring an image is not manipulation of data. It is simply a different way of plotting it.
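Mechanically, applying a LUT is nothing more than array indexing: each pixel value looks up its color in a table. Here is a minimal sketch (using a hypothetical 256-entry grayscale table, not any particular package’s LUT) with NumPy fancy indexing:

```python
import numpy as np

# Hypothetical LUT: 256 RGB triples, here a simple grayscale ramp
lut = np.stack(3 * [np.linspace(0, 1, 256)], axis=1)  # shape (256, 3)

# Toy 2x2 image of 8-bit pixel values
im = np.array([[0, 128], [64, 255]])

# Fancy indexing maps each pixel value to its RGB color
im_rgb = lut[im]

print(im_rgb.shape)  # (2, 2, 3)
```

Swapping in a different 256×3 table changes the coloring without touching the underlying pixel values, which is why false coloring is plotting, not data manipulation.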

As we just saw, we specify a lookup table with a colormap. There is lots of debate about what the best colormaps (LUTs) are. The data visualization community seems to universally reject using rainbow colormaps. See, e.g., D. Borland and R. M. Taylor, Rainbow Color Map (Still) Considered Harmful, IEEE Computer Graphics and Applications, 27, 14–17, 2007. In the lower right example, I use a hue-diverging colorscale, which goes from blue to red, as people accustomed to rainbow colormaps expect, but in a perceptually ordered fashion. Viridis has been designed to be perceptually flat across a large range of values.

Importantly, the false coloring helps us see that the intensity of the pixel values in the middle of cell clusters is similar to that of the background, which will become an issue, as we will see, as we begin our segmentation.

Introductory segmentation

As mentioned before, segmentation is the process by which we separate regions of an image according to their identity for easier analysis. E.g., if we have an image of bacteria and we want to determine what is “bacteria” and what is “not bacteria,” we would do some segmentation. We will use bacterial test images for this purpose.

Histograms

As we begin segmentation, remember that viewing an image is just a way of plotting the digital image data. We can also plot a histogram. This helps us see some patterns in the pixel values and is often an important first step toward segmentation.

The histogram of an image is simply a list of counts of pixel values. When we plot the histogram, we can often readily see breaks in which pixel values are most frequently encountered. There are many ways of looking at histograms. The histogram() function of iqplot conveniently displays a histogram when using the bins='integer' kwarg. Importantly, the image needs to be flattened (converted to one dimension) before being passed to iqplot.

[12]:
p = iqplot.histogram(
    im_phase.flatten(),
    q='intensity',
    bins='integer'
)

bokeh.io.show(p)

We see that there is some structure in the histogram of the phase image. While our eyes are drawn to the large peak around 380, we should keep in mind that our bacteria are black on a bright background and occupy only a small area of the image. We can see a smaller peak in the vicinity of 200, which likely represents our bugs of interest. The peak to the right is brighter, so it likely represents the background. Therefore, if we can find where the valley between the two peaks is, we may take pixels with intensity below that value to be bacteria and those above to be background. Eyeballing it, I think this critical pixel value is about 300.

Thresholding

The process of taking pixels above or below a certain value is called thresholding. It is one of the simplest ways to segment an image. We call every pixel with a value below 300 part of a bacterium and everything above not part of a bacterium.

[13]:
# Threshold value, as obtained by eye
thresh_phase = 300

# Generate thresholded image
im_phase_bw = im_phase < thresh_phase

# Arguments for convenience
kwargs = dict(interpixel_distance=ip_distance,
        length_units="µm",
        frame_height=200)

# Display phase and thresholded image
plots = [
    bootcamp_utils.imshow(
        im_phase,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256)),
        **kwargs
    ),
    bootcamp_utils.imshow(
        im_phase_bw,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.gray(2)),
        **kwargs
    ),
]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

We can overlay these images to get a good view. To do this, we will make an RGB image, and saturate the green channel where the thresholded image is white. We can then display it using bootcamp_utils.imshow().

[14]:
# Build RGB image by stacking grayscale images
im_phase_rgb = np.dstack(3 * [im_phase / im_phase.max()])

# Saturate green channel wherever there are white pixels in thresh image
im_phase_rgb[im_phase_bw, 1] = 1.0

# Show the result
bokeh.io.show(bootcamp_utils.imshow(im_phase_rgb, color_mapper="rgb", **kwargs))

We see that we did a decent job finding bacteria, but we also pick up quite a bit of garbage sitting around the cells. We can also see that in some of the bigger clusters, we do not effectively label the bacteria in the middle of colonies. This is because of the “halo” of high intensity signal near boundaries of the bacteria that we get from using phase contrast microscopy.

Using the CFP channel

One way around these issues is to use bacteria that express a fluorescent protein and to segment using the fluorescence channel. Let’s try the same procedure with the CFP channel. First, let’s look at the image.

[15]:
# Load image
im_cfp = skimage.io.imread("data/bsub_100x_CFP.tif")

# Downsample for web
if downsample:
    im_cfp = skimage.measure.block_reduce(im_cfp, (2, 2), np.mean).astype(
        im_cfp.dtype
    )

# Display the image
bokeh.io.show(
    bootcamp_utils.imshow(
        im_cfp,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256)),
        colorbar=True,
        **kwargs
    )
)

We see that the bacteria are typically brighter than the background (which is impressively uniform), so this might help us in segmentation.

Filtering noise: the median filter

While it may not be obvious from this image, the non-bacterial pixels are not completely dark due to autofluorescence of the immobilization substrate, as well as some issues with our camera. In fact, the camera on which these images were acquired has a handful of “bad” pixels that always read much higher than the “real” value. This could cause issues in situations where we want to make quantitative measurements of intensity. We zoom in on one of these “bad” pixels below (ignoring the axis labels in the display, since we are looking at a slice of the image).

[16]:
if downsample:
    slc = np.s_[75:125, 225:275]
else:
    slc = np.s_[150:250,450:550]

bokeh.io.show(
    bootcamp_utils.imshow(
        im_cfp[slc],
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256)),
        **kwargs
    )
)

We see a single bright pixel. In addition to throwing off our colormap a bit, this could alter the measured intensity of a cell if any other bad pixels happen to be hiding within the bacteria. We can remove this noise by using a median filter. The concept is simple. We take a shape of pixels, called a structuring element or footprint, and pass it over the image. The value of the center pixel is replaced by the median value of all pixels within the footprint. To do this, we first need to construct the structuring element, which is done using the skimage.morphology module. The filtering is then done using skimage.filters.median(). Let’s try it with a 3×3 square structuring element.
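Before applying the filter to our image, here is a toy sketch of why a median filter kills isolated hot pixels, using a hypothetical flat array with one bad pixel. (We use scipy.ndimage.median_filter here, which with size=3 behaves like skimage.filters.median with a 3×3 square footprint.)

```python
import numpy as np
import scipy.ndimage

# Hypothetical flat image with one "bad" hot pixel in the middle
im = np.full((5, 5), 10)
im[2, 2] = 1000

# 3x3 median filter: each pixel is replaced by the median of its neighborhood
im_filt = scipy.ndimage.median_filter(im, size=3)

print(im[2, 2], im_filt[2, 2])  # 1000 10
```

The hot pixel is an outlier in every 3×3 neighborhood it touches, so the median ignores it entirely; a mean filter, by contrast, would smear its intensity into the neighbors.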

[17]:
# Make the structuring element
selem = skimage.morphology.square(3)

# Perform the median filter
im_cfp_filt = skimage.filters.median(im_cfp, footprint=selem)

# Display image
bokeh.io.show(
    bootcamp_utils.imshow(
        im_cfp_filt,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256)),
        **kwargs
    )
)

Now that we have dealt with the noisy pixels, we can see more clearly that some cells are very bright compared with others.

Thresholding in the CFP channel

We’ll proceed by plotting the histogram and finding the threshold value.

[18]:
p = iqplot.histogram(
    data=im_cfp_filt.flatten(),
    q='intensity',
    bins='integer',
    kind='step',
)

bokeh.io.show(p)

Yeesh. There are lots of bright pixels, but it is kind of hard to see where (or even if) there is a valley in the histogram. It sometimes helps to plot the histogram with the y-axis on a log scale. When we do this, we can eyeball the threshold value to be about 140.

[19]:
p = iqplot.histogram(
    data=im_cfp_filt.flatten(),
    q='intensity',
    bins='integer',
    kind='step',
    y_axis_type='log',
    y_range=[1, 1e6],
)

bokeh.io.show(p)

Now let’s try thresholding the image.

[20]:
# Threshold value, as obtained by eye
thresh_cfp = 140

# Generate thresholded image
im_cfp_bw = im_cfp_filt > thresh_cfp

# Display
plots = [
    bootcamp_utils.imshow(
        im_cfp,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256)),
        **kwargs
    ),
    bootcamp_utils.imshow(
        im_cfp_bw,
        color_mapper=bokeh.models.LinearColorMapper(bokeh.palettes.gray(2)),
        **kwargs
    ),
]

bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

Looks like we’re doing much better! Let’s try overlaying the images now.

[21]:
# Build RGB image by stacking grayscale images
im_rgb = np.dstack(3 * [im_phase / im_phase.max()])

# Saturate green channel wherever there are white pixels in thresh image
im_rgb[im_cfp_bw, 1] = 1.0

# Show the result
bokeh.io.show(bootcamp_utils.imshow(im_rgb, color_mapper="rgb", **kwargs))

Very nice! In general, it is often much easier to segment bacteria with fluorescence.

Otsu’s method for thresholding

It turns out that there is an automated way to find the threshold value, as opposed to eyeballing it like we have been doing. Otsu’s method provides this functionality.

[22]:
# Compute Otsu thresholds for phase and cfp
thresh_phase_otsu = skimage.filters.threshold_otsu(im_phase)
thresh_cfp_otsu = skimage.filters.threshold_otsu(im_cfp_filt)

# Compare results to eyeballing it
print("Phase by eye: ", thresh_phase, "   CFP by eye: ", thresh_cfp)
print("Phase by Otsu:", thresh_phase_otsu, "   CFP by Otsu:", thresh_cfp_otsu)
Phase by eye:  300    CFP by eye:  140
Phase by Otsu: 436    CFP by Otsu: 134

We see that for the CFP channel, the Otsu method did very well. However, for phase, we see a big difference. This is because the Otsu method assumes a bimodal distribution of pixels. If we look at the histograms on a log scale, we see more clearly that the phase image has a long tail, which will trip up the Otsu algorithm. The moral of the story is that you can use automated thresholding, but you should always do sanity checks to make sure it is working as expected.
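To demystify the automation, Otsu’s method simply chooses the threshold that maximizes the between-class variance of the two pixel classes it creates. Here is a minimal NumPy sketch of that criterion (a simplified version for illustration, not skimage’s implementation), applied to a hypothetical bimodal sample:

```python
import numpy as np

def otsu_threshold(pixels, nbins=256):
    """Simplified Otsu: exhaustively pick the threshold that
    maximizes the between-class variance of the two classes."""
    counts, edges = np.histogram(pixels, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2

    best_thresh, best_var = centers[0], -np.inf
    for i in range(1, nbins):
        w0, w1 = counts[:i].sum(), counts[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (counts[:i] * centers[:i]).sum() / w0
        mu1 = (counts[i:] * centers[i:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_thresh = between_var, centers[i]
    return best_thresh

# Hypothetical bimodal "image": many dark background pixels near 100,
# a few bright cell pixels near 300
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(100, 10, 10000), rng.normal(300, 10, 500)])

print(otsu_threshold(pixels))  # lands in the valley between the modes
```

On cleanly bimodal data like this, the criterion finds the valley; on the long-tailed phase histogram, there is no clean two-class structure to exploit, which is exactly why the automated threshold misbehaves there.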

Determining the bacterial area

Now that we have a thresholded image, we can determine the total area taken up by bacteria. It’s as simple as summing up the pixel values of the thresholded image!

[23]:
# Compute bacterial area
bacterial_area_pix = (im_cfp_filt > thresh_cfp_otsu).sum()

# Print out the result
print('bacterial area =', bacterial_area_pix, 'pixels')
bacterial area = 9631 pixels

If we want to get the total bacterial area in units of square microns, we can use the interpixel distance to get the area represented by each pixel. Recall that for this setup the interpixel distance is 0.0626 µm, stored (adjusted for any downsampling) in the ip_distance variable. We can then compute the bacterial area as follows.

[24]:
# Compute bacterial area
bacterial_area_micron = bacterial_area_pix * ip_distance**2

# Print total area
print('bacterial area =', bacterial_area_micron, 'square microns')
bacterial area = 9.43539439 square microns

Computing environment

[25]:
%load_ext watermark
%watermark -v -p numpy,pandas,skimage,bokeh,holoviews,iqplot,bootcamp_utils,jupyterlab
Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.3.0

numpy         : 1.21.5
pandas        : 1.4.2
skimage       : 0.19.2
bokeh         : 2.4.2
holoviews     : 1.14.8
iqplot        : 0.2.5
bootcamp_utils: 0.0.7
jupyterlab    : 3.3.2