Exercise 10.5: Filter, extract, rinse, repeat

This exercise was written in collaboration with Griffin Chure.


So far we have seen that in a single (very clean) image, we can get somewhere around 20 - 30 well-separated cells in a single 100\(\times\) magnification phase contrast image. However, if you wish to report a mean fluorescence intensity for a single strain, you would certainly want more cells to have a good degree of confidence. Using the principles you learned above, your job will be to report a mean fluorescence value for the HG105 E. coli strain using all of the images located in data/HG105_images/. To do this, you should do the following:

  1. Get a list of all of the image files in ~/git/data/HG105_images/.

  2. Separate them by phase contrast (for segmentation) and FITC (for measurement).

  3. Iterate through each image file and perform segmentation and fluorescence intensity extraction for each cell. These values should be stored in a NumPy array or Pandas DataFrame.

  4. Plot an ECDF of all extracted fluorescence intensities and report a mean and standard deviation as well as the number of cells you successfully measured.

  5. Obtain 95% bootstrap confidence intervals for the mean and standard deviation of the fluorescence intensities.

As a reminder, the interpixel distance of these images is 0.0626 µm per pixel.

Solution


[1]:
import os
import glob

import numpy as np
import pandas as pd

import skimage.io
import skimage.morphology
import skimage.segmentation
import skimage.measure

import bokeh_catplot

import holoviews as hv
hv.extension('bokeh')

import bokeh.io
bokeh.io.output_notebook()
Loading BokehJS ...

We already wrote cell segmentation functions in Exercise 10.1 session. The call signature is

cell_segmenter(im, thresh='otsu', radius=20.0, image_mode='phase',
               area_bounds=(0, 1e7), ecc_bounds=(0, 1))

We will include that code here so it is available. In practice, you would have this code in a module that you would have installed on your machine.

[2]:
def cell_segmenter(
    im,
    thresh="otsu",
    radius=20.0,
    image_mode="phase",
    area_bounds=(0, 1e7),
    ecc_bounds=(0, 1),
):
    """
    This function segments a given image via thresholding and returns
    a labeled segmentation mask.

    Parameters
    ----------
    im : 2d-array
        Image to be segmented. This may be of either float or integer
        data type.
    thresh : int, float, or 'otsu'
        Value used during thresholding operation. This can either be a value
        (`int` or `float`) or 'otsu'. If 'otsu', the threshold value will be
        determined automatically using Otsu's thresholding method.
    radius : float
        Radius for gaussian blur for background subtraction. Default value
        is 20.
    image_mode : 'phase' or 'fluorescence'
        Mode of microscopy used to capture the image. If 'phase', objects with
        intensity values *lower* than the provided threshold will be selected.
        If `fluorescence`, values *greater* than the provided threshold will be
        selected. Default value is 'phase'.
    area_bounds : tuple of ints.
        Range of areas of acceptable objects. This should be provided in units
        of square pixels.
    ecc_bounds : tuple of floats
        Range of eccentricity values of acceptable objects. These values should
        range between 0.0 and 1.0.

    Returns
    -------
    im_labeled : 2d-array, int
        Labeled segmentation mask.
    """
    # Apply a median filter to remove hot pixels.
    med_selem = skimage.morphology.square(3)
    im_filt = skimage.filters.median(im, selem=med_selem)

    # Perform gaussian subtraction
    im_sub = bg_subtract(im_filt, radius)

    # Determine the thresholding method.
    if thresh is "otsu":
        thresh = skimage.filters.threshold_otsu(im_sub)

    # Determine the image mode and apply threshold.
    if image_mode is "phase":
        im_thresh = im_sub < thresh
    elif image_mode is "fluorescence":
        im_thresh = im_sub > thresh
    else:
        raise ValueError(
            "image mode not recognized. Must be 'phase'" + " or 'fluorescence'"
        )

    # Label the objects.
    im_label = skimage.measure.label(im_thresh)

    # Apply the area and eccentricity bounds.
    im_filt = area_ecc_filter(im_label, area_bounds, ecc_bounds)

    # Remove objects touching the border.
    im_border = skimage.segmentation.clear_border(im_filt, buffer_size=5)

    # Relabel the image.
    im_border = im_border > 0

    return skimage.measure.label(im_border)


def bg_subtract(im, radius):
    """
    Subtracts a gaussian blurred image from itself smoothing uneven
    illumination.

    Parameters
    ----------
    im : 2d-array
        Image to be subtracted
    radius : int or float
        Radius of gaussian blur

    Returns
    -------
    im_sub : 2d-array, float
        Background subtracted image.
    """
    # Apply the gaussian filter.
    im_filt = skimage.filters.gaussian(im, radius)

    # Ensure the original image is a float
    if np.max(im) > 1.0:
        im = skimage.img_as_float(im)

    return im - im_filt


def area_ecc_filter(im, area_bounds, ecc_bounds):
    """
    Filters objects in an image based on their areas.

    Parameters
    ----------
    im : 2d-array, int
        Labeled segmentation mask to be filtered.
    area_bounds : tuple of ints
        Range of areas in which acceptable objects exist. This should be
        provided in units of square pixels.
    ecc_bounds : tuple of floats
        Range of eccentricities in which acceptable objects exist. This should be
        provided on the range of 0 to 1.0.

    Returns
    -------
    im_relab : 2d-array, int
        The relabeled, filtered image.
    """

    # Extract the region props of the objects.
    props = skimage.measure.regionprops(im)

    # Extract the areas and labels.
    areas = np.array([prop.area for prop in props])
    eccs = np.array([prop.eccentricity for prop in props])
    labels = np.array([prop.label for prop in props])

    # Make an empty image to add the approved cells.
    im_approved = np.zeros_like(im)

    # Threshold the objects based on area and eccentricity
    for i, _ in enumerate(areas):
        if (
            areas[i] > area_bounds[0]
            and areas[i] < area_bounds[1]
            and eccs[i] > ecc_bounds[0]
            and eccs[i] < ecc_bounds[1]
        ):
            im_approved += im == labels[i]

    # Relabel the image.
    return skimage.measure.label(im_approved > 0)

Now we need to figure out some way in which we can iterate over all of our images in data/HG105_images. We have a few options here. First, and this is the worst solution, we could type out the name of each individual file and load them individually. The other option (a better option) is to make our computer get the list of all files in a single (or a few) lines of code. To do this, we will use the glob module. glob has a bunch of methods for obtaining filenames from directories. The most useful method of glob is glob which will return a list of paths that match a given pattern. Let’s see how well it works for our needs.

[3]:
# Glob the phase and fluo globs.
phase_glob = list(sorted(glob.glob('data/HG105_images/*phase*.tif')))
fluo_glob = list(sorted(glob.glob('data/HG105_images/*FITC*.tif')))

# Take a look.
print(phase_glob)
print(fluo_glob)
['data/HG105_images/noLac_phase_0000.tif', 'data/HG105_images/noLac_phase_0001.tif', 'data/HG105_images/noLac_phase_0002.tif', 'data/HG105_images/noLac_phase_0003.tif', 'data/HG105_images/noLac_phase_0004.tif', 'data/HG105_images/noLac_phase_0005.tif', 'data/HG105_images/noLac_phase_0006.tif', 'data/HG105_images/noLac_phase_0007.tif', 'data/HG105_images/noLac_phase_0008.tif']
['data/HG105_images/noLac_FITC_0000.tif', 'data/HG105_images/noLac_FITC_0001.tif', 'data/HG105_images/noLac_FITC_0002.tif', 'data/HG105_images/noLac_FITC_0003.tif', 'data/HG105_images/noLac_FITC_0004.tif', 'data/HG105_images/noLac_FITC_0005.tif', 'data/HG105_images/noLac_FITC_0006.tif', 'data/HG105_images/noLac_FITC_0007.tif', 'data/HG105_images/noLac_FITC_0008.tif']

Well, that was easy. We used the wildcard character (*) to find all files that had the pattern anything then the strings phase or FITC followed by anything so long that it had a .tif extension. The output is a little bit different as well in that it not only returned the name of the file in the specified data_dir, but it gave us the entire relative path. This is an important distinction and is actually one of the reasons I frequently use glob.glob.

Now that we have a list of files, we should iterate over each phase image, perform the segmentation, then extract the mean pixel intensities of each object and store them in a list.

[4]:
# Instantiate an empty list for the mean pixel intensity of each cell.
mean_ints = []

# Do the same for the areas.
areas = []

# Image index
image_numbers = []

# Define our area and eccentricity bounds for the segmentation function.
ip_dist = 0.0626  # in units of µm per pixel
area_bounds = (0.5 / ip_dist ** 2, 4 / ip_dist ** 2)
ecc_bounds = (0.8, 1.0)

# Loop through all images.
for i, pf in enumerate(zip(phase_glob, fluo_glob)):
    p, f = pf
    # Load the phase image.
    phase_im = skimage.io.imread(p)

    # Perform the segmentation.
    phase_seg = cell_segmenter(
        phase_im,
        image_mode="phase",
        area_bounds=area_bounds,
        ecc_bounds=ecc_bounds,
    )

    # Load the fluorescence image.
    fluo_im = skimage.io.imread(f)

    # Compute the region properties.
    props = skimage.measure.regionprops(phase_seg, intensity_image=fluo_im)

    # Add them to our storage lists.
    for prop in props:
        mean_ints.append(prop.mean_intensity)
        areas.append(prop.area * ip_dist ** 2)
        image_numbers.append(i)

# Units don't matter, so normalize integrated intensity
mean_ints = np.array(mean_ints)
mean_ints /= mean_ints.max()

# Store results in data frame
df = pd.DataFrame(
    {
        "mean intensity": mean_ints / max(mean_ints),
        "area": areas,
        "image number": image_numbers,
    }
)

So, in just a few seconds, were were able to do what would take hours (if not a full day) to do in ImageJ by clicking. To make sure things are working as expected, let’s take a look at the last segmentation mask generated. We can use the make_overlay() function from Exercise 10.2. Remember that the array phase_seg is the labeled image, so to get the binary image, we use phase_seg > 0.

[5]:
def make_overlay(im, im_bw):
    """Create RGB image with cyan channel saturated via segmentation mask."""
    # Make float and normalized image
    im = np.copy(im).astype(float)

    # Build RGB image by stacking grayscale images
    im_rgb = np.dstack(3 * [im / im.max()])

    # Saturate cyan channel wherever there are white pixels in thresh image
    im_rgb[im_bw, 0] = 1.0

    return im_rgb


im_rgb = make_overlay(phase_im, phase_seg > 0)

hv.RGB(im_rgb).opts(
    frame_width=300, frame_height=phase_im.shape[0] * 300 // phase_im.shape[0]
)
[5]:

That looks pretty great to me, let’s see how many cells we were actually able to measure.

[6]:
# Print the total number of cells.
print("Segmented and analyzed {num} cells!".format(num=len(df)))
Segmented and analyzed 92 cells!

Analyzing 92 cells in only 8 images in about 8 seconds is actually pretty good! That’s certainly enough to do some serious boot strapping. Let’s look at the ECDFs of the mean intensities and the areas.

[7]:
p_mean_int = bokeh_catplot.ecdf(
    data=df,
    cats=None,
    val='mean intensity',
    conf_int=True,
    height=200,
    width=300,
    x_axis_label='mean intensity (a.u.)'
)

p_area = bokeh_catplot.ecdf(
    data=df,
    cats=None,
    val='area',
    conf_int=True,
    height=200,
    width=300,
    x_axis_label='area (µm²)'
)

bokeh.io.show(
    bokeh.layouts.gridplot([p_mean_int, p_area], ncols=2)
)

The mean and standard deviation of the mean intensities (in arbitrary units) are directly computed.

[8]:
df['mean intensity'].agg(['mean', 'std'])
[8]:
mean    0.399901
std     0.101903
Name: mean intensity, dtype: float64

Let’s go ahead and perform 100,000 in silico replicates of this experiment to get a confidence interval on our mean and standard deviation of the mean fluorescence intensities.

[9]:
def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates"""
    return np.array(
        [func(np.random.choice(data, size=len(data))) for _ in range(size)]
    )


# Draw bootstrap replicates of mean
bs_reps_mean = draw_bs_reps(mean_ints, func=np.mean, size=100000)

# Draw bootstrap replicates of standard deviation
bs_reps_std = draw_bs_reps(mean_ints, func=np.std, size=100000)

# Compute the 97.5% and 2.5% percentiles.
percs_mean = np.percentile(bs_reps_mean, [2.5, 97.5])
percs_std = np.percentile(bs_reps_std, [2.5, 97.5])

print(
    "95% of our bootstrapped means lie between {0:.2f} and {1:.2f}.".format(
        percs_mean[0], percs_mean[1]
    )
)
print(
    "95% of our bootstrapped st. devs. lie between {0:.2f} and {1:.2f}.".format(
        percs_std[0], percs_std[1]
    )
)
95% of our bootstrapped means lie between 0.38 and 0.42.
95% of our bootstrapped st. devs. lie between 0.06 and 0.14.

Computing environment

[10]:
%load_ext watermark
%watermark -v -p numpy,pandas,skimage,bokeh,bokeh_catplot,holoviews,jupyterlab
CPython 3.7.7
IPython 7.15.0

numpy 1.18.1
pandas 0.24.2
skimage 0.16.2
bokeh 2.1.0
bokeh_catplot 0.1.8
holoviews 1.13.2
jupyterlab 2.1.4