Exercise 6.3: Heavy-tailed distributions and outliers

David Prober’s lab studies sleep using zebrafish as a model organism. In a paper by Gandhi and coworkers, his lab studied the effect of a deletion in the gene coding for arylalkylamine N-acetyltransferase (aanat), a key enzyme in the rhythmic production of melatonin. Melatonin is a hormone responsible for regulating circadian rhythms, and it is often taken as a drug to treat sleep disorders. The goal of the study was to investigate the effects of aanat deletion on sleep patterns in 5+ day old zebrafish larvae.

In one of the analyses, the authors compared the average activity of multiple fish over the course of night 6 of their lives. Here, activity is defined as the number of seconds per ten minutes that the fish is moving. You can download the results of the data processing here.

In performing exploratory data analysis, you will see that there are some clear outliers in activity. For at least two of these outliers, domain experts have told me that there were developmental problems with the fish.

a) Assuming that the activity of fish for each respective genotype is Normally distributed, obtain MLEs for the mean activity (parameter \(\mu\)) for each genotype with confidence intervals.
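As a starting point for part (a), here is a minimal sketch, not a full solution. Under a Normal model the MLE for \(\mu\) is the sample mean, and one common way to get a confidence interval is a percentile bootstrap. The function names (`normal_mle_mu`, `bootstrap_conf_int`) and the synthetic `activity` array are hypothetical stand-ins; the real data set's column names are not given here, so you would need to substitute the activity values for each genotype yourself.

```python
import numpy as np


def normal_mle_mu(x):
    """MLE for mu under a Normal model, which is the sample mean."""
    return np.mean(x)


def bootstrap_conf_int(x, stat_fun, n_reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat_fun(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Resample the data with replacement and recompute the statistic
    reps = np.array(
        [stat_fun(rng.choice(x, size=len(x))) for _ in range(n_reps)]
    )
    return np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])


# Synthetic stand-in for one genotype (NOT the real data):
# 20 typical fish plus two made-up outliers
rng = np.random.default_rng(3252)
activity = np.concatenate([rng.normal(20.0, 3.0, size=20), [60.0, 75.0]])

mu_mle = normal_mle_mu(activity)
conf_int = bootstrap_conf_int(activity, normal_mle_mu)
print(f"mu MLE: {mu_mle:.2f}, 95% conf int: [{conf_int[0]:.2f}, {conf_int[1]:.2f}]")
```

You would repeat this for each genotype in the real data set. Note how the two made-up outliers pull the estimate above the value of 20 used to generate the bulk of the synthetic data.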

b) Normal models tend to fail when there are outliers. This is because of their very light tails: extreme values are assigned such low probability that a few outliers can strongly pull the parameter estimates. If you suspect that your experiment may produce outliers, it is often useful to choose a generative model with heavier tails. To that end, model the activity with a Student-t distribution. Obtain MLEs for the parameter \(\mu\) with error bars. Comment on how the inference changes using this model.
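For part (b), the Student-t MLE has no closed form, so it has to be found numerically. The sketch below is one possible approach, assuming SciPy is available: it minimizes the negative log-likelihood over \(\nu\), \(\mu\), and \(\sigma\) with `scipy.optimize.minimize`, working with the logarithms of \(\nu\) and \(\sigma\) so that both stay positive. The function names and the synthetic `activity` array are hypothetical stand-ins for the real data.

```python
import numpy as np
import scipy.optimize
import scipy.stats


def neg_log_like_student_t(params, x):
    """Negative log-likelihood; params = (log nu, mu, log sigma)."""
    log_nu, mu, log_sigma = params
    nu, sigma = np.exp(log_nu), np.exp(log_sigma)
    return -np.sum(scipy.stats.t.logpdf(x, nu, loc=mu, scale=sigma))


def student_t_mle(x):
    """Numerically find MLEs (nu, mu, sigma) for a Student-t model."""
    x = np.asarray(x, dtype=float)
    # Robust initial guess: median for location, scaled MAD for scale
    mad = np.median(np.abs(x - np.median(x)))
    x0 = np.array([np.log(3.0), np.median(x), np.log(1.4826 * mad)])
    res = scipy.optimize.minimize(
        neg_log_like_student_t,
        x0,
        args=(x,),
        method="Nelder-Mead",
        options={"maxiter": 5000, "maxfev": 5000},
    )
    if not res.success:
        raise RuntimeError(f"Optimization failed: {res.message}")
    log_nu, mu, log_sigma = res.x
    return np.exp(log_nu), mu, np.exp(log_sigma)


# Synthetic stand-in for one genotype (NOT the real data):
# 20 typical fish plus two made-up outliers
rng = np.random.default_rng(3252)
activity = np.concatenate([rng.normal(20.0, 3.0, size=20), [60.0, 75.0]])

nu_mle, mu_mle_t, sigma_mle = student_t_mle(activity)
print(f"Student-t mu MLE: {mu_mle_t:.2f}; sample mean: {np.mean(activity):.2f}")
```

On this synthetic data, the Student-t estimate of \(\mu\) stays close to the bulk of the measurements, while the sample mean is dragged toward the outliers. Error bars could be obtained by resampling the data with replacement and refitting each resample, at the cost of one optimization per bootstrap replicate.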