{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 25: Survey of other packages and languages\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " 
function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " 
events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(\"1001\");\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " 
console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " const hashes = {\"https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js\": \"YobFyzPeVUsFQydHkJGsJL1kyfHnWxOlPc3EwaV22TmBaeGoXHLWx5aRRVPS9xlE\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.1.0.min.js\": \"NuAg9+TcTQQqvQCTtkCneRrpkTiMhhfiq0KHiBzx8ECiKiLWXHN6i6ia3q7b3eHu\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.1.0.min.js\": \"uMVqQc8JqHitD67bXTn9a06Mrk3EiHRaZ18EJENQenAKJ/KL71SakdXYomZQpGRr\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.1.0.min.js\": \"u+eGuEXC8aw0VSCm2mH+b/tQEAitUOYiR1H6SuIVEdUmXsf4vN8m/SmXpmjb7U/X\"};\n", "\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " if (url in hashes) {\n", " element.crossOrigin = \"anonymous\";\n", " element.integrity = \"sha384-\" + hashes[url];\n", " }\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.1.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force 
=== true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " if (force === true) {\n", " display_loaded();\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(\"1001\")).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"1001\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n const hashes = {\"https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js\": \"YobFyzPeVUsFQydHkJGsJL1kyfHnWxOlPc3EwaV22TmBaeGoXHLWx5aRRVPS9xlE\", 
\"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.1.0.min.js\": \"NuAg9+TcTQQqvQCTtkCneRrpkTiMhhfiq0KHiBzx8ECiKiLWXHN6i6ia3q7b3eHu\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.1.0.min.js\": \"uMVqQc8JqHitD67bXTn9a06Mrk3EiHRaZ18EJENQenAKJ/KL71SakdXYomZQpGRr\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.1.0.min.js\": \"u+eGuEXC8aw0VSCm2mH+b/tQEAitUOYiR1H6SuIVEdUmXsf4vN8m/SmXpmjb7U/X\"};\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n if (url in hashes) {\n element.crossOrigin = \"anonymous\";\n element.integrity = \"sha384-\" + hashes[url];\n }\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.1.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-2.1.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = 
$(document.getElementById(\"1001\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import numba\n", "\n", "import bokeh.plotting\n", "import bokeh.io\n", "import bokeh.models\n", "\n", "bokeh.io.output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "In this bootcamp, we have used Python as the language of instruction. Because Python is an extensible language, it lets us use domain-specific packages. We have used NumPy for numerical computations, SciPy for special functions, statistics, and other scientific applications, Pandas for handling data sets, Bokeh for low-level plotting, HoloViews for high-level plotting, and Panel for dashboards.\n", "\n", "There are **plenty** of other Python-based packages that can be useful in computing in the biological sciences, and hopefully you will write (and share) some of your own for your applications. In this lesson, we will review some other Python packages you may find useful in your work. We will also discuss other languages that you might employ for scientific computing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other useful Python packages\n", "\n", "There are countless useful Python packages for scientific computing. Here, I am highlighting just a few. In fact, I am highlighting only ones I have come across and used in my own work. There are many, many more very high-quality packages out there for various domain-specific applications that I am not covering here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data science\n", "\n", "#### Dask\n", "\n", "[Dask](https://dask.org) allows for out-of-core computation with large data structures. For example, if your data set is too large to fit in RAM, thereby precluding you from using a Pandas data frame, you can use a Dask data frame, which will handle the out-of-core computing for you, and your data type will look an awful lot like a Pandas data frame. It also handles parallelization of calculations on large data sets.\n", "\n", "\n", "#### xarray\n", "\n", "[xarray](https://xarray.pydata.org/) extends the concepts of Pandas data frames to more dimensions. It is convenient for organizing, accessing, and computing with more complex data structures." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting\n", "\n", "We have used Bokeh and HoloViews for plotting. As we said in our [lesson on plotting](l19_plotting.ipynb), the landscape for Python plotting libraries is large. Here, I discuss a few other packages I have used.\n", "\n", "#### Altair\n", "\n", "[Altair](https://altair-viz.github.io) is a very nice plotting package that generates plots using [Vega-Lite](https://vega.github.io/vega-lite/). It is high level and declarative. The plots are rendered using JavaScript and have some interactivity.\n", "\n", "#### Matplotlib\n", "\n", "[Matplotlib](https://matplotlib.org) is really *the* main plotting library for Python. It is the most fully featured and most widely used. It has some high-level functionality, but is primarily a lower-level library for building highly customizable graphics.\n", "\n", "#### Seaborn\n", "\n", "[Seaborn](https://seaborn.pydata.org) is a high-level statistical plotting package built on top of Matplotlib. I find its grammar clean and accessible; you can quickly make beautiful, informative graphics with it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bioinformatics\n", "\n", "#### Bioconda\n", "\n", "[Bioconda](https://bioconda.github.io) is not a Python package, but is a channel for the conda package manager that has many (7000+) bioinformatics packages. Most of these packages are not available through the default conda channel. Bioconda lets you use conda to keep all of your bioinformatics packages installed and organized.\n", "\n", "#### Biopython\n", "\n", "[Biopython](https://biopython.org) is a widely used package for parsing bioinformatics files of various flavors, managing sequence alignments, etc.\n", "\n", "#### scikit-bio\n", "\n", "[scikit-bio](http://scikit-bio.org) has functionality similar to that of Biopython, but also includes some algorithms, for example for alignment and for constructing phylogenetic trees." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image processing\n", "\n", "#### scikit-image\n", "\n", "We haven't covered image processing in the main portion of the bootcamp, but it is discussed in the auxiliary lessons. The main package we use in the bootcamp lessons is [scikit-image](https://scikit-image.org), which has many classic image processing operations included.\n", "\n", "\n", "#### DeepCell\n", "\n", "These days, the state-of-the-art image segmentation tools use deep learning methods. [DeepCell](https://www.deepcell.org) is developed at Caltech in the [Van Valen lab](http://www.vanvalen.caltech.edu), and is an excellent cell segmentation tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Machine learning\n", "\n", "Python is widely used in machine learning applications, largely because it so easily wraps compiled code written in C or C++.\n", "\n", "#### scikit-learn\n", "\n", "[scikit-learn](https://scikit-learn.org/) is a widely used machine learning package for Python that does many standard machine learning tasks such as classification, clustering, dimensionality reduction, etc.\n", "\n", "#### TensorFlow\n", "\n", "[TensorFlow](https://www.tensorflow.org) is an extensive library for computation in machine learning developed by Google. It is especially effective for deep learning. It has a Python API.\n", "\n", "#### Keras\n", "\n", "In practice, you might rarely use TensorFlow's core functionality, but rather use [Keras](https://keras.io) to build deep learning models. Keras has an intuitive API and allows you to rapidly get up and running with deep learning.\n", "\n", "\n", "#### PyTorch\n", "\n", "[PyTorch](https://pytorch.org) is a library similar to TensorFlow." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistics\n", "\n", "In addition to the `scipy.stats` package, there are many packages for statistical analysis in the Python ecosystem.\n", "\n", "#### statsmodels\n", "\n", "[statsmodels](https://www.statsmodels.org/) has extensive functionality for computing hypothesis tests, kernel density estimation, regression, time series analysis, and much more.\n", "\n", "#### PyMC3\n", "\n", "[PyMC3](https://docs.pymc.io) is a probabilistic programming package primarily used for performing Markov chain Monte Carlo. It relies on [Theano](http://deeplearning.net/software/theano/), which is no longer actively developed. [PyMC4](https://github.com/pymc-devs/pymc4) will use TensorFlow, but this will result in a new API.\n", "\n", "#### Stan/PyStan/CmdStanPy\n", "\n", "[Stan](https://mc-stan.org) is a probabilistic programming language that uses state-of-the-art algorithms for Markov chain Monte Carlo and Bayesian inference. It is its own language, and you can access Stan models through two Python interfaces, [PyStan](https://pystan.readthedocs.io) and [CmdStanPy](https://cmdstanpy.readthedocs.io). I prefer to use the latter, which is a much more lightweight interface.\n", "\n", "#### ArviZ\n", "\n", "[ArviZ](https://arviz-devs.github.io/arviz/) is a wonderful package that represents the output of various Bayesian inference packages in a unified format using [xarray](https://xarray.pydata.org/). Using ArviZ, you can use whatever MCMC package you like, and your downstream analysis will always use the same syntax." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### pySerial\n", "\n", "[pySerial](https://pythonhosted.org/pyserial/) is a useful package for communication with external devices using a serial port. If you are designing your own instruments for research and wish to control them with your computer via Python, you will almost certainly use this package." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numba\n", "\n", "[Numba](https://numba.pydata.org) is a Python package for [just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation). The result is often greatly accelerated Python code, even beyond what NumPy can provide. It particularly excels when you have loops in your Python code. As an example, let's consider taking a one-dimensional random walk. Here is a Python function to do that." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def randwalk(n_steps):\n", "    # The two possible steps: one unit right or one unit left\n", "    steps = np.array([1, -1])\n", "\n", "    position = np.empty(n_steps+1, dtype=np.int64)\n", "\n", "    # Start at the origin and take one random step at a time\n", "    position[0] = 0\n", "    for i in range(n_steps):\n", "        position[i+1] = position[i] + np.random.choice(steps)\n", "\n", "    return position" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `%timeit` magic function to see how long it takes to compute a random walk of 100,000 steps." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "782 ms ± 15.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit randwalk(100000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It took close to one second on my machine to take the walk. We will now decorate the function with `@numba.njit`, which tells Numba to compile the function." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "@numba.njit\n", "def randwalk(n_steps):\n", "    # The two possible steps: one unit right or one unit left\n", "    steps = np.array([1, -1])\n", "\n", "    position = np.empty(n_steps+1, dtype=np.int64)\n", "\n", "    # Start at the origin and take one random step at a time\n", "    position[0] = 0\n", "    for i in range(n_steps):\n", "        position[i+1] = position[i] + np.random.choice(steps)\n", "\n", "    return position" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's time this one. 
Before we time it, though, we should run it once to do the compilation, so the compilation time is not included in the timing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.8 ms ± 25.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "randwalk(100000)\n", "\n", "%timeit randwalk(100000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a speedup of more than a factor of 400, simply by adding the decorator!\n", "\n", "Of course, there is a more clever way to do the random walk that will be even faster. (Inspect the function below to see how it does the same thing as the random walk in the above function. You might want to look up the documentation for the `np.cumsum()` function.)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "775 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "def randwalk(n_steps):\n", "    # Draw all steps at once; the cumulative sum gives the positions\n", "    return np.concatenate(\n", "        ((0,), np.cumsum(np.random.choice([1, -1], size=n_steps)))\n", "    )\n", "\n", "%timeit randwalk(100000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we found a clever way around looping and could greatly speed up the calculation. But in the event that you do not have such an option, Numba can really add speed!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other languages\n", "\n", "We now turn to a survey of other computing languages." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compiled languages\n", "\n", "When you write code in a compiled language, the code you write is first translated, or **compiled**, into a set of machine instructions that can be directly run by your machine's CPU. 
Compiled languages tend to be more verbose, often requiring more explicit instructions about how to allocate and free memory. They tend to be more low-level; you need more lines of code to do the same task compared to **dynamic languages** like Python, in which the code is interpreted (translated) into machine code as you run, and the interpreter handles a lot of the memory allocation and deallocation automatically behind the scenes.\n", "\n", "While it often takes longer to develop code in compiled languages, they typically have much better speed because they require less overhead, and you are in a sense closer to the CPU. Pretty much any major numerical calculation is done with compiled code. NumPy is in many ways a Python wrapper around highly optimized compiled C libraries.\n", "\n", "#### Fortran\n", "\n", "[Fortran](https://en.wikipedia.org/wiki/Fortran) was one of the first compiled languages, developed at IBM in the 1950s. It has been actively developed for decades and has very good performance. Furthermore, huge Fortran code bases exist that have been reliably used and tested for decades. For this reason, Fortran is still widely used, particularly in physics, astronomy, and atmospheric science.\n", "\n", "#### C\n", "\n", "[C](https://en.wikipedia.org/wiki/C_(programming_language)) (along with C++) is probably the most widely used compiled language across the sciences and elsewhere. In fact, the Python interpreter that we have been using this whole bootcamp is written in C. \n", "\n", "#### C++\n", "\n", "[C++](https://en.wikipedia.org/wiki/C%2B%2B) is very much like C, except it is more feature rich, enabling object-oriented programming. Many bioinformatics algorithms are written in C++, though many also provide high-level interfaces in interpreted languages like R or Python.\n", "\n", "#### Java\n", "\n", "Nearly as widely used as C and C++, [Java](https://en.wikipedia.org/wiki/Java_(programming_language)) is a more modern compiled language. 
Unlike Fortran and C/C++, Java is compiled into bytecode, which is like machine code, but more portable. The bytecode is just-in-time compiled into machine code at runtime. Java is used in many bioinformatics applications. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dynamic languages\n", "\n", "#### Python\n", "\n", "I think we've covered this one.\n", "\n", "\n", "#### Ruby\n", "\n", "Ruby is a high-level interpreted language that has fairly widespread use, particularly in web applications. In particular, [Jekyll](https://jekyllrb.com) allows for rapid design of beautiful websites." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JavaScript\n", "\n", "JavaScript is a core language for the web. Importantly, it allows for dynamic features in browser-based applications. Because of its central importance in this regard, major companies have spent substantial resources in developing very effective just-in-time compilers for it. Browsers have become highly optimized for running JavaScript code, resulting in excellent performance. In recent times, JavaScript has been adopted as a programming language for more substantial computation, including in the sciences. Due to the ability to create rich interactive graphics, it has also been adapted for use in data science.\n", "\n", "From what we have learned here in the bootcamp, if you know some JavaScript, you can make really cool interactive graphics that will run natively in the browser without the need for a Python engine running behind them. As an example, below is a Bokeh plot that is interactive in the static HTML version of this notebook." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"81e4edf3-40b3-4a1c-b122-f5b9dae34fca\":{\"roots\":{\"references\":[{\"attributes\":{\"children\":[{\"id\":\"1002\"},{\"id\":\"1003\"}]},\"id\":\"1028\",\"type\":\"Column\"},{\"attributes\":{\"axis\":{\"id\":\"1012\"},\"ticker\":null},\"id\":\"1015\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1004\",\"type\":\"Range1d\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\"},\"id\":\"1020\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1013\",\"type\":\"BasicTicker\"},{\"attributes\":{\"source\":{\"id\":\"1021\"}},\"id\":\"1026\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1006\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1010\",\"type\":\"LinearScale\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1024\",\"type\":\"Line\"},{\"attributes\":{\"data_source\":{\"id\":\"1021\"},\"glyph\":{\"id\":\"1023\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1024\"},\"selection_glyph\":null,\"view\":{\"id\":\"1026\"}},\"id\":\"1025\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"1008\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"1036\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"end\":10,\"format\":\"0[.]00\",\"js_property_callbacks\":{\"change:value\":[{\"id\":\"1027\"}]},\"start\":1,\"step\":0.1,\"title\":\"frequency\",\"value\":1},\"id\":\"1002\",\"type\":\"Slider\"},{\"attributes\":{\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1023\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"1033\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"below\":[{\"id\":\"1012\"}],\"center\":[{\"
id\":\"1015\"},{\"id\":\"1019\"}],\"frame_height\":100,\"frame_width\":300,\"left\":[{\"id\":\"1016\"}],\"renderers\":[{\"id\":\"1025\"}],\"title\":{\"id\":\"1030\"},\"toolbar\":{\"id\":\"1020\"},\"x_range\":{\"id\":\"1004\"},\"x_scale\":{\"id\":\"1008\"},\"y_range\":{\"id\":\"1006\"},\"y_scale\":{\"id\":\"1010\"}},\"id\":\"1003\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"formatter\":{\"id\":\"1031\"},\"ticker\":{\"id\":\"1017\"}},\"id\":\"1016\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1031\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"data\":{\"x\":{\"__ndarray__\":\"AAAAAAAAAACASAEiBYhkP4BIASIFiHQ/wOwBswfMfj+ASAEiBYiEP6CagWoGqok/wOwBswfMjj9wH8F9BPeRP4BIASIFiJQ/kHFBxgUZlz+gmoFqBqqZP7DDwQ4HO5w/wOwBswfMnj/oCqErhK6gP3AfwX0E96E/+DPhz4Q/oz+ASAEiBYikPwhdIXSF0KU/kHFBxgUZpz8YhmEYhmGoP6CagWoGqqk/KK+hvIbyqj+ww8EOBzusPzjY4WCHg60/wOwBswfMrj+kAJECRAqwP+gKoSuErrA/LBWxVMRSsT9wH8F9BPexP7Qp0aZEm7I/+DPhz4Q/sz88PvH4xOOzP4BIASIFiLQ/xFIRS0UstT8IXSF0hdC1P0xnMZ3FdLY/kHFBxgUZtz/Ue1HvRb23PxiGYRiGYbg/XJBxQcYFuT+gmoFqBqq5P+SkkZNGTro/KK+hvIbyuj9subHlxpa7P7DDwQ4HO7w/9M3RN0ffvD842OFgh4O9P3zi8YnHJ74/wOwBswfMvj8E9xHcR3C/P6QAkQJECsA/xgUZF2RcwD/oCqErhK7APwoQKUCkAME/LBWxVMRSwT9OGjlp5KTBP3AfwX0E98E/kiRJkiRJwj+0KdGmRJvCP9YuWbtk7cI/+DPhz4Q/wz8aOWnkpJHDPzw+8fjE48M/XkN5DeU1xD+ASAEiBYjEP6JNiTYl2sQ/xFIRS0UsxT/mV5lfZX7FPwhdIXSF0MU/KmKpiKUixj9MZzGdxXTGP25subHlxsY/kHFBxgUZxz+ydsnaJWvHP9R7Ue9Fvcc/9oDZA2YPyD8YhmEYhmHIPzqL6Syms8g/XJBxQcYFyT9+lflV5lfJP6CagWoGqsk/wp8Jfyb8yT/kpJGTRk7KPwaqGahmoMo/KK+hvIbyyj9KtCnRpkTLP2y5seXGlss/jr45+uboyz+ww8EOBzvMP9LISSMnjcw/9M3RN0ffzD8W01lMZzHNPzjY4WCHg80/Wt1pdafVzT984vGJxyfOP57neZ7nec4/wOwBswfMzj/i8YnHJx7PPwT3EdxHcM8/JvyZ8GfCzz+kAJECRArQPzUD1QxUM9A/xgUZF2Rc0D9XCF0hdIXQP+gKoSuErtA/eQ3lNZTX0D8KEClApADRP5sSbUq0KdE/LBWxVMRS0T+9F/Ve1HvRP04aOWnkpNE/3xx9c/TN0T9wH8F9BPfRPwEiBYgUINI/kiRJkiRJ0j8jJ42cNHLSP7Qp0aZEm9I/RSwVsVTE0j/WLlm7ZO3SP2cxncV0FtM/+DPhz4Q/0z+JNiXalGjTPxo5aeSkkdM/qzut7rS60z88PvH4xOPTP81ANQPVDNQ/XkN5DeU11D/vRb0X9V7UP4BIASIFiNQ/EUtFLBWx1
D+iTYk2JdrUPzNQzUA1A9U/xFIRS0Us1T9VVVVVVVXVP+ZXmV9lftU/d1rdaXWn1T8IXSF0hdDVP5lfZX6V+dU/KmKpiKUi1j+7ZO2StUvWP0xnMZ3FdNY/3Wl1p9Wd1j9ubLmx5cbWP/9u/bv179Y/kHFBxgUZ1z8hdIXQFULXP7J2ydola9c/Q3kN5TWU1z/Ue1HvRb3XP2V+lflV5tc/9oDZA2YP2D+Hgx0OdjjYPxiGYRiGYdg/qYilIpaK2D86i+ksprPYP8uNLTe23Ng/XJBxQcYF2T/tkrVL1i7ZP36V+VXmV9k/D5g9YPaA2T+gmoFqBqrZPzGdxXQW09k/wp8Jfyb82T9Tok2JNiXaP+SkkZNGTto/dafVnVZ32j8GqhmoZqDaP5esXbJ2ydo/KK+hvIby2j+5seXGlhvbP0q0KdGmRNs/27Zt27Zt2z9subHlxpbbP/279e/Wv9s/jr45+ubo2z8fwX0E9xHcP7DDwQ4HO9w/QcYFGRdk3D/SyEkjJ43cP2PLjS03ttw/9M3RN0ff3D+F0BVCVwjdPxbTWUxnMd0/p9WdVnda3T842OFgh4PdP8naJWuXrN0/Wt1pdafV3T/r361/t/7dP3zi8YnHJ94/DeU1lNdQ3j+e53me53nePy/qvaj3ot4/wOwBswfM3j9R70W9F/XeP+LxiccnHt8/c/TN0TdH3z8E9xHcR3DfP5X5VeZXmd8/JvyZ8GfC3z+3/t36d+vfP6QAkQJECuA/7AGzB8we4D81A9UMVDPgP34E9xHcR+A/xgUZF2Rc4D8OBzsc7HDgP1cIXSF0heA/oAl/JvyZ4D/oCqErhK7gPzAMwzAMw+A/eQ3lNZTX4D/CDgc7HOzgPwoQKUCkAOE/UhFLRSwV4T+bEm1KtCnhP+QTj088PuE/LBWxVMRS4T90FtNZTGfhP70X9V7Ue+E/BhkXZFyQ4T9OGjlp5KThP5YbW25sueE/3xx9c/TN4T8oHp94fOLhP3AfwX0E9+E/uCDjgowL4j8BIgWIFCDiP0ojJ42cNOI/kiRJkiRJ4j/aJWuXrF3iPyMnjZw0cuI/bCivobyG4j+0KdGmRJviP/wq86vMr+I/RSwVsVTE4j+OLTe23NjiP9YuWbtk7eI/HjB7wOwB4z9nMZ3FdBbjP7Ayv8r8KuM/+DPhz4Q/4z9ANQPVDFTjP4k2JdqUaOM/0jdH3xx94z8aOWnkpJHjP2I6i+kspuM/qzut7rS64z/0PM/zPM/jPzw+8fjE4+M/hD8T/kz44z/NQDUD1QzkPxZCVwhdIeQ/XkN5DeU15D+mRJsSbUrkP+9FvRf1XuQ/OEffHH1z5D+ASAEiBYjkP8hJIyeNnOQ/EUtFLBWx5D9aTGcxncXkP6JNiTYl2uQ/6k6rO63u5D8zUM1ANQPlP3xR70W9F+U/xFIRS0Us5T8MVDNQzUDlP1VVVVVVVeU/nlZ3Wt1p5T/mV5lfZX7lPy5Zu2TtkuU/d1rdaXWn5T/AW/9u/bvlPwhdIXSF0OU/UF5DeQ3l5T+ZX2V+lfnlP+Jgh4MdDuY/KmKpiKUi5j9yY8uNLTfmP7tk7ZK1S+Y/BGYPmD1g5j9MZzGdxXTmP5RoU6JNieY/3Wl1p9Wd5j8ma5esXbLmP25subHlxuY/tm3btm3b5j//bv279e/mP0hwH8F9BOc/kHFBxgUZ5z/YcmPLjS3nPyF0hdAVQuc/anWn1Z1W5z+ydsnaJWvnP/p369+tf+c/Q3kN5TWU5z+Mei/qvajnP9R7Ue9Fvec/HH1z9M3R5z9lfpX5VebnP65/t/7d+uc/9oDZA2YP6D8+gvsI7iPoP4eDHQ52OOg/0IQ/E/5M6D8YhmEYhmHoP2CHgx0Odug/qYilIpaK6D/yiccnHp/oPzqL6Syms+g/gowLMi7I6D/LjS03ttzoPxSPTzw+8eg/XJBxQcYF6T+kkZNGThrpP+2StUvWLuk/NpTXUF5D6T9+lflV5lfpP8aWG
1tubOk/D5g9YPaA6T9YmV9lfpXpP6CagWoGquk/6Jujb46+6T8xncV0FtPpP3qe53me5+k/wp8Jfyb86T8KoSuErhDqP1OiTYk2Jeo/nKNvjr456j/kpJGTRk7qPyyms5jOYuo/dafVnVZ36j++qPei3ovqPwaqGahmoOo/Tqs7re606j+XrF2ydsnqP+Ctf7f+3eo/KK+hvIby6j9wsMPBDgfrP7mx5caWG+s/ArMHzB4w6z9KtCnRpkTrP5K1S9YuWes/27Zt27Zt6z8kuI/gPoLrP2y5seXGlus/tLrT6k6r6z/9u/Xv1r/rP0a9F/Ve1Os/jr45+ubo6z/Wv1v/bv3rPx/BfQT3Eew/aMKfCX8m7D+ww8EOBzvsP/jE4xOPT+w/QcYFGRdk7D+Kxycen3jsP9LISSMnjew/GsprKK+h7D9jy40tN7bsP6zMrzK/yuw/9M3RN0ff7D88z/M8z/PsP4XQFUJXCO0/ztE3R98c7T8W01lMZzHtP17Ue1HvRe0/p9WdVnda7T/w1r9b/27tPzjY4WCHg+0/gNkDZg+Y7T/J2iVrl6ztPxLcR3Afwe0/Wt1pdafV7T+i3ot6L+rtP+vfrX+3/u0/NOHPhD8T7j984vGJxyfuP8TjE49PPO4/DeU1lNdQ7j9W5leZX2XuP57neZ7nee4/5uibo2+O7j8v6r2o96LuP3jr361/t+4/wOwBswfM7j8I7iO4j+DuP1HvRb0X9e4/mvBnwp8J7z/i8YnHJx7vPyrzq8yvMu8/c/TN0TdH7z+89e/Wv1vvPwT3EdxHcO8/TPgz4c+E7z+V+VXmV5nvP976d+vfre8/JvyZ8GfC7z9u/bv179bvP7f+3fp36+8/AAAAAAAA8D8=\",\"dtype\":\"float64\",\"order\":\"little\",\"shape\":[400]},\"y\":{\"__ndarray__\":\"AAAAAAAAAAAfI77b5R+QP/4LI9ZiH6A/1hqCu8wtqD/pM6rYVh2wPwjk8XHBIrQ/8elc0OQmuD8q8XWvfym8P2EY2XEoFcA/Q0FKr4sUwj8UUWcZyRLEP4MoMEzAD8Y/6A1a+FALyD96v1zlWgXKP7sTfvO9/cs/uwbcHVr0zT8WE3V8D+nPP91aFyPf7dA/tf1saSPm0T+m+5rORN3SP+YAeKIz09M/qc9QSODH1D899+U3O7vVP8A+aP40rdY/crNzP76d1z+vSgm2x4zYP8EHhzVCetk/u5Weqh5m2j+/RUocTlDbPyNiwKzBONw/9sZkmmof3T+psLhAOgTeP3+xSBki594/xL6YvBPI3z9mI4dxgFPgP2kg7LLtweA//ZPqn0ov4T9/aTtHkJvhP+wMUsm3BuI/ox3MWLpw4j/1+d86kdniP3Qcysc1QeM/I0Q5a6Gn4z+aYLmkzQzkP147HQi0cOQ/qdfmPU7T5D8agq4DljTlP66JiCyFlOU/kJtpoRXz5T9pu4lhQVDmP+bRxYICrOY/Nsv/MVMG5z94P32zLV/nPw6fRGOMtuc/49x4tWkM6D/ikLM2wGDoP9+MXYyKs+g/St4FdcME6T8yN7fIZVTpPya5S3lsouk/pBy/ktLu6T/aL387kznqP6ynurSpguo/5D2uWhHK6j/DF/CkxQ/rPxVwuSbCU+s/KoAujwKW6z8Po6SpgtbrP7Gu5l0+Few/cX53sDFS7D8Rq9LCWI3sP8Rrq9Ovxuw/bpwpPzP+7D845SR/3zPtP7H/XSuxZ+0/2BW2+aSZ7T+NN2S+t8ntPxHjKGzm9+0/LJ1/FC4k7j8Gls7ni07uP3lXlDX9du4/IHqTbH+d7j9CXvwaEMLuP/jllO6s5O4/GC7etFMF7z9pQzhbAiTvP+rRA++2QO8/CMzBnW9b7z+4BzG1KnTvP6PPaaPmiu8/jWb39qGf7z91e+9eW7LvP9WMB6sRw+8/vDmoy8PR7z96f/7RcN7vP83iCvAX6e8/moOue
Ljx7z9RGrbfUfjvP2De4rnj/O8/CVXxvG3/7z9FCJ6/7//vP1glqLlp/u8/7wLSw9v67z/Gjt8XRvXvP+GikhCp7e8/qkKlKQXk7z84wMH/WtjvP07KeFCryu8/u2M1+va67z/CxC78PqnvP5InWHaEle8/vIBOqch/7z/dJET2DGjvP79c6t5STu8/aelYBZwy7z+YefMr6hTvP2USTTU/9e4/42wJJJ3T7j+ZSrwaBrDuP//Cxlt8iu4/I4wySQJj7j/YQItkmjnuP8+mtU5HDu4/UvfExwvh7T8rLc6u6rHtP7hauQHngO0/CQsR3QNO7T8esc97RBntP44oKzes4uw/10pehj6q7D/RnHD+/m/sP+QX/FHxM+w/phLxUBn26z+sTFjoerbrP44hEyIades/JOeZJPsx6z8je7gyIu3qP1sESauTpuo/EOzsCFRe6j/gE8ThZxTqP+RMIufTyOk/ohRD5Zx76T/anPvCxyzpPw4ja4FZ3Og/sJypO1eK6D9fvXQmxjboP0Rc24+r4ec/ED7n3gyL5z/sSEWT7zLnPx4o7ERZ2eY/92TBo09+5j+++jx32CHmP5hrC575w+U/TFyuDblk5T8JvRvSHATlPx+FWw0rouQ/NAgk9+k+5D8K63TcX9rjP2W+MB+TdOM/cEa1NYoN4z9adXKqS6XiP8sfgBveO+I/0nEyOkjR4T9NK63KkGXhP5uqdaO++OA/m8wDrdiK4D/mqFHh5RvgP0Zj1JbaV98/8HvtDex13j+rDJd/DpLdPzo05mJQrNw/w6txTcDE2z+vw2TybNvaP5N/kCFl8Nk/yuB6xrcD2T+nbmzncxXYP+oLfKSoJdc/kCiZNmU01j8gYJTuuEHVP96TJjSzTdQ/fpD2hGNY0z+7T51z2WHSP2HlqKYkatE/KCie11Rx0D87TPKj8+7OP57qWORG+cw/ruE+ScMByz+V5FbJiAjJPzcxMHe3Dcc/ylgtf28RxT8VZXkl0RPDP6F6+8P8FME/WTSSkCUqvj+xRC5jZyi6P7NkUx0AJbY/4SCG9zAgsj/s0kaCdjSsPwlrdrjAJqQ/0dOn5YYvmD98uoidBiCAP+64iJ0GIIC/i9Kn5YYvmL+lana4wCakv8jSRoJ2NKy/ryCG9zAgsr9hZFMdACW2v39ELmNnKLq/RzSSkCUqvr+JevvD/BTBv+5keSXRE8O/wlgtf28Rxb8wMTB3tw3Hv33kVsmICMm/luE+ScMBy7+X6ljkRvnMvzNM8qPz7s6/HSie11Rx0L9V5aimJGrRv7BPnXPZYdK/epD2hGNY07/TkyY0s03UvxVglO64QdW/hSiZNmU01r/mC3ykqCXXv5xubOdzFdi/uOB6xrcD2b+If5AhZfDZv6vDZPJs29q/v6txTcDE278oNOZiUKzcv6AMl38Okt2/7XvtDex13r9CY9SW2lffv92oUeHlG+C/lcwDrdiK4L+aqnWjvvjgv0grrcqQZeG/yXEyOkjR4b/GH4Ab3jviv1h1cqpLpeK/a0a1NYoN479dvjAfk3TjvwXrdNxf2uO/Mwgk9+k+5L8ahVsNK6LkvwS9G9IcBOW/S1yuDblk5b+Wawue+cPlv7r6PHfYIea/82TBo09+5r8cKOxEWdnmv+pIRZPvMue/DD7n3gyL579AXNuPq+Hnv1u9dCbGNui/rpypO1eK6L8MI2uBWdzov9Oc+8LHLOm/nhRD5Zx76b/gTCLn08jpv9wTxOFnFOq/CuzsCFRe6r9ZBEmrk6bqvyJ7uDIi7eq/I+eZJPsx67+LIRMiGnXrv6hMWOh6tuu/pxLxUBn267/hF/xR8TPsv8yccP7+b+y/1kpehj6q7L+OKCs3rOLsvx2xz3tEGe2/BwsR3QNO7b+2WrkB54Dtvystzq7qse2/T/fExwvh7b/MprVORw7uv9ZAi2SaOe6/IowySQJj7r/9wsZbfIruv5hKvBoGsO6/4WwJJJ3T7r9lEk01P/Xuv
5d58yvqFO+/Z+lYBZwy77+/XOreUk7vv9skRPYMaO+/uoBOqch/77+QJ1h2hJXvv8HELvw+qe+/u2M1+va6779OynhQq8rvvzbAwf9a2O+/qkKlKQXk77/hopIQqe3vv8WO3xdG9e+/7wLSw9v6779YJai5af7vv0UInr/v/++/CVXxvG3/779g3uK54/zvv1Eatt9R+O+/mYOueLjx77/O4grwF+nvv3t//tFw3u+/vDmoy8PR77/WjAerEcPvv3Z7715bsu+/jmb39qGf77+kz2mj5orvv7gHMbUqdO+/CszBnW9b77/t0QPvtkDvv2pDOFsCJO+/GS7etFMF77/55ZTurOTuv0Ne/BoQwu6/InqTbH+d7r95V5Q1/XbuvwaWzueLTu6/MJ1/FC4k7r8S4yhs5vftv5A3ZL63ye2/2hW2+aSZ7b+0/10rsWftvzzlJH/fM+2/b5wpPzP+7L/Fa6vTr8bsvxer0sJYjey/c353sDFS7L+wruZdPhXsvxSjpKmC1uu/L4AujwKW678XcLkmwlPrv8QX8KTFD+u/5j2uWhHK6r+vp7q0qYLqv94vfzuTOeq/pBy/ktLu6b8suUt5bKLpvzk3t8hlVOm/TN4FdcME6b/ijF2MirPov+aQszbAYOi/59x4tWkM6L8Un0RjjLbnv3o/fbMtX+e/PMv/MVMG57/t0cWCAqzmv2u7iWFBUOa/kptpoRXz5b+yiYgshZTlvx+CrgOWNOW/r9fmPU7T5L9fOx0ItHDkv5xguaTNDOS/K0Q5a6Gn4793HMrHNUHjv/r53zqR2eK/qR3MWLpw4r/zDFLJtwbiv4dpO0eQm+G//5Pqn0ov4b9rIOyy7cHgv3Ejh3GAU+C/zL6YvBPI3797sUgZIufev7awuEA6BN6/Bcdkmmof3b8lYsCswTjcv8VFShxOUNu/w5Weqh5m2r/LB4c1QnrZv7pKCbbHjNi/cLNzP76d17/QPmj+NK3Wv0/35Tc7u9W/rs9QSODH1L/tAHiiM9PTv7D7ms5E3dK/wf1saSPm0b/rWhcj3+3QvxYTdXwP6c+/wAbcHVr0zb/lE37zvf3Lv4i/XOVaBcq/+w1a+FALyL+bKDBMwA/GvzBRZxnJEsS/Y0FKr4sUwr9mGNlxKBXAvz7xda9/Kby/Tepc0OQmuL8t5PFxwSK0vxc0qthWHbC/QxuCu8wtqL99DCPWYh+gv0IkvtvlH5C/B1wUMyamsbw=\",\"dtype\":\"float64\",\"order\":\"little\",\"shape\":[400]}},\"selected\":{\"id\":\"1035\"},\"selection_policy\":{\"id\":\"1036\"}},\"id\":\"1021\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"args\":{\"freq_slider\":{\"id\":\"1002\"},\"source\":{\"id\":\"1021\"}},\"code\":\"\\nlet f = freq_slider.value;\\nlet x = source.data['x'];\\nlet y = source.data['y'];\\n\\nfor (let i = 0; i < x.length; i++) {\\n y[i] = Math.sin(2 * Math.PI * f * 
x[i]);\\n}\\n\\nsource.change.emit();\\n\"},\"id\":\"1027\",\"type\":\"CustomJS\"},{\"attributes\":{\"formatter\":{\"id\":\"1033\"},\"ticker\":{\"id\":\"1013\"}},\"id\":\"1012\",\"type\":\"LinearAxis\"},{\"attributes\":{\"text\":\"\"},\"id\":\"1030\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"1017\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"1035\",\"type\":\"Selection\"},{\"attributes\":{\"axis\":{\"id\":\"1016\"},\"dimension\":1,\"ticker\":null},\"id\":\"1019\",\"type\":\"Grid\"}],\"root_ids\":[\"1028\"]},\"title\":\"Bokeh Application\",\"version\":\"2.1.0\"}};\n", " var render_items = [{\"docid\":\"81e4edf3-40b3-4a1c-b122-f5b9dae34fca\",\"root_ids\":[\"1028\"],\"roots\":{\"1028\":\"cc354a21-4ec8-4939-9b29-c5c4eebd010b\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1028" } }, "output_type": "display_data" } ], "source": [ "# Make a slider for frequency\n", "freq_slider = bokeh.models.Slider(\n", " start=1,\n", " end=10,\n", " step=0.1,\n", " value=1,\n", " title='frequency'\n", ")\n", "\n", "# Plot of sine wave\n", "p = bokeh.plotting.figure(\n", " frame_height=100,\n", " frame_width=300,\n", " tools='',\n", " x_range=[0, 1]\n", ")\n", "x = np.linspace(0, 1, 400)\n", "source = bokeh.models.ColumnDataSource(dict(x=x, y=np.sin(2 * np.pi * x)))\n", "\n", "p.line(source=source, x='x', y='y', line_width=2)\n", 
"\n", "# JavaScript code for callback\n", "js_code = \"\"\"\n", "let f = freq_slider.value;\n", "let x = source.data['x'];\n", "let y = source.data['y'];\n", "\n", "for (let i = 0; i < x.length; i++) {\n", " y[i] = Math.sin(2 * Math.PI * f * x[i]);\n", "}\n", "\n", "source.change.emit();\n", "\"\"\"\n", "\n", "# Make the callback\n", "callback = bokeh.models.CustomJS(args=dict(source=source), code=js_code)\n", "\n", "# We use the `js_on_change()` method to call the custom JavaScript code.\n", "callback.args['freq_slider'] = freq_slider\n", "freq_slider.js_on_change(\"value\", callback)\n", "\n", "bokeh.io.show(bokeh.layouts.column(freq_slider, p))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Domain-specific languages\n", "\n", "Python is a general purpose language. It is used for all sorts of applications, in and outside of science, math, and engineering. There are several languages that are specifically designed for applications in science.\n", "\n", "#### Matlab\n", "\n", "[Matlab](https://www.mathworks.com/products/matlab.html) was originally developed in the late 70's by Caltech alumnus [Cleve Moler](https://en.wikipedia.org/wiki/Cleve_Moler) as a way for his students to explore numerical linear algebra. It began as a convenient wrapper around Fortran routines in [LINPACK](https://en.wikipedia.org/wiki/LINPACK).\n", "\n", "As it did at its inception, Matlab excels in linear algebra applications. It has since expanded to include many other applications. It has widespread use in the biological sciences in image processing, and is also used to control instrumentation.\n", "\n", "Matlab is proprietary and expensive (well, everything is expensive compared to the free software we've been using). 
This is a problem for research applications, because it both hides the underlying algorithms from inspection and prices other researchers out of using it.\n", "\n", "#### Mathematica\n", "\n", "[Mathematica](https://www.wolfram.com/mathematica/) is another proprietary scientific and numerical software package originally written by a Caltech alumnus, this time [Stephen Wolfram](https://www.stephenwolfram.com). Technically, Mathematica is not a language; the language is called the Wolfram Language. Its use is less widespread in biology, but it is widely used across the sciences. Like Matlab, it is not open source and is expensive.\n", "\n", "#### R\n", "\n", "[R](https://www.r-project.org/about.html) is a language designed for statistics, data science, and statistical graphics. It is highly extensible, and thousands of packages are available. Prominent among these are the packages in the [tidyverse](https://www.tidyverse.org), which allow for efficient and elegant manipulation of data frames and high-level plotting via the excellent [ggplot2 package](https://ggplot2.tidyverse.org). R has widespread use in bioinformatics and is a very effective language in these contexts.\n", "\n", "#### Julia\n", "\n", "[Julia](https://julialang.org) is a newer language specifically built for scientific computing. The developers of Julia put together a wish list for what they would want in a scientific programming language, and then built their language accordingly. Some of its features that I think are very valuable are:\n", "\n", "- It is free and open source.\n", "- It has a built-in package manager.\n", "- It has a large and rapidly growing set of well-developed packages; it is easily extensible.\n", "- You can call Python functions from Julia and vice versa.\n", "- Its syntax is intuitive and quite similar to Python's.\n", "- *Everything* is just-in-time compiled. It is therefore blazingly fast.\n", "\n", "In terms of performance, Julia is *really* fast, which is a big bonus. 
In contrast to R, which is really focused on statistics, Julia is a more general language for scientific computing (though, unlike Python, it is not designed for applications outside of science and numerics). It is strong in statistics and visualization (it has data frames, random number generation, and all those goodies available in packages). If there is another language besides Python that I could see offering the bootcamp in, it would be Julia." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Language wars are counterproductive\n", "\n", "I have chosen to offer this bootcamp in Python for several reasons.\n", "\n", "1. Python has a shallow learning curve, which makes it good for beginners.\n", "2. Despite the shallow learning curve, Python and the available packages are extremely powerful and widely used.\n", "3. Python-based tools are often very good.\n", "\n", "In considering point 3, it is important to note that the Python-based tools are seldom *the* best for a particular task. If you are solving differential equations, Julia probably has a better tool. For many statistical analyses, R probably offers a better tool. But the Python-based tool for any of these applications both exists and is quite good. So, my hope is that the bootcamp has given you a Swiss Army knife in Python and its ecosystem. You have tools available to effectively tackle most computational scientific problems you will encounter.\n", "\n", "If you choose to explore other languages or packages, it is important to choose the tool that is right for you and your application. As you bounce around the internet, especially on social media, you will hear a lot of noise from people saying their language is the best and some other language \"sucks.\" I find these arguments counterproductive and not even worth reading. Rather, search for principled discussion of the various tools. Inform yourself with the most informative voices, not the loudest ones."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.7.7\n", "IPython 7.15.0\n", "\n", "numpy 1.18.1\n", "numba 0.49.1\n", "bokeh 2.1.0\n", "jupyterlab 2.1.4\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p numpy,numba,bokeh,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }