{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 37: Algorithmic complexity\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!Object.prototype.hasOwnProperty.call(output.data, EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(\"1002\");\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error(url) {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (let i = 0; i < css_urls.length; i++) {\n", " const url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error.bind(null, url);\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " const hashes = {\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.2.min.js\": \"XypntL49z55iwGVUW4qsEu83zKL3XEcz0MjuGOQ9SlaaQ68X/g+k1FcioZi7oQAc\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.2.min.js\": \"bEsM86IHGDTLCS0Zod8a8WM6Y4+lafAL/eSiyQcuPzinmWNgNO2/olUF0Z2Dkn5i\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.2.min.js\": \"TX0gSQTdXTTeScqxj6PVQxTiRW8DOoGVwinyi1D3kxv7wuxQ02XkOxv0xwiypcAH\"};\n", "\n", " for (let i = 0; i < js_urls.length; i++) {\n", " const url = js_urls[i];\n", " const element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error.bind(null, url);\n", " element.async = false;\n", " element.src = url;\n", " if (url in hashes) {\n", " element.crossOrigin = \"anonymous\";\n", " element.integrity = \"sha384-\" + hashes[url];\n", " }\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.2.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.2.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.2.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " if (force === true) {\n", " display_loaded();\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(\"1002\")).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"1002\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error(url) {\n console.error(\"failed to load \" + url);\n }\n\n for (let i = 0; i < css_urls.length; i++) {\n const url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error.bind(null, url);\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n const hashes = {\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.2.min.js\": \"XypntL49z55iwGVUW4qsEu83zKL3XEcz0MjuGOQ9SlaaQ68X/g+k1FcioZi7oQAc\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.2.min.js\": \"bEsM86IHGDTLCS0Zod8a8WM6Y4+lafAL/eSiyQcuPzinmWNgNO2/olUF0Z2Dkn5i\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.2.min.js\": \"TX0gSQTdXTTeScqxj6PVQxTiRW8DOoGVwinyi1D3kxv7wuxQ02XkOxv0xwiypcAH\"};\n\n for (let i = 0; i < js_urls.length; i++) {\n const url = js_urls[i];\n const element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error.bind(null, url);\n element.async = false;\n element.src = url;\n if (url in hashes) {\n element.crossOrigin = \"anonymous\";\n element.integrity = \"sha384-\" + hashes[url];\n }\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.2.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.2.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.2.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"1002\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import itertools\n", "\n", "import numpy as np\n", "\n", "import bokeh.io\n", "import bokeh.plotting\n", "\n", "bokeh.io.output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "So far, we have written some of our own simple programs with loops where necessary. Most of the tasks we've coded up have been pretty straightforward and there is a more or less obvious way to approach things.\n", "\n", "For some problems, and many important ones, the **algorithm**, or procedure, used to do the desired computation is important. A calculation done with one algorithm may be much much faster than when it is done with another.\n", "\n", "Unless you are in the business of coming up with novel algorithms to solve your problem of interest, most of the algorithms you use in your work in biology will be those developed by someone else. Nonetheless, it is important for you to understand what goes into algorithms and that there can by **huge** differences in computational time depending on the algorithm used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The longest common subsequence\n", "\n", "As an illustration of this idea, we will work out an important algorithm in bioinformatics. Our goal is to compute the length of the **longest common subsequence** (**LCS**) between two sequences. (This is _not_ the longest common sub*string*.) A sub*sequence* is a sequence that can be generated by deleting entries in the original sequence, but leaving the order unchanged. For example, the subsequences of length three of `ATGC` are:\n", "\n", " ATG\n", " TGC\n", " ATC\n", " ATC\n", "\n", "All subsequences contain one deletion. There are four deletions we could choose, so there are four subsequences. The LCS between two sequences is the subsequence(s) that both share that is the longest.\n", "\n", "The LCS is useful to compare two sequences. Deletions and insertions are common throughout evolution, so the LCS give a good indication of how similar two sequences are, even if these insertions or deletions are present." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tests!\n", "\n", "As we develop an algorithm to compute the LCS between two subsequences, we should write tests to make sure our algorithm gives the right answer. Let's write some! Whatever we call our function (we'll call a generic one `lcslen`), it should be called as `lcslen(seq1, seq2)` and return an integer, the length of the longest common subsequence." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def test_lcslen(lcslen):\n", " \"\"\"Test a given function to computing LCS\"\"\"\n", " # Define test sequences\n", " seq1 = 'abcd'\n", " seq2 = 'abcdefg'\n", " seq3 = 'adef'\n", " \n", " assert lcslen('', '') == 0\n", " assert lcslen('', seq1) == 0\n", " assert lcslen(seq1, seq1) == 4\n", " assert lcslen(seq1, seq2) == 4\n", " assert lcslen(seq1, seq3) == 2\n", " assert lcslen(seq2, seq3) == 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's fail the test\n", "\n", "Remember, under the principles of TDD, we need to fail the test first. Our first attempt will be a naive implementation, so we'll call it `naive_lcslen()`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def naive_lcslen(seq1, seq2):\n", " \"\"\"\n", " Return length of longest common subsequence \n", " between two subsequences.\n", " \"\"\"\n", " \n", " return 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we'll fail the test." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "ename": "AssertionError", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtest_lcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnaive_lcslen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mtest_lcslen\u001b[0;34m(lcslen)\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mlcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m''\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m''\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mlcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m''\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mlcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq1\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 11\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mlcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mlcslen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseq1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mseq3\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: " ] } ], "source": [ "test_lcslen(naive_lcslen)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Naive strategy\n", "\n", "Thinking off the top of my head, I think the following seems like a reasonable algorithm. Let $m$ be the length of `seq1` and $n$ be the length of `seq2`, with $m \\ge n$.\n", "\n", "1. Generate all subsequences of `seq1` that are length $n$.\n", "2. See if any of these subsequences match `seq2`.\n", "3. Generate all subsequences of `seq1` and of `seq2` of length $q = n - 1$\n", "4. Compare these to sets of subsequences and see if any of the subsequences match. If they do, return them.\n", "5. Go to step 3, except decrement $q$ by one each cycle until we get to $q = 1$.\n", "6. If no common subsequence is found when $q=1$, return an empty string." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding all subsequences\n", "\n", "To implement this strategy, we need a function to generate all subsequences of a given length. Its call signature will be\n", "\n", " allsubseqs(seq, n)\n", "\n", "and it will return a `set` of all of the subsequences. A `set` is a built-in Python data type that is like a list, but only contains the unique elements. It is unordered and immutable.\n", "\n", "It turns out that we don't need to write this function! The [itertools.combinations()](https://docs.python.org/3/library/itertools.html) function in the standard library gives this functionality. We don't need to test this (since the standard library is trustworthy; carefully unit tested), so we can just use it." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def allsubseqs(seq, n):\n", " \"\"\"Return a set containing all subsequences of length n\"\"\"\n", " return set(itertools.combinations(seq, n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give it a try." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "{('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd')}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "allsubseqs('abcd', 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting set has tuples as elements, but that's ok, since we can just compare tuples to other tuples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementation of naive strategy\n", "\n", "Now we'll code up the main LCS function according to the algorithm we laid out in words." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def naive_lcslen(seq1, seq2, return_lcs=False):\n", " \"\"\"Naive implementation of LCS length calculator.\"\"\"\n", " # For convenience, make sure seq1 is longer than seq2\n", " if len(seq1) < len(seq2):\n", " seq1, seq2 = seq2, seq1\n", "\n", " # Initialize length of subsequence to consider\n", " q = len(seq2)\n", "\n", " # Compare subsequences for a match\n", " while q > 0:\n", " # Generate subsequences\n", " seq1_subseqs = allsubseqs(seq1, q)\n", " seq2_subseqs = allsubseqs(seq2, q)\n", "\n", " # Compute the intersections (common subsequences)\n", " lcs_set = seq1_subseqs.intersection(seq2_subseqs)\n", "\n", " # If the intersection is non-empty, we've found the LCS\n", " if len(lcs_set) > 0:\n", " break\n", "\n", " # Decrement length of subsequence\n", " q -= 1\n", "\n", " # If we want to return the LCS, compute it\n", " if return_lcs:\n", " return q, tuple([\"\".join(lcs) for lcs in lcs_set])\n", " else:\n", " return q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that since we end up computing the actual LCS, and not just its length, we can actually return all of the LCSs (since the LCS is not necessarily unique) we like. I therefore put in the `return_lcs` kwarg and return the LCS as well. Now, let's run our test on it to make sure it works." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "test_lcslen(naive_lcslen)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hurray! It passed. Now, let's try it with some biological-type sequences. We typically compare protein sequences, since we don't want frame shifts. We'll first code up a random protein sequence generator." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "rg = np.random.default_rng(3252)\n", "\n", "def random_protein_sequence(n):\n", " \"\"\"Generate a random protein sequence.\"\"\"\n", " return \"\".join(rg.choice(list(\"ACDEFGHIKLMNPQRSTVWY\"), size=n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use that to generate some test sequences and find the LCS. Actually, we only report one of the possibly many LCS, since it is not guaranteed to be unique." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " seq1: SEAARAQFEKLDGKFKFPHH\n", " seq2: IEHMSYSMKHRMNWLPSKVAMQHI\n", "LCS length: 5\n", "\n", "The LCS(s) is/are: ('ERLPH', 'ERAQH', 'EKLPH', 'SRLKH', 'ERLKH', 'SKLPH', 'SRLPH', 'SKLKH', 'EKLKH', 'SRAQH')\n" ] } ], "source": [ "# Generate random sequences and compute LCS\n", "seq1 = random_protein_sequence(20)\n", "seq2 = random_protein_sequence(24)\n", "len_lcs, lcs = naive_lcslen(seq1, seq2, return_lcs=True)\n", "\n", "print(\"\"\"\n", " seq1: {seq1:s}\n", " seq2: {seq2:s}\n", "LCS length: {len_lcs:d}\n", "\"\"\".format(seq1=seq1, seq2=seq2, len_lcs=len_lcs))\n", "\n", "print(\"The LCS(s) is/are:\", lcs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The naive implementation is sloooooow\n", "Wow. That took a while, and we were only comparing 20-some bases. Why?\n", "\n", "Let's think about what the algorithm is doing, and let's think about it in the worst-case scenario in which it does not find an LCS at all, meaning it has to go all the way through the while loop. At each iteration in the `while` loop, the `allsubseqs` function computes all subsequences of a given length. As you might guess from the name of the `itertools.combinations()` function, there are\n", "\n", "\\begin{align}\n", "\\begin{pmatrix}\n", "n\\\\k\n", "\\end{pmatrix} = \\frac{n!}{(n-k)! k!}\n", "\\end{align}\n", "\n", "subsequences of length $k$ for a sequence of length $n$. You can also see this by counting how many deletions you need to make to get a subsequence. There are\n", "\n", "\\begin{align}\n", "\\begin{pmatrix}\n", "n\\\\n-k\n", "\\end{pmatrix} = \\begin{pmatrix}\n", "n\\\\k\n", "\\end{pmatrix}\n", "\\end{align}\n", "\n", "such deletions. This number of combinations can be large. For example, to find all subsequence of length 10 in a sequence of length 20, we would have to generate\n", "\n", "\\begin{align}\n", "\\begin{pmatrix}\n", "20\\\\10\n", "\\end{pmatrix} = 184,756\n", "\\end{align}\n", "\n", "subsequences (twice) and then compare them. In the worst case, the total number of subsequences you would have to generate is\n", "\n", "\\begin{align}\n", "2 \\sum_{k=0}^n \\begin{pmatrix}\n", "n\\\\k\n", "\\end{pmatrix} = 2^{n+1}.\n", "\\end{align}\n", "\n", "This can be huge! For typical protein sequences of length $n = 300$, this amounts to $10^{90}$ generated subsequences! The biggest set of these has $10^{88}$ subsequences in it. This is more than the number of atoms in the observable universe. We could never even store the subsequences!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time complexity\n", "\n", "Because the size of the problem, $n$, appears in the exponent in the number of operations required for our naive algorithm, it is said to have **exponential time complexity**. Exponential time algorithms are sometimes necessary, but many problems can be solved much more efficiently with more clever algorithms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dynamic programming solution\n", "\n", "Let's take a new strategy. We'll do a divide-and-conquer strategy. We'll start small. We start be considering the tiniest substring for each sequence and compare them. Note the difference: a sub*string* does not have any deletions; a sub*sequence* does. We'll then keep comparing longer and longer substrings, computing the length of the LCS, until we get to the whole sequence of each. The hope is that we can use the information of smaller substrings as we build up to avoid re-computing things. Let's map out our algorithm.\n", "\n", "We have an $m+1 \\times n+1$ matrix called `lcs_mat`. Entry `lcs_mat[i, j]` stores the length of the LCS for substrings `seq1[0:i]` and `seq2[0:j]`. \n", "\n", "Let's say we are looking at entry $(i, j)$, as we have already computed the entries for all rows less than $i$ and all columns less than $j$. \n", "\n", "Now, if `seq1[i-1] == seq2[j-1]`, we know that \n", "\n", " lcs_mat[i,j] = lcs_mat[i-1, j-1] + 1.\n", "\n", "Why? Because we just matched another character, so we extend the subsequence length by one.\n", "\n", "But, if `seq1[i-1] != seq2[j-1]`, we cannot have a longer common subsequence than we have already detected. Either the longest common subsequence of the substrings (`seq1[0:i-1]`, `seq2[0:j]`) or of (`seq1[0:i]`, `seq2[0:j-1]`) is still the longest.\n", "\n", "The cool thing is that we can trivially compute the first row and column of `lcs_mat`. They're all zeros, since no substring can match. So, we just keep moving `i` and `j` forward, checking to see if there is a sequence match at positions `i,j` or using results we already computed.\n", "\n", "An example `lcs_mat` is shown below. We first fill out the first row and first column, and then proceed going left to right one row at a time.\n", "\n", "| | Ø | `a` | `b` | `c` | `d`|\n", "| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |\n", "| Ø | 0 | 0 | 0 | 0 | 0 |\n", "| `b` | 0 | 0 | 1 | 0 | 0 |\n", "| `b` | 0 | 0 | 1 | 1 | 1 |\n", "| `c` | 0 | 0 | 1 | 2 | 2 |\n", "\n", "This style of building up a solution to a big problem from smaller subproblems is called **dynamic programming**. The name is an [odd historical artifact](https://en.wikipedia.org/wiki/Dynamic_programming#History). It's not especially \"dynamic,\" it's just a name. Let's code this up in Python!" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "def dp_lcslen(seq1, seq2):\n", " \"\"\"Length of LCS using a dynamic program.\"\"\"\n", " # Initialize matrix of substring LCSs.\n", " lcs_mat = np.zeros((len(seq1) + 1, len(seq2) + 1), dtype=int)\n", "\n", " # Build matrix, left to right, row by row\n", " for i in range(1, len(seq1) + 1):\n", " for j in range(1, len(seq2) + 1):\n", " if seq1[i - 1] == seq2[j - 1]:\n", " lcs_mat[i, j] = lcs_mat[i - 1, j - 1] + 1\n", " else:\n", " lcs_mat[i, j] = max(lcs_mat[i - 1, j], lcs_mat[i, j - 1])\n", "\n", " return lcs_mat[-1, -1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run it through the unit test!" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "test_lcslen(dp_lcslen)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Excellent! It tests out! Now, let's try it out on our protein sequences." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dp_lcslen(seq1, seq2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We got the correct answer, and it was way faster!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time complexity of the dynamic program\n", "\n", "To get the time complexity of the dynamic program, we again think about how many operations we need to do. We build a matrix which has about $n^2$ entries (where we assume $m \\approx n$. Each entry consists of a simple comparison of two characters and then either an addition of `1` or fetching another value. So, we really only have roughly $n^2$ operations to achieve the result. So, the dynamic programming algorithm has a time complexity of $n^2$. Let's compare how long things will take with the respective algorithms as a function of the size of the problem, $n$." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"cfef87a7-e9c1-46a7-990c-1d118f1f1702\":{\"defs\":[],\"roots\":{\"references\":[{\"attributes\":{\"below\":[{\"id\":\"1012\"}],\"center\":[{\"id\":\"1015\"},{\"id\":\"1019\"},{\"id\":\"1050\"}],\"frame_height\":225,\"frame_width\":325,\"left\":[{\"id\":\"1016\"}],\"renderers\":[{\"id\":\"1037\"},{\"id\":\"1055\"}],\"title\":{\"id\":\"1040\"},\"toolbar\":{\"id\":\"1027\"},\"x_range\":{\"id\":\"1004\"},\"x_scale\":{\"id\":\"1008\"},\"y_range\":{\"id\":\"1006\"},\"y_scale\":{\"id\":\"1010\"}},\"id\":\"1003\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"end\":100000000000.0,\"start\":0.1},\"id\":\"1006\",\"type\":\"Range1d\"},{\"attributes\":{\"label\":{\"value\":\"dynamic program\"},\"renderers\":[{\"id\":\"1055\"}]},\"id\":\"1070\",\"type\":\"LegendItem\"},{\"attributes\":{\"source\":{\"id\":\"1034\"}},\"id\":\"1038\",\"type\":\"CDSView\"},{\"attributes\":{\"line_color\":\"orange\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1053\",\"type\":\"Line\"},{\"attributes\":{\"axis_label\":\"n\",\"formatter\":{\"id\":\"1043\"},\"major_label_policy\":{\"id\":\"1041\"},\"ticker\":{\"id\":\"1013\"}},\"id\":\"1012\",\"type\":\"LogAxis\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"orange\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1054\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"1041\",\"type\":\"AllLabels\"},{\"attributes\":{\"active_multi\":null,\"tools\":[{\"id\":\"1020\"},{\"id\":\"1021\"},{\"id\":\"1022\"},{\"id\":\"1023\"},{\"id\":\"1024\"},{\"id\":\"1025\"}]},\"id\":\"1027\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1044\",\"type\":\"AllLabels\"},{\"attributes\":{\"overlay\":{\"id\":\"1026\"}},\"id\":\"1022\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1036\",\"type\":\"Line\"},{\"attributes\":{\"label\":{\"value\":\"naive\"},\"renderers\":[{\"id\":\"1037\"}]},\"id\":\"1051\",\"type\":\"LegendItem\"},{\"attributes\":{\"data\":{\"x\":[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671,672,673,674,675,676,677,678,679,680,681,682,683,684,685,686,687,688,689,690,691,692,693,694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,850,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,917,918,919,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000],\"y\":[4,9,16,25,36,49,64,81,100,121,144,169,196,225,256,289,324,361,400,441,484,529,576,625,676,729,784,841,900,961,1024,1089,1156,1225,1296,1369,1444,1521,1600,1681,1764,1849,1936,2025,2116,2209,2304,2401,2500,2601,2704,2809,2916,3025,3136,3249,3364,3481,3600,3721,3844,3969,4096,4225,4356,4489,4624,4761,4900,5041,5184,5329,5476,5625,5776,5929,6084,6241,6400,6561,6724,6889,7056,7225,7396,7569,7744,7921,8100,8281,8464,8649,8836,9025,9216,9409,9604,9801,10000,10201,10404,10609,10816,11025,11236,11449,11664,11881,12100,12321,12544,12769,12996,13225,13456,13689,13924,14161,14400,14641,14884,15129,15376,15625,15876,16129,16384,16641,16900,17161,17424,17689,17956,18225,18496,18769,19044,19321,19600,19881,20164,20449,20736,21025,21316,21609,21904,22201,22500,22801,23104,23409,23716,24025,24336,24649,24964,25281,25600,25921,26244,26569,26896,27225,27556,27889,28224,28561,28900,29241,29584,29929,30276,30625,30976,31329,31684,32041,32400,32761,33124,33489,33856,34225,34596,34969,35344,35721,36100,36481,36864,37249,37636,38025,38416,38809,39204,39601,40000,40401,40804,41209,41616,42025,42436,42849,43264,43681,44100,44521,44944,45369,45796,46225,46656,47089,47524,47961,48400,48841,49284,49729,50176,50625,51076,51529,51984,52441,52900,53361,53824,54289,54756,55225,55696,56169,56644,57121,57600,58081,58564,59049,59536,60025,60516,61009,61504,62001,62500,63001,63504,64009,64516,65025,65536,66049,66564,67081,67600,68121,68644,69169,69696,70225,70756,71289,71824,72361,72900,73441,73984,74529,75076,75625,76176,76729,77284,77841,78400,78961,79524,80089,80656,81225,81796,82369,82944,83521,84100,84681,85264,85849,86436,87025,87616,88209,88804,89401,90000,90601,91204,91809,92416,93025,93636,94249,94864,95481,96100,96721,97344,97969,98596,99225,99856,100489,101124,101761,102400,103041,103684,104329,104976,105625,106276,106929,107584,108241,108900,109561,110224,110889,111556,112225,112896,113569,114244,114921,115600,116281,116964,117649,118336,119025,119716,120409,121104,121801,122500,123201,123904,124609,125316,126025,126736,127449,128164,128881,129600,130321,131044,131769,132496,133225,133956,134689,135424,136161,136900,137641,138384,139129,139876,140625,141376,142129,142884,143641,144400,145161,145924,146689,147456,148225,148996,149769,150544,151321,152100,152881,153664,154449,155236,156025,156816,157609,158404,159201,160000,160801,161604,162409,163216,164025,164836,165649,166464,167281,168100,168921,169744,170569,171396,172225,173056,173889,174724,175561,176400,177241,178084,178929,179776,180625,181476,182329,183184,184041,184900,185761,186624,187489,188356,189225,190096,190969,191844,192721,193600,194481,195364,196249,197136,198025,198916,199809,200704,201601,202500,203401,204304,205209,206116,207025,207936,208849,209764,210681,211600,212521,213444,214369,215296,216225,217156,218089,219024,219961,220900,221841,222784,223729,224676,225625,226576,227529,228484,229441,230400,231361,232324,233289,234256,235225,236196,237169,238144,239121,240100,241081,242064,243049,244036,245025,246016,247009,248004,249001,250000,251001,252004,253009,254016,255025,256036,257049,258064,259081,260100,261121,262144,263169,264196,265225,266256,267289,268324,269361,270400,271441,272484,273529,274576,275625,276676,277729,278784,279841,280900,281961,283024,284089,285156,286225,287296,288369,289444,290521,291600,292681,293764,294849,295936,297025,298116,299209,300304,301401,302500,303601,304704,305809,306916,308025,309136,310249,311364,312481,313600,314721,315844,316969,318096,319225,320356,321489,322624,323761,324900,326041,327184,328329,329476,330625,331776,332929,334084,335241,336400,337561,338724,339889,341056,342225,343396,344569,345744,346921,348100,349281,350464,351649,352836,354025,355216,356409,357604,358801,360000,361201,362404,363609,364816,366025,367236,368449,369664,370881,372100,373321,374544,375769,376996,378225,379456,380689,381924,383161,384400,385641,386884,388129,389376,390625,391876,393129,394384,395641,396900,398161,399424,400689,401956,403225,404496,405769,407044,408321,409600,410881,412164,413449,414736,416025,417316,418609,419904,421201,422500,423801,425104,426409,427716,429025,430336,431649,432964,434281,435600,436921,438244,439569,440896,442225,443556,444889,446224,447561,448900,450241,451584,452929,454276,455625,456976,458329,459684,461041,462400,463761,465124,466489,467856,469225,470596,471969,473344,474721,476100,477481,478864,480249,481636,483025,484416,485809,487204,488601,490000,491401,492804,494209,495616,497025,498436,499849,501264,502681,504100,505521,506944,508369,509796,511225,512656,514089,515524,516961,518400,519841,521284,522729,524176,525625,527076,528529,529984,531441,532900,534361,535824,537289,538756,540225,541696,543169,544644,546121,547600,549081,550564,552049,553536,555025,556516,558009,559504,561001,562500,564001,565504,567009,568516,570025,571536,573049,574564,576081,577600,579121,580644,582169,583696,585225,586756,588289,589824,591361,592900,594441,595984,597529,599076,600625,602176,603729,605284,606841,608400,609961,611524,613089,614656,616225,617796,619369,620944,622521,624100,625681,627264,628849,630436,632025,633616,635209,636804,638401,640000,641601,643204,644809,646416,648025,649636,651249,652864,654481,656100,657721,659344,660969,662596,664225,665856,667489,669124,670761,672400,674041,675684,677329,678976,680625,682276,683929,685584,687241,688900,690561,692224,693889,695556,697225,698896,700569,702244,703921,705600,707281,708964,710649,712336,714025,715716,717409,719104,720801,722500,724201,725904,727609,729316,731025,732736,734449,736164,737881,739600,741321,743044,744769,746496,748225,749956,751689,753424,755161,756900,758641,760384,762129,763876,765625,767376,769129,770884,772641,774400,776161,777924,779689,781456,783225,784996,786769,788544,790321,792100,793881,795664,797449,799236,801025,802816,804609,806404,808201,810000,811801,813604,815409,817216,819025,820836,822649,824464,826281,828100,829921,831744,833569,835396,837225,839056,840889,842724,844561,846400,848241,850084,851929,853776,855625,857476,859329,861184,863041,864900,866761,868624,870489,872356,874225,876096,877969,879844,881721,883600,885481,887364,889249,891136,893025,894916,896809,898704,900601,902500,904401,906304,908209,910116,912025,913936,915849,917764,919681,921600,923521,925444,927369,929296,931225,933156,935089,937024,938961,940900,942841,944784,946729,948676,950625,952576,954529,956484,958441,960400,962361,964324,966289,968256,970225,972196,974169,976144,978121,980100,982081,984064,986049,988036,990025,992016,994009,996004,998001,1000000]},\"selected\":{\"id\":\"1069\"},\"selection_policy\":{\"id\":\"1068\"}},\"id\":\"1052\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1048\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"data_source\":{\"id\":\"1052\"},\"glyph\":{\"id\":\"1053\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1054\"},\"view\":{\"id\":\"1056\"}},\"id\":\"1055\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"1049\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"1068\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"1069\",\"type\":\"Selection\"},{\"attributes\":{\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1035\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"1010\",\"type\":\"LogScale\"},{\"attributes\":{\"num_minor_ticks\":10},\"id\":\"1017\",\"type\":\"LogTicker\"},{\"attributes\":{\"ticker\":null},\"id\":\"1043\",\"type\":\"LogTickFormatter\"},{\"attributes\":{},\"id\":\"1040\",\"type\":\"Title\"},{\"attributes\":{},\"id\":\"1021\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"data\":{\"x\":[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671,672,673,674,675,676,677,678,679,680,681,682,683,684,685,686,687,688,689,690,691,692,693,694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,850,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,917,918,919,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000],\"y\":[4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288,1048576,2097152,4194304,8388608,16777216,33554432,67108864,134217728,268435456,536870912,1073741824,2147483648,4294967296,8589934592,17179869184,34359738368,68719476736,137438953472,274877906944,549755813888,1099511627776,2199023255552,4398046511104,8796093022208,17592186044416,35184372088832,70368744177664,140737488355328,281474976710656,562949953421312,1125899906842624,2251799813685248,4503599627370496,9007199254740992,18014398509481984,36028797018963968,72057594037927936,144115188075855872,288230376151711744,576460752303423488,1152921504606846976,2305843009213693952,4611686018427387904,-9223372036854775808,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]},\"selected\":{\"id\":\"1049\"},\"selection_policy\":{\"id\":\"1048\"}},\"id\":\"1034\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":0.5,\"fill_color\":\"lightgrey\",\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":1.0,\"line_color\":\"black\",\"line_dash\":[4,4],\"line_width\":2,\"right_units\":\"screen\",\"syncable\":false,\"top_units\":\"screen\"},\"id\":\"1026\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"1023\",\"type\":\"SaveTool\"},{\"attributes\":{\"num_minor_ticks\":10},\"id\":\"1013\",\"type\":\"LogTicker\"},{\"attributes\":{\"axis\":{\"id\":\"1016\"},\"dimension\":1,\"ticker\":null},\"id\":\"1019\",\"type\":\"Grid\"},{\"attributes\":{\"items\":[{\"id\":\"1051\"},{\"id\":\"1070\"}],\"location\":\"bottom_right\"},\"id\":\"1050\",\"type\":\"Legend\"},{\"attributes\":{\"axis\":{\"id\":\"1012\"},\"ticker\":null},\"id\":\"1015\",\"type\":\"Grid\"},{\"attributes\":{\"axis_label\":\"time to compute (a.u.)\",\"formatter\":{\"id\":\"1046\"},\"major_label_policy\":{\"id\":\"1044\"},\"ticker\":{\"id\":\"1017\"}},\"id\":\"1016\",\"type\":\"LogAxis\"},{\"attributes\":{\"ticker\":null},\"id\":\"1046\",\"type\":\"LogTickFormatter\"},{\"attributes\":{},\"id\":\"1020\",\"type\":\"PanTool\"},{\"attributes\":{\"data_source\":{\"id\":\"1034\"},\"glyph\":{\"id\":\"1035\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1036\"},\"view\":{\"id\":\"1038\"}},\"id\":\"1037\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"end\":1000,\"start\":2},\"id\":\"1004\",\"type\":\"Range1d\"},{\"attributes\":{},\"id\":\"1024\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"1008\",\"type\":\"LogScale\"},{\"attributes\":{},\"id\":\"1025\",\"type\":\"HelpTool\"},{\"attributes\":{\"source\":{\"id\":\"1052\"}},\"id\":\"1056\",\"type\":\"CDSView\"}],\"root_ids\":[\"1003\"]},\"title\":\"Bokeh Application\",\"version\":\"2.3.2\"}};\n", " var render_items = [{\"docid\":\"cfef87a7-e9c1-46a7-990c-1d118f1f1702\",\"root_ids\":[\"1003\"],\"roots\":{\"1003\":\"157c5504-9d11-4142-88e8-65affb12a55e\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1003" } }, "output_type": "display_data" } ], "source": [ "n = np.arange(2, 1001)\n", "expon_complexity = 2 ** n\n", "polynomial_complexity = n ** 2\n", "\n", "p = bokeh.plotting.figure(\n", " x_axis_label=\"n\",\n", " y_axis_label=\"time to compute (a.u.)\",\n", " x_axis_type=\"log\",\n", " y_axis_type=\"log\",\n", " frame_height=225,\n", " frame_width=325,\n", " x_range=[n.min(), n.max()],\n", " y_range=[0.1, 1e11],\n", ")\n", "\n", "p.line(n, expon_complexity, line_width=2, legend_label=\"naive\")\n", "p.line(n, polynomial_complexity, line_width=2, color='orange', legend_label=\"dynamic program\")\n", "\n", "p.legend.location = 'bottom_right'\n", "\n", "bokeh.io.show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have no prayer of computing large sequences using the naive approach. The dynamic program makes the problem feasable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the LCS\n", "\n", "Our dynamic program is really fast for finding the length of the LCS, but what about the LCS itself? In many dynamic programs, we can do **backtracking**, where we go back through the algorithm and extract information. We can do the same with the LCS algorithm. We start at the bottom of the `lcs_mat`, and then work backwards, adding a character whenever it was added in the LCS length algorithm." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def dp_lcslen(seq1, seq2, return_lcs=False):\n", " \"\"\"Length of LCS using a dynamic program.\"\"\"\n", " # Initialize matrix of substring LCSs.\n", " lcs_mat = np.zeros((len(seq1) + 1, len(seq2) + 1), dtype=int)\n", "\n", " # Build matrix, left to right, row by row\n", " for i in range(1, len(seq1) + 1):\n", " for j in range(1, len(seq2) + 1):\n", " if seq1[i - 1] == seq2[j - 1]:\n", " lcs_mat[i, j] = lcs_mat[i - 1, j - 1] + 1\n", " else:\n", " lcs_mat[i, j] = max(lcs_mat[i - 1, j], lcs_mat[i, j - 1])\n", "\n", " if return_lcs:\n", " lcs = \"\"\n", " i = len(seq1)\n", " j = len(seq2)\n", " while i != 0 and j != 0:\n", " if lcs_mat[i, j] == lcs_mat[i - 1, j]:\n", " i -= 1\n", " elif lcs_mat[i, j] == lcs_mat[i, j - 1]:\n", " j -= 1\n", " else:\n", " lcs = seq1[i - 1] + lcs\n", " i -= 1\n", " j -= 1\n", " return lcs_mat[-1, -1], lcs\n", " else:\n", " return lcs_mat[-1, -1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give it a whirl!" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "(5, 'SRAQH')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dp_lcslen(seq1, seq2, True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dynamic program only returns one of the LCSs, since we did not enumerate them all." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples of dynamic programs\n", "\n", "**Dynamic programming** wherein we build up a solution to a big problem from smaller subproblems is pervasive in biology. Here are some examples of problems that use dynamic programming.\n", "\n", "* Global alignment (Needleman-Wunsch)\n", "* Local alignment (Smith-Waterman)\n", "* Some multiple sequence alignment algorithms (though most use other methods)\n", "* RNA structure prediction\n", "* Structural alignment (such as SSAP)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A word of warning about algorithmic development\n", "\n", "In LCS example, we found that practical calculations the LCS length were impossible with our first algorithm because of time and storage limitations. However, the algorithm *works*. In general, when you are writing code, first get something that works. That is, something that passes your unit tests and does what you want it to do. *Then* decide if it needs optimization. It is quite possible that a \"naive\" algorithm might be just find for your purposes. If this is the case, don't optimize. I think it's important to remember what Donald Knuth, one of the great computer scientists of all time, said.\n", "\n", "
\n", "\n", "Premature optimization is the root of all evil.\n", " \n", "
\n", "\n", "This goes also for what language you write your code in. Coding something up in Python or some other high level interpreted language is typically much easier than in a compiled language like C or Fortran. Your code is also more portable. Only when you really need the speed should you go for one of those languages." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.8.10\n", "IPython version : 7.22.0\n", "\n", "numpy : 1.20.2\n", "bokeh : 2.3.2\n", "jupyterlab: 3.0.14\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p numpy,bokeh,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }