"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Problem size\n",
"n = np.arange(2, 1001)\n",
"\n",
"# Plot the complexity\n",
"plt.loglog(n, 2**n)\n",
"plt.loglog(n, n**2)\n",
"\n",
"# Label plot\n",
"plt.xlabel('n')\n",
"plt.ylabel('time (a.u.)')\n",
"plt.legend(('naive', 'dynamic program'), loc='upper right')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have no prayer of computing large sequences using the naive approach. The dynamic program make the problem feasable."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting the LCS\n",
"Our dynamic program is really fast for finding the length of the LCS, but what about the LCS itself? In many dynamicm programs, we can do **backtracking**, where we go back through the algorithm and extract information. We can do the same with the LCS algorithm. We start at the bottom of the `lcs_mat`, and then work backwards, adding a character whenever it was added in the LCS length algorithm."
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def dp_lcslen(seq1, seq2, return_lcs=False):\n",
" \"\"\"Length of LCS using a dynamic program.\"\"\"\n",
"\n",
" # Initialize matrix of substring LCSs.\n",
" lcs_mat = np.zeros((len(seq1)+ 1, len(seq2) + 1), dtype=int)\n",
" \n",
" # Build matrix, left to right, row by row\n",
" for i in range(1, len(seq1)+1):\n",
" for j in range(1, len(seq2)+1):\n",
" if seq1[i-1] == seq2[j-1]:\n",
" lcs_mat[i, j] = lcs_mat[i-1, j-1] + 1\n",
" else:\n",
" lcs_mat[i, j] = max(lcs_mat[i-1, j], lcs_mat[i, j-1])\n",
" \n",
" if return_lcs:\n",
" lcs = ''\n",
" i = len(seq1)\n",
" j = len(seq2)\n",
" while i != 0 and j != 0:\n",
" if lcs_mat[i,j] == lcs_mat[i-1, j]:\n",
" i -= 1\n",
" elif lcs_mat[i, j] == lcs_mat[i, j-1]:\n",
" j -= 1\n",
" else:\n",
" lcs = seq1[i-1] + lcs\n",
" i -= 1\n",
" j -= 1\n",
" return lcs_mat[-1, -1], lcs\n",
" else:\n",
" return lcs_mat[-1,-1]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's give it a whirl!"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(6, 'AKFEDC')"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dp_lcslen(seq1, seq2, True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dynamic program only returns one of the LCSs, since we did not enumerate them all."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examples of dynamic programs\n",
"This style of building up a solution to a big problem from smaller subproblems is called **dynamic programming**. It is pervasive in biology. Here are some examples of problems that use dynamic programming.\n",
"\n",
"* Global alignment (Needleman-Wunsch)\n",
"* Local alignment (Smith-Waterman)\n",
"* Some multiple sequence alignment algorithms (though most use other methods)\n",
"* RNA structure prediction\n",
"* Structural alignment (such as SSAP)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A word of warning about algorithmic development\n",
"In LCS example, we found that practical calculations the LCS length were impossible with our first algorithm because of time and storage limitations. However, the algorithm *works*. In general, when you are writing code, first get something that works. That is, something that passes your unit tests and does what you want it to do. *Then* decide if it needs optimization. It is quite possible that a \"naive\" algorithm might be just find for your purposes. If this is the case, don't optimize. I think it's important to remember what Donald Knuth, one of the great computer scientists of all time, said."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"

\"Premature optimization is the root of all evil.\" --Donald Knuth\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This goes also for what language you write your code in. Coding something up in Python or some other high level interpreted language is typically much easier than in a compiled language like C or Fortran. You code is also more portable. Only when you really need the speed should you go for one of those language."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}