Exercise 2.2: Restriction enzyme cut sites


Restriction enzymes cut DNA at specific locations called restriction sites. The sequence at a restriction site is called a recognition sequence. Here are the recognition sequences of some commonly used restriction enzymes.

Restriction enzyme    Recognition sequence
HindIII               AAGCTT
EcoRI                 GAATTC
KpnI                  GGTACC

a) New England Biolabs sells purified DNA of the genome of λ-phage, a bacteriophage that infects E. coli. You can download the FASTA file containing the sequence here. Use the function you wrote in Exercise 2.1 to extract the sequence.
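The Exercise 2.1 function is not reproduced here. As a stand-in, the following minimal sketch pulls the sequence out of FASTA-formatted text. The name parse_fasta is hypothetical, and the sketch assumes the usual FASTA layout: a header line starting with ">" followed by sequence lines. Only the first record is read.

```python
def parse_fasta(text):
    """Return the sequence of the first record in FASTA-formatted text.

    Hypothetical stand-in for the Exercise 2.1 function. Assumes a header
    line beginning with '>' followed by one or more sequence lines.
    """
    seq_lines = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts a new record; stop here
            seen_header = True
            continue
        seq_lines.append(line)
    # Join the sequence lines into one uninterrupted string
    return "".join(seq_lines)


# Toy input for illustration only; this is not the λ-phage sequence.
demo = ">demo record\nGGGC\nAAGCTT\n"
demo_seq = parse_fasta(demo)
```

To use it on a downloaded file, read the file's contents into a string first (for example with open(...).read()) and pass that string in.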

b) Write a function with call signature

restriction_sites(seq, recog_seq)

that takes as arguments a sequence and the recognition sequence of a restriction enzyme and returns the indices of the first base of each of the restriction sites in the sequence. Use this function to find the indices of the restriction sites of λ-DNA for HindIII, EcoRI, and KpnI. Compare your results with those given here, a comprehensive list of restriction site locations for a variety of enzymes.
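One possible approach to part (b), offered as a sketch rather than the intended solution, is to search repeatedly with Python's built-in str.find, restarting one base past each hit so that overlapping sites are not missed. The sketch assumes that indices are 0-based and that case should be ignored; neither is specified in the exercise.

```python
def restriction_sites(seq, recog_seq):
    """Return the indices of the first base of each occurrence of
    recog_seq in seq.

    Assumptions (not stated in the exercise): indices are 0-based,
    matching is case-insensitive, and overlapping sites are reported.
    """
    if not recog_seq:
        raise ValueError("recog_seq must be a non-empty string")

    seq = seq.upper()
    recog_seq = recog_seq.upper()

    sites = []
    i = seq.find(recog_seq)
    while i != -1:
        sites.append(i)
        # Restart one base past the last hit to catch overlapping sites
        i = seq.find(recog_seq, i + 1)
    return sites


# Toy sequence for illustration only; this is not the λ-phage genome.
toy = "AAGCTTGGAATTCAAGCTT"
hindiii_sites = restriction_sites(toy, "AAGCTT")  # [0, 13]
ecori_sites = restriction_sites(toy, "GAATTC")    # [7]
kpni_sites = restriction_sites(toy, "GGTACC")     # []
```

When comparing against a published list, check its coordinate convention first. Such lists may count bases from 1 rather than 0, or report the position of the cut rather than the first base of the recognition sequence, so a constant offset between the two sets of numbers does not necessarily indicate a bug.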