Great Ideas in Computational Biology 2019 class
Students in our inaugural class in spring 2019.
Great Ideas in Computational Biology 2024 class
We’ve grown! Our class in spring 2024.

About Great Ideas in Computational Biology

Great Ideas in Computational Biology (02-251) is a 12-unit course offered to students at Carnegie Mellon University who are interested in an introduction to the field of computational biology. It is taken by students of all years of study, but it is aimed at School of Computer Science first-year students who are interested in the computational biology major. I am unaware of a computationally rigorous introduction to computational biology for first-year undergraduates at any other institution.

The course was taught to its first cohort in spring 2019 as a joint project with Carl Kingsford. I have taught it on my own since then, changing many of the subjects taught in response to what students have reported particularly enjoying and in consultation with our faculty in the Computational Biology Department.

In spring 2021, I tried something a bit different by incentivizing course participation with donations to charity. The COVID-19 pandemic made it difficult for students to engage in their courses, and I wrote about my efforts to reward them for engaging in mine.

In 2022, I won the Herbert A. Simon Award for Teaching Excellence in Computer Science for my work in teaching this course. This award is the top teaching honor bestowed by Carnegie Mellon’s School of Computer Science.

Student Testimonials

“Compeau is a legend. I disliked biology before I took his class and still do a little, but he made me fall in love with compbio this semester. This is one of the best classes for students looking to explore the applications of CS on other fields.”

“Professor Compeau spends a lot of time structuring his class towards a computational perspective to encourage computer scientists to delve into computational biology. He provides relevant biological information in a clear and concise way to ensure that students without a biology background can be on equal grounds with those who have extensive ones.”

“Incredible class and amazing lectures! This is definitely the best class I’ve taken at CMU.”

“The best Professor that I have ever met in my whole college life.”

“This class is just good stuff. Thank you for being an amazing professor who has changed my perspective on computer science in a positive way. I cant wait to delve more into Computational Biology after this course :)”

“This is a class that made me glad I chose to come to CMU. The class gives you a great taste of so much going on in computational biology – all accessible for students without any prior biology experience. Dr. Compeau is second to none. He is one of the most passionate professors I have ever met… Dr. Compeau was clearly rooting for our success not trying to break us! Studying remotely with a 16 hour time difference in Australia, I did not have any troubles getting the help I needed via Piazza. Seamless! Dr. Compeau changes the syllabus as computational biology advances – we discussed even advancements done in the past few months. There were several COVID assignments that gave detailed breakdowns and walked students through how researchers would have begun to study CoV2! Dr. Compeau also managed to get us some incredible guest lectures which I personally found fascinating! Overall, this class has had an incredible impact on my CMU career and recommend it to all!”

“Loved this course. I think its a shame that not everyone takes it.”

“Amazing class! I really feel that I learned a lot in terms of knowledge. The classroom environment is always very active and I’m impressed by how others think.”

“This is the best course I’ve taken at CMU thus far. It’s one of those classes that are hard enough to keep you engaged but not so hard that you feel hopeless, and the work you do feels meaningful and you understand why each piece exists, and how it helps you learn. The lectures are also amazing and I really liked them and showing up was very worth it. Also had the most interesting content of all the courses I’ve taken. 10/10 would recommend.”

Great Ideas in Computational Biology Curriculum

The first half of the course provides a broad overview of topics in fundamental bioinformatics algorithms. Some of that material is adapted from my Bioinformatics Algorithms project.

The second half of the course samples beautiful ideas from a variety of different areas, taking a broad view of computational biology as the field continues to evolve. Some of these areas include biological network analysis, cell and systems modeling, DNA computing, automated science, and algorithms in nature.

I am providing the week-by-week lecture slides in PDF format below as a public resource. Some topics, such as how the fundamental algorithms miniasm, Clustal, and BLAST work, are presented as mini-lectures as part of the course recitations. If you are interested in these materials, or Bioinformatics Algorithms, please reach out to me.

Week 1: Assembling genomes

Click here to open slides in a new tab. Great ideas covered:

  • de Bruijn graphs for genome assembly
Genome_Assembly
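
As a rough illustration of the de Bruijn graph idea named above, here is a minimal Python sketch. It is not taken from the course slides; the function names `kmers_of` and `de_bruijn_graph` and the example string are my own assumptions. Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and assembly then amounts to finding a path that uses every edge.

```python
from collections import defaultdict


def kmers_of(text, k):
    """Return every length-k substring of text, in order (hypothetical helper)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def de_bruijn_graph(kmers):
    """Build a de Bruijn graph as an adjacency list.

    Each k-mer contributes one edge from its prefix (first k-1 symbols)
    to its suffix (last k-1 symbols). Repeated k-mers yield repeated edges.
    """
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Illustrative input only; real reads would come from a sequencer.
    for node, neighbors in sorted(de_bruijn_graph(kmers_of("TAATGCCATGGGATGTT", 3)).items()):
        print(node, "->", ",".join(neighbors))
```

Storing repeated edges rather than collapsing them is a deliberate choice here, since an assembler needs to traverse a repeated k-mer as many times as it occurs.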

Week 2: Finding hidden messages in DNA

Click here to open slides in a new tab. Great ideas covered:

  • skew diagrams for locating replication origins in bacterial genomes
  • Gibbs sampling and expectation maximization algorithms for finding motifs in transcription factor binding sites
Hidden_Messages
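
To make the skew diagram idea concrete, here is a small Python sketch, written for illustration and not drawn from the slides. The names `skew_values` and `min_skew_positions` are hypothetical. The skew at position i is the number of G's minus the number of C's in the first i bases; positions where the skew reaches its minimum are candidate locations for the replication origin.

```python
def skew_values(genome):
    """Return the running G-minus-C count after each prefix of genome.

    The list has len(genome) + 1 entries and starts at 0 for the empty prefix.
    """
    skew = [0]
    for base in genome:
        if base == "G":
            skew.append(skew[-1] + 1)
        elif base == "C":
            skew.append(skew[-1] - 1)
        else:
            skew.append(skew[-1])
    return skew


def min_skew_positions(genome):
    """Return every prefix length at which the skew attains its minimum."""
    skew = skew_values(genome)
    lowest = min(skew)
    return [i for i, value in enumerate(skew) if value == lowest]


if __name__ == "__main__":
    # Toy string for illustration; a real analysis would use a full bacterial genome.
    print(min_skew_positions("CATGGGCATCGGCCATACGCC"))
```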

Weeks 3-4: Sequence alignment

Click here to open slides in a new tab. Great ideas covered:

  • Needleman-Wunsch algorithm for global sequence alignment
  • Smith-Waterman algorithm for local sequence alignment
  • affine sequence alignment
  • hidden Markov models for multiple sequence alignment of variable sequences
Sequence_Alignment
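
The dynamic programming recurrence behind Needleman-Wunsch global alignment can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `global_alignment_score` and the scoring parameters (match +1, mismatch -1, linear gap penalty -1) are assumptions of mine, not the course's settings, and the function returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of s and t.

    score[i][j] holds the best score for aligning s[:i] with t[:j].
    """
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(
                diagonal,                 # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,    # gap in t
                score[i][j - 1] + gap,    # gap in s
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
```

Smith-Waterman local alignment uses the same table but additionally allows every cell to fall back to 0 and reports the maximum over all cells instead of the bottom-right entry.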

Week 5: Evolutionary trees

Click here to open slides in a new tab. Great ideas covered:

  • UPGMA
  • neighbor-joining algorithm
  • Fitch algorithm for inferring ancestral states in a rooted tree
Evolutionary_Trees

Week 6: Read mapping

Click here to open slides in a new tab. Great ideas covered:

  • Suffix arrays
  • Suffix trees
  • The Burrows-Wheeler transform
Read_Mapping
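
For readers who have not seen the Burrows-Wheeler transform, here is a deliberately naive Python sketch, written for illustration and not taken from the slides. The function names are hypothetical, and the sketch assumes the text ends with a unique `$` sentinel that sorts before every other symbol. Practical read mappers build the transform from a suffix array instead of sorting full rotations.

```python
def burrows_wheeler_transform(text):
    """Return the last column of the sorted cyclic rotations of text."""
    if text.count("$") != 1 or not text.endswith("$"):
        raise ValueError("text must end with a single '$' sentinel")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_burrows_wheeler_transform(bwt):
    """Recover the original text by repeatedly prepending and sorting.

    This takes quadratic space and is meant only to show that the
    transform is reversible.
    """
    table = [""] * len(bwt)
    for _ in range(len(bwt)):
        table = sorted(bwt[i] + table[i] for i in range(len(bwt)))
    return next(row for row in table if row.endswith("$"))


if __name__ == "__main__":
    transformed = burrows_wheeler_transform("banana$")
    print(transformed, inverse_burrows_wheeler_transform(transformed))
```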

Week 7: RNA-Sequencing

Click here to open slides in a new tab. Great ideas covered:

  • Adapting Burrows-Wheeler based read mapping to the problem of RNA-sequencing
  • Spliced alignment to find splice junctions
  • RNA transcript assembly
  • Expectation maximization to quantify transcript abundances
  • Differential expression analysis
RNA-Sequencing

Week 8: Proteins

Click here to open slides in a new tab. Great ideas covered:

  • ab initio and homology approaches for protein structure prediction
  • Comparing protein structures globally using RMSD and the Kabsch algorithm
  • Comparing protein structures locally with contact maps and Qres
  • Peptide sequencing and peptide identification
Proteins

Week 9: Systems biology

Click here to open slides in a new tab. Great ideas covered:

  • Motifs in transcription factor networks, including negative autoregulation and feed-forward motifs.
  • Particle-based reaction-diffusion models.
  • The repressilator motif and biological oscillators.
  • Gillespie’s stochastic simulation algorithm for simulating chemical reactions in a well-mixed environment, applied to bacterial chemotaxis.
Systems-Biology

Week 10: Neural networks and the evolution of modularity

Click here to open slides in a new tab. Great ideas covered:

  • McCulloch-Pitts neurons.
  • Perceptrons and linear separability.
  • Encoding logical propositions as networks of perceptrons.
  • The universality of NAND (and therefore networks of perceptrons) for representing any binary function.
  • The Alon-Kashtan algorithm demonstrating spontaneous evolution of modularity in a biological model.
  • A ten-minute overview of deep learning.
Neural-Networks-and-Modularity
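
The claim that NAND, and therefore networks of perceptrons, can represent any binary function can be illustrated with a short Python sketch. The weights (-2, -2) and bias 3 are one convenient choice that I am assuming for illustration, not necessarily the values used in lecture. The sketch builds XOR, which no single perceptron can compute because it is not linearly separable, out of four NAND perceptrons.

```python
def perceptron(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def nand(a, b):
    """A single perceptron computing NAND; weights and bias are an assumed choice."""
    return perceptron((-2, -2), 3, (a, b))


def xor(a, b):
    """XOR built as a small network of four NAND perceptrons."""
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NAND:", nand(a, b), "XOR:", xor(a, b))
```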

Week 11: Algorithms in nature

This week’s material is a little atypical for an introductory course in computational biology. It centers on algorithms that are implemented in nature, whether by a bacterium, an insect, or a slime mold, and that solve problems heuristically. These algorithms are often distributed and probabilistic, which places them outside the realm of what students typically see in introductory computer science. Some of the problems that the algorithms are “solving” are in fact fundamental CS problems, and describing these algorithms has led to surprising new contributions to computer science.

Special thanks in this section to Saket Navlakha, who provided some excellent advice on the most elegant ideas from this field to profile.

Click here to open slides in a new tab. Great ideas covered:

  • E. coli’s chemotaxis exploration algorithm.
  • Ant foraging algorithms.
  • Slime mold transportation networks.
  • A distributed heuristic for solving the maximal independent set problem implemented by Drosophila for choosing sensory organ precursor cells (SOPs) during development.
  • A probabilistic approach based on Bloom filters and neural networks for solving the novel query problem, based on the Drosophila olfactory system.
Algorithms-in-Nature

Week 12: A sampling of “mini-great ideas” in computational biology

Click here to open slides in a new tab. Great ideas covered:

  • Genome rearrangements and the fragile breakage model of genomes
  • Cellular automata (Game of Life and the self-replicating Langton loops)
  • Spatial game theory
  • Turing patterns and the Gray-Scott model
Mini-Great-Ideas

Great Ideas in Computational Biology Assessment Structure

Students taking Great Ideas in Computational Biology complete both theoretical and programming homework assignments. Since 2021, students have also completed a collection of assignments that we developed to guide them through using existing open software to answer real research questions about SARS-CoV-2; I am providing these assignments to the community.

Finally, my favorite part of the course is that every student completes a project applying computational analysis to a biological dataset of their own choosing. Students are required to write an essay detailing their work and to deliver a short presentation to their peers. The projects that students produce are exceptional. From a very strong group, I have chosen the following projects (with student permission) as stand-out examples of excellent essays for our course “ring of honor” shown below. These essays are not perfect, but they exemplify the superlative work that first-year undergraduates can complete.

Great Ideas in Computational Biology Student Project Ring of Honor

Zahra Ahmad, “Identification of Differentially Expressed Genes between Immune Phenotypes in Breast Cancer”

Ahmad_Zahra

Viola Chen, “Investigation on difference in level of expression of cellular receptor for SARS-CoV-2, Angiotensin-converting Enzyme 2 (ACE2), regarding age, gender and organ”

Chen_Viola

Shyam Sai, “Breast Cancer Diagnosis and Prognosis Using Keras and OpenCV”

Sai_Shyam

Eunseo Sung, “The Effects of Gene Expression on Pulmonary Adenocarcinoma Progression”

Sung_Eunseo

Meghana Tandon, “Quantifying Stability of Common [Metagenomics] Distance Metrics and Similarity Scores”

Tandon_Meghana

Priya Varra, “Classifying White Blood Cell Images Using Deep Learning”

Varra_Priya

Brian Zhang, “Avian Migration on the spread of Influenza A (H7N9) in China”

Zhang_Brian
