During my Ph.D., I conducted research into the combinatorics of genome rearrangements.
As a teaching-track faculty member, I still seek opportunities to publish surrounding computational biology education, whether this is a bestselling textbook in bioinformatics algorithms or articles about experiences in education.
While we can read a book one letter at a time, biologists still lack the ability to read a DNA sequence one nucleotide at a time. Instead, they can identify short fragments (approximately 100 nucleotides long) called reads; however, they do not know where these reads are located within the genome. Thus, assembling a genome from reads is like putting together a giant puzzle with a billion pieces, a formidable mathematical problem. We introduce some of the fascinating history underlying both the mathematical and the biological sides of DNA sequencing.
We consider four families of pancake graphs, which are Cayley graphs, whose vertex sets are either the symmetric group on n objects or the hyperoctahedral group on n objects and whose generating sets are either all reversals or all reversals inverting the first k elements (called prefix reversals). We find that the girth of each family of pancake graphs remains constant after some small threshold value of n.
The introduction of the double cut and join (DCJ) operation and the derivation of its associated distance caused a flurry of research into the study of multichromosomal rearrangements. However, little of this work has incorporated indels (i.e., insertions and deletions) into the calculation of genomic distance functions, with a particular exception of Braga et al., who provided a linear time algorithm ([1]) for computing the DCJ-indel distance. Although this algorithm only takes linear time, its derivation is lengthy and depends on a large number of possible cases. In this paper, we provide a simplified indel model that solves the problem of DCJ-indel sorting in linear time directly from the classical breakpoint graph, an approach that allows us to describe the solution space of DCJ-indel sorting, thus resolving an existing open problem.
The introduction of the double cut and join operation (DCJ) caused a flurry of research into the study of multichromosomal rearrangements. However, little of this work has incorporated indels (i.e., insertions and deletions of chromosomes and chromosomal intervals) into the calculation of genomic distance functions, with the exception of Braga et al., who provided a linear time algorithm for the problem of DCJ-indel sorting. Although their algorithm only takes linear time, its derivation is lengthy and depends on a large number of possible cases.
We note the simple idea that a deletion of a chromosomal interval can be viewed as a DCJ that creates a new circular chromosome. This framework will allow us to amortize indels as DCJs, which in turn permits the application of the classical breakpoint graph to obtain a simplified indel model that still solves the problem of DCJ-indel sorting in linear time via a more concise formulation that relies on the simpler problem of DCJ sorting. Furthermore, we can extend this result to fully characterize the solution space of DCJ-indel sorting.
Encoding indels as DCJ operations offers a new insight into why the problem of DCJ-indel sorting is not ultimately any more difficult than that of sorting by DCJs alone. There is still room for research in this area, most notably the problem of sorting when the cost of indels is allowed to vary with respect to the cost of a DCJ and we demand a minimum cost transformation of one genome into another.
The double-cut-and-join operation (DCJ) is a fundamental graph operation that is used to model a variety of genome rearrangements. However, DCJs are only useful when comparing genomes with equal (or nearly equal) gene content. One obvious extension of the DCJ framework supplements DCJs with insertions and deletions of chromosomes and chromosomal intervals, which implies a model in which DCJs receive unit cost, whereas insertions and deletions receive a nonnegative cost of ω. This paper proposes a unified model finding a minimum-cost transformation of one genome (with circular chromosomes) into another genome for any value of ω. In the process, it resolves the open case ω > 1.
Online science education needs a new revolution.
In a flipped classroom, students complete automated modules to replace a traditional lecture, allowing the time devoted for the lecture to be devoted to constructive tasks reinforcing student knowledge. Yet although the flipped classroom offers a compelling approach for fostering a constructivist, student-centric learning environment, research on the efficacy of flipped classes has been mixed. For that matter, is it possible to successfully flip a classroom in an advanced, heavily specialized course like a bioinformatics algorithms course? Over the past several years, the author has implemented a flipped version of such a course and will discuss some of the successes and pitfalls encountered.
This textbook has been adopted by over 100 institutions around the world. An interactive version powers the authors’ popular online courses on Coursera.
Much of the fundamental work on the problem of assembling genomes from short reads is based upon a structure called a de Bruijn graph and finding an Eulerian path in this graph. In this review paper, we describe this approach and detail how it has been applied to genome assembly algorithms.