Rationale: MUMmer 2.0
  The original implementation required large amounts of memory
  Advantages:
    Chromosome-scale inversions in bacteria
    Large-scale duplications in Arabidopsis
    Ancient human duplications when amino acid space is explored
    >70% of human chr 14 derives from chr 2
10/29/13
Improvements
  Uses suffix trees for a linear time and space solution, but with room for improvement
  Memory reduced from 293 MB to 100 MB using the suffix tree improvements of Kurtz (20 bytes/bp)
  Time down from 74 s to 27 s using streaming
Idea of algorithm
  We stream the query string against the suffix tree of the reference, running McCreight's algorithm to find where each query suffix would go
  If it branches in a leaf edge, the match is unique in the string held in the suffix tree (the reference)
  We then check the character immediately to the left in both strings for left-maximality
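The three checks above can be sketched by brute force on small inputs. This is only an illustration under stated assumptions: plain substring search stands in for the suffix tree and the streaming step, so it is far slower than MUMmer and is not its implementation. The function name, the `(ref_pos, query_pos, length)` tuple layout, and the `min_len` parameter are hypothetical.

```python
def unique_maximal_matches(reference, query, min_len=1):
    """Brute-force sketch: matches that are unique in the reference,
    cannot be extended to the right, and cannot be extended to the left.
    Returns a list of (ref_pos, query_pos, length) tuples."""
    matches = []
    for q in range(len(query)):
        # Longest prefix of query[q:] that occurs anywhere in the reference
        # (stands in for walking down the suffix tree until a mismatch).
        length = 0
        while q + length < len(query) and query[q:q + length + 1] in reference:
            length += 1
        if length < min_len:
            continue
        piece = query[q:q + length]
        r = reference.find(piece)
        # A second occurrence means the walk ended above a leaf edge: not unique.
        if reference.find(piece, r + 1) != -1:
            continue
        # Left-maximality: the characters to the left must differ (or not exist).
        if q > 0 and r > 0 and query[q - 1] == reference[r - 1]:
            continue
        matches.append((r, q, length))
    return matches


if __name__ == "__main__":
    print(unique_maximal_matches("GATTACA", "CATTACG", min_len=3))
```

Note that nothing here checks uniqueness in the query, only in the reference.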
Pros and cons
  Question: Are matches guaranteed to be unique in the query sequence(s)?
  Question: What are the advantages of clustering/local alignment vs. the global alignment of MUMmer 1.0?
  Question: Why use protein sequences instead of nucleotide sequences?
A mini quiz
  You are given two genomes that your biologist colleagues think have perfectly matching repeats.
  How would you find the length of the longest matching repeat, and how much time would it take?
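As a baseline to compare your answer against, here is a minimal dynamic-programming sketch that finds the longest substring shared by two strings in time proportional to the product of their lengths. It deliberately does not use suffix trees, and the function name is hypothetical; the point of the quiz is to do better than this.

```python
def longest_common_substring(a, b):
    """Quadratic-time baseline: longest substring occurring in both a and b."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("banana", "ananas"))
```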
Suffix arrays
  A suffix array requires even less space than a suffix tree
  Very simply, it is a sorted list of the suffixes, stored as their starting positions
  Example in class and in the Aluru chapter
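A minimal sketch of the definition, assuming the text ends in a sentinel such as `$` that sorts before every other character. It sorts the suffixes directly, which is simple but not the efficient construction; the function name is hypothetical.

```python
def suffix_array(text):
    """Return the starting positions of all suffixes of text, in sorted order.

    Naive construction: compares whole suffixes, so it is only for illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    text = "MISSISSIPPI$"
    for start in suffix_array(text):
        print(start, text[start:])
    # suffix_array("MISSISSIPPI$") is [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

Only the integer positions are stored, which is where the space saving over a suffix tree comes from.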
Linear time of suffix arrays
  Three papers in 2003 solved the long-standing problem of constructing suffix arrays in linear time:
    Ko and Aluru: very interesting, but hard to understand
    Kim et al.: based on older parallel suffix tree algorithms
    Kärkkäinen and Sanders: the simplest and most elegant
Try it out
  Construct the suffix array of the string BANANA$
  Construct the LCP array for the suffix array above
  Given the suffix array and LCP array, can you draw a suffix tree?
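To check a hand-built answer, the following sketch computes both arrays. It assumes the convention that `lcp[k]` is the length of the longest common prefix of the suffixes at ranks `k-1` and `k`, with `lcp[0] = 0`; other texts index the LCP array differently. The suffix array is built naively, and the LCP array uses the linear-time method of Kasai et al., which is one common choice and not necessarily the one used in class. Function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: starting positions of the sorted suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """LCP array via Kasai et al.: lcp[k] = LCP of suffixes sa[k-1] and sa[k]."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0  # length of the match carried over from the previous text position
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # dropping the first character loses at most one match
    return lcp


if __name__ == "__main__":
    text = "BANANA$"
    sa = suffix_array(text)
    print(sa)
    print(lcp_array(text, sa))
```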
Search
  Look at your suffix array for the string BANANA.
  How would you quickly find the string NA in here, and how does it compare to a suffix tree?
  What are the pros and cons of these approaches?
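One way to search, sketched under the assumption that the suffix array is already available: because the suffixes are sorted, all suffixes that start with the pattern sit in one contiguous block, and two binary searches find its ends. Each comparison looks at up to m pattern characters, so the search takes O(m log n) character comparisons with this simple version. Function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: starting positions of the sorted suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text using binary search on sa."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the suffix at rank k.
        return text[sa[k]:sa[k] + m]

    # Lower bound: first rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first rank whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "MISSISSIPPI$"
    print(find_occurrences(text, suffix_array(text), "SSI"))
```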
Algorithm
  Recursively sort the 2n/3 suffixes that start at positions i with i mod 3 != 0
  Sort the n/3 suffixes with i mod 3 == 0 using the previous result
  Merge the two sorted arrays
Some thoughts
  The sorting can be done using radix sort, with the relative ranks of the suffixes used for the ordering
  The 1/3 and 2/3 split makes the merging much easier; ½-½ approaches (e.g. Kim et al.) need clever tricks to merge
  Similar to the odd and even suffix technique of Farach
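The three steps of the algorithm can be sketched as follows. This is an illustration of the structure only, with assumptions labelled: Python's comparison sort replaces the radix sort, so this version is not linear time, and it is not the authors' code. All symbols must be positive integers because 0 is used as padding, and when n mod 3 == 1 one dummy sample suffix is added at position n (a standard device in this algorithm). Function names are hypothetical.

```python
def skew_suffix_array(s):
    """Suffix array of a list of positive ints, following the mod-3 split."""
    n = len(s)
    if n <= 3:  # small base case: sort the suffixes directly
        return sorted(range(n), key=lambda i: s[i:])
    t = list(s) + [0, 0, 0]  # zero padding so every triple is defined
    # Step 1: sample suffixes (i mod 3 != 0); dummy at position n if n mod 3 == 1.
    n_ext = n + 1 if n % 3 == 1 else n
    sample = list(range(1, n_ext, 3)) + list(range(2, n_ext, 3))
    # Name each triple by its rank among the distinct triples.
    triples = sorted(set(tuple(t[i:i + 3]) for i in sample))
    name = {tr: k + 1 for k, tr in enumerate(triples)}
    reduced = [name[tuple(t[i:i + 3])] for i in sample]
    if len(triples) < len(sample):
        order = skew_suffix_array(reduced)  # names repeat: recurse
    else:
        order = [0] * len(sample)  # names are unique: they are the ranks
        for idx, nm in enumerate(reduced):
            order[nm - 1] = idx
    sorted_sample = [sample[k] for k in order]
    rank = [0] * (n + 3)  # rank 0 means "past the end of the string"
    for r, pos in enumerate(sorted_sample):
        rank[pos] = r + 1
    # Step 2: sort the i mod 3 == 0 suffixes by (first char, rank of suffix i+1).
    mod0 = sorted(range(0, n, 3), key=lambda i: (t[i], rank[i + 1]))

    # Step 3: merge, comparing one or two characters and then sample ranks.
    def sample_is_smaller(j, i):
        if j % 3 == 1:
            return (t[j], rank[j + 1]) < (t[i], rank[i + 1])
        return (t[j], t[j + 1], rank[j + 2]) < (t[i], t[i + 1], rank[i + 2])

    samples = [p for p in sorted_sample if p < n]  # drop the dummy, if any
    out, a, b = [], 0, 0
    while a < len(samples) and b < len(mod0):
        if sample_is_smaller(samples[a], mod0[b]):
            out.append(samples[a])
            a += 1
        else:
            out.append(mod0[b])
            b += 1
    return out + samples[a:] + mod0[b:]


def dc3_suffix_array(text):
    """Convenience wrapper; assumes text contains no NUL character."""
    return skew_suffix_array([ord(c) for c in text])


if __name__ == "__main__":
    print(dc3_suffix_array("MISSISSIPPI$"))
```

The merge works because a suffix with i mod 3 == 0 always reaches a sample suffix after one or two characters, so every comparison ends in a rank lookup instead of a long character scan.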
Yeast paper
  Beer may have cemented human societies through social acts, rituals, medicine and uncontaminated water
  Yeast, along with crops, may also have been domesticated
Background
  Brewing evolved in medieval Europe to produce ale-type beer via Saccharomyces cerevisiae, the same yeast used in wine and leavened bread
  Lager brewing arose in 15th century Bavaria and is the most popular technique
  Lager, however, requires slow, low-temperature fermentation by cryotolerant yeast(s)
Saccharomyces pastorianus
  Used to make lager, but has never been found in the wild and depends on humans
  Allotetraploid hybrid of S. cerevisiae and an unknown yeast species
  Understanding this unknown contribution is important for understanding the domestication of this yeast for human use
Results
  Saccharomyces are associated with oak trees in the Northern hemisphere
  This study focused on Patagonia in South America, with 123 cryotolerant species and two isolates of S. cerevisiae
  The fact that so many were cryotolerant is unique relative to the Northern hemisphere
  In biological assays, these group with the two known contaminants of lager/cider/wine fermentation
Species details
  Although species similar to the known contaminants can create hybrids, this is relatively rare
  As in other species complexes, they look exactly alike, indicating a potential common origin
  Even so, they occupy different species of trees in different habitats
Genome sequencing
  Relationships are contentious, as the lager yeast and related yeasts were previously found only in human fermentation efforts
  To address this issue, the authors sequenced representatives from Patagonia and from breweries using short-read (next-gen) technology
  Comparisons were done to inform the biology here
Domestication and analysis
  Lager yeast is a mix of at least three yeast species
  Interestingly, all cryotolerant species have the same chunk of S. cerevisiae, useful for processing maltose
  Maltose is one of the most abundant sugars in the wort used in brewing
  Fusion seems to have happened at least twice (see optional paper on course site)
Lager paper
  Three cool facts for when you get a chance to read it:
    Yeast used for lager beer probably arose in ale breweries
    There are two distinct types of lager yeast, referred to as groups 1 and 2
    Both groups probably arose independently in Europe