Eukaryotic Comparative Genomics


Detecting Conserved Sequences
[Portraits: Charles Darwin, Motoo Kimura]

Evolution of Neutral DNA
[Figure: DNA sequences with individual positions marked by asterisks]

Evolution of Non-Neutral DNA
[Figure: DNA sequences with individual positions marked by asterisks]

Multi-Species Alignment
ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA
ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA
GTACATCGATAGCTTAGAATGCTGGACGATCTC
GTACGTCGATAGCATAGAATGCTGGACGATCTC
 *    *     *   *   ***********
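The asterisk row under the four-sequence alignment on this slide marks the columns where every sequence carries the same base. A minimal Python sketch of that bookkeeping, assuming the sequences are already aligned to equal length; the function name `conservation_line` is illustrative and does not come from the slides:

```python
def conservation_line(aligned):
    """Return a string with '*' under every column where all sequences agree."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # Conserved means one distinct character in the column, and not a gap.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# The four aligned sequences shown on the slide.
alignment = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]

for row in alignment:
    print(row)
print(conservation_line(alignment))
```

Printing the result under the sequences gives four isolated conserved columns followed by one run of eleven, which matches the asterisk pattern on the slide.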

How to do Comparative Genomics
1. Choose species to analyze
2. Align sequences
3. Identify stretches of highly conserved nucleotides
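Step 3 can be sketched in a few lines once an alignment exists. The Python below is a hedged illustration rather than the method of any particular tool: it assumes equal-length aligned sequences, treats a column as conserved only when all sequences are identical there, and uses a hypothetical `min_len` threshold and toy sequences.

```python
def conserved_stretches(aligned, min_len=5):
    """Return (start, end) pairs, 0-based and end-exclusive, for runs of
    consecutive fully conserved columns at least min_len long."""
    runs = []
    start = None
    n_cols = len(aligned[0])
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(n_cols + 1):
        conserved = (
            i < n_cols
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment (made-up sequences, not from the slides).
toy = [
    "TTACGGATCCATGA",
    "TAACGGATCCTTGA",
    "GTACGGATCCAAGT",
]
print(conserved_stretches(toy, min_len=5))
```

The threshold matters: with `min_len=1` the same call also reports the single conserved column near the end, which is the kind of chance match a length cutoff is meant to filter out.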

Choose species
Closely Related Species
- align well
- not many changes
Distantly Related Species
- hard to align
- lots of changes
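One simple way to put a number on "not many changes" versus "lots of changes" is pairwise percent identity over an alignment. The sketch below is an assumption-laden illustration: the function name and the gap handling (columns where both sequences have a gap are skipped, a gap against a base counts as a mismatch) are choices made here, not something the slides specify.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared alignment columns in which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned pair (made-up sequences): 4 matches out of 6 columns.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Very high values leave too few differences to separate functional from neutral sites, while very low values usually mean the alignment itself cannot be trusted.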

[Figure: phylogenetic tree of yeast species with approximate divergence times ~10 Mya, ~20 Mya, ~150 Mya, and >350 Mya]
Species shown: S. cerevisiae, S. cariocanus, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. pastorianus, S. servazzii, S. unisporus, S. exiguus, S. dairenensis, S. castellii, S. kluyveri, Kluyveromyces lactis, Schizosaccharomyces pombe

Case Study: Coding vs. Non-Coding
[Diagram: gene with ATG, ORF, TAA]
Non-Coding DNA
- regulatory functions
- short (5-15 bp)
- degenerate
- variable spacing
Coding DNA
- codes for protein
- triplet code
- open reading frame (ORF)
- tend to be long (50-500 bp)
- highly constrained
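The triplet code and the ATG ... stop structure of an ORF are easy to illustrate in code. The following is a minimal sketch, not a gene finder: it scans only the forward strand, assumes the three standard stop codons, and uses a hypothetical `min_codons` cutoff that counts both the start and the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs running from an
    ATG to the first in-frame stop codon; end is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon (three bases) at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


# Toy sequence (made up): one ORF of four codons in the third reading frame.
print(find_orfs("CCATGAAATTTTAAGG"))
```

Short, degenerate regulatory sites have no comparable signature, which is why the non-coding case depends so heavily on cross-species conservation.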

CASE 1: Non-Coding ATG GAL4 TAA


Closely-related sequences are uninformative
[Diagram: ATG, GAL4]

paradoxus   TCTTCTGAGACAGCATCACTTCTTCTTNTTTTTTACATAACTTATTCTTCTATAATTTTC
cerevisiae  TCCTTTGAGACAGCATTCGCCCAGTATTTTTTTTATTCTACA-AACCTTCTATAATTT-C
            ** * ***********     *    * *******    **  *  ************ *

paradoxus   AACGTATTTACATAGTTCTGTATCAGTTTAATCACCATAATATTGTTTTCCCTCAACTAA
cerevisiae  AAAGTATTTACATAATTCTGTATCAGTTTAATCACCATAATATCGTTTTCT-----TTGT
            ** *********** **************************** ******       *

paradoxus   TGAATGCAATTAGATTTTCTTATTGTTCCCTCGCGGCTTTTTTTTGTTTTATAATCTATT
cerevisiae  TTAGTGCAATTAATTTTTCCTATTGTTACTTCG-GGCCTTTTTCTGTTTTATGAGCTATT
            * * ********  ***** ******* * *** *** ***** ******** * *****

paradoxus   TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC
cerevisiae  TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC
            ***********  ************* ** ********************** *******

paradoxus   ATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGCTACTGTCT
cerevisiae  ACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGCTACTGTCT
            * ** ***** ** *** * ** ****** *** ********** ***************


Distantly-related sequences do not align
[Diagram: ATG, GAL4; noncoding (promoter) region]

cerevisiae  ACTTACCAT-CAAC-CATAGATGGGTAAAC---GGTTAGTAACTAGGAACACGAT
castellii   AGA-GTCAAACTTTTCGT ATA--TATATATAATATGTCTGATTGCTGGTT---T
            *     **  *    * *         *       *   * * *          *


Multiple sequence alignments reveal conserved elements
[Figure labels: species paradoxus, kluyveri, cerevisiae, bayanus; binding sites UAS1, UAS2, UES, MIG1 (twice); ATG of GAL4]

cerevisiae    TGAGACAGCAT-CACTTCTT-CTTNTTTTTTACATAACTTATTCTTCTATAATTTTCAAC
mikatae       TGAGACAGCATTCACTTCTTTCTTTTTTTTTACATATCTTATTCTTCTATAATTTTCAAC
bayanus       TGAGACAGCATTCGCCCAGT--ATTTTTTTTAT-TCTACAAACCTTCTATAATTT-CAAA
kudriavzevii  TGAGACTGCACTCCC--------TCTTCCTTTC------------TCCATAACTT---AC
              ****** ***  * *        * **  **              ** **** **   *

              GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
              GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
              GTATTTACATAATTCTGTATCAGTTTAATCACCATAAT------ATCGTTTTCTTTGT--
              TTATTTACATAGTTTTGTATCAGTTTAATCACCATAATCGTAACACCGTTTTACCTCACC
               ********** ** ***********************      *  *****   *

              TAATGAATGCAATTAGATTTTC-TTATTGTTCCC-TCGCGGCTTTTTTTTGTTTTATAAT
              TAATGAATGCAATTAGATTTTCCTTATTGTTCCCCTCGCGGCTTTTTTTTGTTTTATAAT
              ---TTAGTGCAATTAATTTTTC-CTATTGTTACT-TCG-GGCCTTTTTCTGTTTTATGAG
              TGATGCGGG--A---ATCCTTC-AGACCGTTCTC-TCGCGC-------------------
                 *    *  *       ***   *  ***    *** *

              -CTATTTTTTCCGTCATTTCTTCCCC-AGATTTCCAACTTCAT-CTCCAGATTGTGTCTA
              ACTATTTTTTCCGTCATTTCTTCCCCCAGATTTCCAACTTCATACTCCAGATTGTGTCTA
              -CTATTTTTTCCGTCATC-CTTCCCC-AGATTTTCAGCTTCAT-CTCCAGATTGTGTCTA
              -CTTTTTTTTTCGTCATTTCTTCCCC-AGATCTACAACTTTAA-CTCCAGACGGTGTATA
               ** ****** ******  ******* **** * ** *** *  *******  **** **

              TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
              TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
              CGTAATGCACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGC
              GGCAGTACAAGCAGTGCTTTTGGGAAGAGGCAAAGCTGCAGACCTCGAGAACAATGAAGC
               * * * ** **  *  * **  **   *  * **   **  ****  ***  *******

CASE 2: Coding ATG CLN3 TAA


Closely-related sequences are uninformative


Less distantly related species are not informative either


Distantly related species reveal functional protein domains

Identification of Multi-Species Conserved Sequences (MCS)

Human  cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Chimp  cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Mouse  ttcagtcgtttcccagtgtctctga-cattcagagactactttagtaagcattt-tctct
Rat    tcagtccttccctggcatctccag-cactcaa-gactactttagtaagcattt-tctctg
Dog    tcaatgactttcccagtctcttctactgggaagagattaggttgcaaatcatttttctct

How can we decide if this region is conserved?
Margulies et al. (2003) Genome Res. 13:2507-18

Binomial-Based Method for Detection of MCS
Human:  AATGG
Mouse:  AATCG
Status: CCCDC (C = same base, D = different base)
p = chance that a site is the same between human and mouse, q = 1 - p
For an alignment N base pairs long with n identities, calculate the cumulative binomial probability as:
P(X \ge n) = \sum_{i=n}^{N} \binom{N}{i} p^{i} q^{N-i}
Margulies et al. (2003) Genome Res. 13:2507-18
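The cumulative binomial probability, the chance of seeing at least n identities in N aligned sites when each site matches with probability p, can be computed directly. This is a minimal sketch using Python's standard library; the value p = 0.5 in the example is purely illustrative and is not an estimate taken from Margulies et al.

```python
from math import comb


def binomial_tail(N, n, p):
    """P(X >= n) for X ~ Binomial(N, p): probability of at least n identities
    in an alignment of N sites."""
    q = 1.0 - p
    return sum(comb(N, i) * p**i * q**(N - i) for i in range(n, N + 1))


# The slide's toy alignment: status CCCDC gives N = 5 sites and n = 4 identities.
status = "CCCDC"
N = len(status)
n = status.count("C")
print(binomial_tail(N, n, p=0.5))  # p = 0.5 is an assumed value
```

A small tail probability means the observed number of identities is unlikely under the neutral match rate, which is the argument for calling the window conserved.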

How to score human-mouse conservation?
score = \frac{M - \mu}{\sigma}
1) Look at 50 bp windows that align
2) M is the number of identical bases in a particular 50 bp alignment
3) μ is the average number of identical residues in 50 bp alignments of local ancient, syntenic repeats (neutral)
4) σ is the standard deviation of μ
Nature (2002) 420: 520-62
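Reading the score on this slide as a z-score, (M - μ) / σ, each 50 bp window is compared with the identity counts seen in nearby neutral windows. The sketch below assumes that reading and treats σ as the standard deviation of the neutral window counts. The neutral counts are made-up numbers, and using the population standard deviation (`pstdev`) is a choice made here, not something the slide specifies.

```python
from statistics import mean, pstdev


def conservation_score(M, neutral_counts):
    """Score for a window with M identical bases, relative to the identity
    counts observed in neutral windows (e.g. ancient repeats)."""
    mu = mean(neutral_counts)
    sigma = pstdev(neutral_counts)
    if sigma == 0:
        raise ValueError("neutral windows show no variation; score is undefined")
    return (M - mu) / sigma


# Hypothetical identity counts (out of 50 bp) in five neutral windows.
neutral = [30, 32, 34, 36, 38]
print(round(conservation_score(45, neutral), 2))
```

Windows whose score sits far above the neutral distribution are the candidates for sequence under selection.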

5% Conserved between Human-Mouse
[Figure legend: Red = neutral; Blue = observed genomic; Gray = estimated selection]
(20% of windows under selection) × (25% of bp in alignments) = 5%
Nature (2002) 420: 520-62

What does 5% conservation mean?
Only 1.5% of the genome is coding sequence
5' UTRs, 3' UTRs, promoters, and introns do not make up the difference

Problem with resolution
Answer: Sequence more genomes (maybe)!
Eddy 2005: Binomial model for power calculations

Tree Topology Influences Power
[Figure: star phylogeny vs. actual phylogeny for species A-F]

Ultraconserved Sequences
- 481 sequences longer than 200 bp are 100% identical between orthologous regions of human, mouse, and rat
- Most are conserved at 99% in chicken and dog too
- 5000 sequences longer than 100 bp are 100% identical in these species
Bejerano et al. (2004) Science 304: 1321-1325

Olig2 100 Kb upstream of Olig2

So what do they do?