Eukaryotic Comparative Genomics


Detecting Conserved Sequences
Eukaryotic Comparative Genomics
June 2018 GEP Alumni Workshop

Charles Darwin
Motoo Kimura
Barak Cohen

Evolution of Neutral DNA
Evolution of Non-Neutral DNA

A A T C T A A T T G C T G T G A T T C A G A G T A G C A G T G A A T A G T C T T T G A T G T T G T T G C A G G A G T A G T C G T A
* * * * * * * * * * * * * * * * * * * * * * * * *

A C T T A G T C C G A T G T G C G T A C C G A C C A T A A G G A T G A C C A C G T A T A C C A T G T G G T A T C C G A T C C A T A A G C A T A C T
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Multi-Species Alignment

ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA
ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA
GTACATCGATAGCTTAGAATGCTGGACGATCTC
GTACGTCGATAGCATAGAATGCTGGACGATCTC
 *    *     *   *   ***********

How to do Comparative Genomics

1. Choose species to analyze
2. Align sequences
3. Identify stretches of highly conserved nucleotides
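Step 3 of the recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the workshop: the function names and the `min_length` threshold are assumptions made for the example, and a column counts as conserved only when every species carries the same base. The input is the four-row Multi-Species Alignment shown above.

```python
# Rows of the Multi-Species Alignment from the slide (already aligned, equal length).
ALIGNMENT = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]


def conserved_columns(alignment):
    """Return 0-based indices of columns where all rows share the same base."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    return [
        i for i in range(length)
        if alignment[0][i] != "-" and len({row[i] for row in alignment}) == 1
    ]


def conserved_runs(alignment, min_length=5):
    """Return half-open (start, end) intervals of consecutive conserved columns.

    min_length is an illustrative threshold, not a value from the slides.
    """
    runs = []
    start = prev = None
    for i in conserved_columns(alignment):
        if start is None:
            start = i
        elif i != prev + 1:
            if prev + 1 - start >= min_length:
                runs.append((start, prev + 1))
            start = i
        prev = i
    if start is not None and prev + 1 - start >= min_length:
        runs.append((start, prev + 1))
    return runs


for start, end in conserved_runs(ALIGNMENT):
    print(start, end, ALIGNMENT[0][start:end])
```

Isolated conserved columns are expected by chance, so the sketch reports only runs of consecutive identical columns; how long a run must be before it is interesting depends on how far apart the chosen species are.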

Choose species
[Figure labels: closely related species, distantly related species]

Closely Related Species
- align well
- not many changes

Distantly Related Species
- hard to align
- lots of changes

Case Study: Coding vs. non-coding
[Figure labels: ORF, TAA]

Non-Coding DNA
- regulatory functions
- short (5-15 bp)
- degenerate
- variable spacing

Coding DNA
- codes for protein
- triplet code
- open reading frame (ORF)
- tend to be long (50-500 bp)
- highly constrained

CASE 1: Non-Coding

Closely-related sequences are uninformative

paradoxus   TCTTCTGAGACAGCATCACTTCTTCTTNTTTTTTACATAACTTATTCTTCTATAATTTTC
cerevisiae  TCCTTTGAGACAGCATTCGCCCAGTATTTTTTTTATTCTACA-AACCTTCTATAATTT-C
            ** * ***********     *    * *******    **  *  ************ *

paradoxus   AACGTATTTACATAGTTCTGTATCAGTTTAATCACCATAATATTGTTTTCCCTCAACTAA
cerevisiae  AAAGTATTTACATAATTCTGTATCAGTTTAATCACCATAATATCGTTTTCT-----TTGT
            ** *********** **************************** ******       *

paradoxus   TGAATGCAATTAGATTTTCTTATTGTTCCCTCGCGGCTTTTTTTTGTTTTATAATCTATT
cerevisiae  TTAGTGCAATTAATTTTTCCTATTGTTACTTCG-GGCCTTTTTCTGTTTTATGAGCTATT
            * * ********  ***** ******* * *** *** ***** ******** * *****

paradoxus   TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC
cerevisiae  TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC
            ***********  ************* ** ********************** *******

paradoxus   ATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGCTACTGTCT
cerevisiae  ACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGCTACTGTCT
            * ** ***** ** *** * ** ****** *** ********** ***************
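The asterisk row under each pairwise block marks columns where the two species carry the same base. A minimal sketch of how such a row can be produced is shown below; `match_line` is an illustrative helper name, not part of any alignment tool, and the input is the fourth paradoxus/cerevisiae block from the alignment above.

```python
# Fourth block of the paradoxus / cerevisiae promoter alignment from the slide.
PARADOXUS = "TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC"
CEREVISIAE = "TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC"


def match_line(row_a, row_b):
    """Return '*' for identical non-gap columns and ' ' everywhere else."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " " for a, b in zip(row_a, row_b)
    )


line = match_line(PARADOXUS, CEREVISIAE)
identity = line.count("*") / len(line)

print(PARADOXUS)
print(CEREVISIAE)
print(line)
print(f"{line.count('*')} of {len(line)} columns identical ({identity:.0%})")
```

With most columns identical, the match row is nearly solid, which is the point of the slide: between such close relatives the functional elements do not stand out from the neutral background.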

Distantly-related sequences do not align

Noncoding (Promoter)

cerevisiae    ACTTACCAT-CAAC-CATAGATGGGTAAAC---GGTTAGTAACTAGGAACACGAT
castellii     AGA-GTCAAACTTTTCGT ATA--TATATATAATATGTCTGATTGCTGGTT---T
              * ** * * * * * * * * *

Multiple sequence alignments reveal conserved elements

cerevisiae    TGAGACAGCAT-CACTTCTT-CTTNTTTTTTACATAACTTATTCTTCTATAATTTTCAAC
mikatae       TGAGACAGCATTCACTTCTTTCTTTTTTTTTACATATCTTATTCTTCTATAATTTTCAAC
bayanus       TGAGACAGCATTCGCCCAGT--ATTTTTTTTAT-TCTACAAACCTTCTATAATTT-CAAA
kudriavzevii  TGAGACTGCACTCCC--------TCTTCCTTTC------------TCCATAACTT---AC
              ****** ***  * *        * **  **              ** **** **   *

[Figure labels: UAS1, UAS2]

paradoxus     GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
kluyveri      GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
cerevisiae    GTATTTACATAATTCTGTATCAGTTTAATCACCATAAT------ATCGTTTTCTTTGT--
bayanus       TTATTTACATAGTTTTGTATCAGTTTAATCACCATAATCGTAACACCGTTTTACCTCACC
               ********** ** ***********************      *  *****   *

paradoxus     TAATGAATGCAATTAGATTTTC-TTATTGTTCCC-TCGCGGCTTTTTTTTGTTTTATAAT
kluyveri      TAATGAATGCAATTAGATTTTCCTTATTGTTCCCCTCGCGGCTTTTTTTTGTTTTATAAT
cerevisiae    ---TTAGTGCAATTAATTTTTC-CTATTGTTACT-TCG-GGCCTTTTTCTGTTTTATGAG
bayanus       TGATGCGGG--A---ATCCTTC-AGACCGTTCTC-TCGCGC-------------------
                 *    *  *       ***   *  ***    *** *

[Figure labels: UES, MIG1, MIG1]

paradoxus     -CTATTTTTTCCGTCATTTCTTCCCC-AGATTTCCAACTTCAT-CTCCAGATTGTGTCTA
kluyveri      ACTATTTTTTCCGTCATTTCTTCCCCCAGATTTCCAACTTCATACTCCAGATTGTGTCTA
cerevisiae    -CTATTTTTTCCGTCATC-CTTCCCC-AGATTTTCAGCTTCAT-CTCCAGATTGTGTCTA
bayanus       -CTTTTTTTTTCGTCATTTCTTCCCC-AGATCTACAACTTTAA-CTCCAGACGGTGTATA
               ** ****** ******  ******* **** * ** *** *  *******  **** **

paradoxus     TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
kluyveri      TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
cerevisiae    CGTAATGCACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGC
bayanus       GGCAGTACAAGCAGTGCTTTTGGGAAGAGGCAAAGCTGCAGACCTCGAGAACAATGAAGC
               * * * ** **  *  * **  **   *  * **   **  ****  ***  *******

CASE 2: Coding
[Figure labels: CLN3, TAA]

Closely-related sequences are uninformative

Less distantly related species not informative either

Distantly related species reveal functional protein domains

Identification of Multi-Species Conserved Regions (MCS)

Human  cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Chimp  cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Mouse  ttcagtcgtttcccagtgtctctga-cattcagagactactttagtaagcattt-tctct
Rat    tcagtccttccctggcatctccag-cactcaa-gactactttagtaagcattt-tctctg
Dog    tcaatgactttcccagtctcttctactgggaagagattaggttgcaaatcatttttctct
       * * * * * * **

How can we decide if this region is conserved?

Margulies et al. (2003) Genome Res. 13:2507-18

It's like flipping coins (really)

Binomial-Based Method for Detecting Conserved Sequences

H:  AATGG
Mouse:  AATCG
Status: CCCDC

p = probability that a site is the same between human and mouse by chance alone (Kimura); q = 1 - p

For an alignment N base pairs long with n identities, calculate the cumulative binomial probability as:

$$P(X \ge n) = \sum_{i=n}^{N} \binom{N}{i} \, p^{i} \, q^{N-i}$$

Margulies et al. (2003) Genome Res. 13:2507-18

Large sequencing projects are underway

Tree Topology Influences Power
[Figure labels: Star Phylogeny, Actual Phylogeny; species A, B, C, D, E, F]
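The coin-flipping analogy can be made concrete with a short Python sketch. It assumes, as the binomial model does, that every site is an independent trial with the same chance p of matching; the function names are illustrative, and p = 0.5 in the usage lines is a placeholder chosen for the example, not a neutral-rate estimate from the slides.

```python
from math import comb


def status_string(seq_a, seq_b):
    """Label each aligned column C (conserved) or D (different)."""
    return "".join("C" if a == b else "D" for a, b in zip(seq_a, seq_b))


def binomial_tail(n_identical, length, p):
    """P(X >= n): chance of at least n identities in `length` columns.

    p is the per-site probability of identity by chance alone; q = 1 - p.
    """
    q = 1.0 - p
    return sum(
        comb(length, i) * p ** i * q ** (length - i)
        for i in range(n_identical, length + 1)
    )


human, mouse = "AATGG", "AATCG"
status = status_string(human, mouse)
n, N = status.count("C"), len(status)

# p = 0.5 is a placeholder value for illustration only.
print(status, n, N, binomial_tail(n, N, 0.5))
```

A small tail probability means that so many identities are unlikely under neutral evolution, so the window is a candidate conserved region. A five-column window is far too short to be convincing, which is why the method is applied to longer windows.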

Challenges in larger genomes

1) Deciding on the neutral rate of substitution
2) Local differences in neutral rate of substitutions
3) Multiple hypothesis testing
4) Repeat sequences and uneven base composition

PhastCons and the UCSC Browser
[Figure labels: Olig2; 100 Kb upstream of Olig2]

Motif Searching Across Several Multiple Alignments
[Figure labels: Species 1, Species 2, Species 3; Gene 1, Gene 2, Gene 3, Gene N]

Information Content

EcoR1
Random: GCCTAC ACATTC TCATTC CGACTC ATATCG GAAATG
Rap1:   TGTATGGGTG TGTTCGGATT TGCATGGGTG TGTACAGGTG TGTATGGATG TGTTCGGGTT TGTATGGGTG

Weight Matrix Model of TATA Box

Sequence: A C T A T A A T G T
Score = -24

       1    2    3    4    5    6
A:    -8   10   -1    2    1   -8
C:   -10   -9   -3   -2   -1  -12
G:    -7   -9   -1   -1   -4   -9
T:    10   -6    9    0   -1   11
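The scores quoted on the weight-matrix slides can be reproduced by sliding the six-column matrix along the example sequence and summing one matrix entry per base. The sketch below assumes the matrix columns are read left to right as positions 1 to 6; `score_window` and `scan` are illustrative names.

```python
# Weight matrix from the slide: one list per base, one entry per motif position.
TATA_MATRIX = {
    "A": [-8, 10, -1, 2, 1, -8],
    "C": [-10, -9, -3, -2, -1, -12],
    "G": [-7, -9, -1, -1, -4, -9],
    "T": [10, -6, 9, 0, -1, 11],
}


def score_window(window, matrix):
    """Sum the matrix entry for each base at its position in the window."""
    return sum(matrix[base][i] for i, base in enumerate(window))


def scan(sequence, matrix):
    """Score every window of motif width; returns (offset, window, score)."""
    width = len(matrix["A"])
    return [
        (i, sequence[i:i + width], score_window(sequence[i:i + width], matrix))
        for i in range(len(sequence) - width + 1)
    ]


hits = scan("ACTATAATGT", TATA_MATRIX)
for offset, window, score in hits:
    print(offset, window, score)

best = max(hits, key=lambda hit: hit[2])
```

Under this reading the window CTATAA scores -24 and the window TATAAT scores 43, matching the two scores shown on the slides, so the highest-scoring window is the best match to the motif.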

Weight Matrix Model of TATA Box

Sequence: A C T A T A A T G T
Score = 43

       1    2    3    4    5    6
A:    -8   10   -1    2    1   -8
C:   -10   -9   -3   -2   -1  -12
G:    -7   -9   -1   -1   -4   -9
T:    10   -6    9    0   -1   11

N(b,i): number of times base b occurs at position i
F(b,i): frequency of base b at position i
S(b,i) = log[F(b,i)/p(b)], where p(b) is the background frequency of base b

Now we can compare motifs to each other
[Figure labels: Species 1, Species 2, Species 3]

MAGMA: unaligned motif finding in multispecies conserved regions*
[Figure labels: Gene 1, Gene 2, Gene 3, Gene N]

      A    C    G    T
1     4   -3    5   -6
2    -2   -5    2   -1
3    -2   11   -1   -1
4   -10    8    2   -4
5     2   -3   -3    2
6     1    2   -3   15

      A    C    G    T
1     3   -2    2    1
2     3    1    3   -1
3    -2    7   -2   -1
4    -8    6    3   -2
5     2   -2   -1    1
6     1    4   -3    9

*Ihuegbu, Stormo, & Buhler, JCB 19:139, 2012
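The path from aligned sites to a weight matrix, N(b,i) to F(b,i) to S(b,i), can be sketched with the Rap1 sites listed under Information Content. Several choices here are assumptions made for the example rather than details from the slides: a pseudocount of 1 per base (so that bases never seen at a position do not give log 0), base-2 logarithms, and a uniform background p(b) = 0.25. The information content per position is computed as the sum over bases of F(b,i) times S(b,i).

```python
import math

BASES = "ACGT"

# Rap1 sites as listed on the Information Content slide.
RAP1_SITES = [
    "TGTATGGGTG", "TGTTCGGATT", "TGCATGGGTG", "TGTACAGGTG",
    "TGTATGGATG", "TGTTCGGGTT", "TGTATGGGTG",
]

# Assumption: uniform background base frequencies.
BACKGROUND = {base: 0.25 for base in BASES}


def count_matrix(sites):
    """N(b,i): how often base b occurs at position i across the sites."""
    width = len(sites[0])
    counts = {base: [0] * width for base in BASES}
    for site in sites:
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts


def frequency_matrix(counts, pseudocount=1.0):
    """F(b,i): pseudocount-smoothed frequency of base b at position i."""
    width = len(counts["A"])
    freqs = {base: [0.0] * width for base in BASES}
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + pseudocount * len(BASES)
        for base in BASES:
            freqs[base][i] = (counts[base][i] + pseudocount) / total
    return freqs


def log_odds_matrix(freqs, background):
    """S(b,i) = log2[F(b,i) / p(b)]."""
    return {
        base: [math.log2(f / background[base]) for f in freqs[base]]
        for base in BASES
    }


def information_content(freqs, background):
    """Per-position information: sum over b of F(b,i) * log2[F(b,i) / p(b)]."""
    width = len(freqs["A"])
    return [
        sum(freqs[b][i] * math.log2(freqs[b][i] / background[b]) for b in BASES)
        for i in range(width)
    ]


N = count_matrix(RAP1_SITES)
F = frequency_matrix(N)
S = log_odds_matrix(F, BACKGROUND)
IC = information_content(F, BACKGROUND)

for i, bits in enumerate(IC, start=1):
    print(i, round(bits, 2))
```

Positions where every site agrees, such as the leading T, carry more information than positions where the sites vary. Once motifs are expressed as matrices of this kind, they can be compared with each other column by column.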