Schatzlab Research Projects Michael Schatz. Oct 16, 2013 Research Topics in Biology, WSBS


A Little About Me Born RFA CMU TIGR UMD CSHL

Schatz Lab Overview Human Genetics Computation Sequencing Modeling Plant Genomics

Milestones in Molecular Biology
There is tremendous interest in sequencing:
What is your genome sequence?
How does your genome compare to my genome?
Where are the genes and how active are they?
How does gene activity change during development?
How does splicing change during development?
How does methylation change during development?
How does chromatin change during development?
How is your genome folded in the cell?
Where do proteins bind and regulate genes?
What viruses and microbes are living inside you?
How has the disease mutated your genome?
What drugs should we give you?

What is your genome? Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.) Ming, R et al. (2013) Genome Biology 14:R41

Shredded Book Reconstruction
Dickens accidentally shreds the first printing of A Tale of Two Cities
Text printed on 5 long spools
[Figure: five copies of "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, ...", each copy cut into short fragments at different positions]
How can he reconstruct the text?
5 copies x 138,656 words / 5 words per fragment = 138k fragments
The short fragments from every copy are mixed together
Some fragments are identical

Greedy Reconstruction
Fragments: "It was the best of" | "age of wisdom, it was" | "best of times, it was" | "it was the age of" | "it was the age of" | "it was the worst of" | "of times, it was the" | "of times, it was the" | "of wisdom, it was the" | "the age of wisdom, it" | "the best of times, it" | "the worst of times, it" | "times, it was the age" | "times, it was the worst" | "was the age of foolishness," | "was the age of wisdom," | "was the best of times," | "was the worst of times," | "wisdom, it was the age" | "worst of times, it was"
Greedy overlap chain:
It was the best of
was the best of times,
the best of times, it
best of times, it was
of times, it was the
times, it was the worst / times, it was the age
The repeated sequences make the correct reconstruction ambiguous: It was the best of times, it was the [worst/age]
Model sequence reconstruction as a graph problem.
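The ambiguity on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the method used in the talk: the function name `greedy_extend`, the fixed 4-word overlap, and the `max_steps` guard are assumptions made for illustration. It extends the text one fragment at a time and stops as soon as more than one distinct continuation fits.

```python
def greedy_extend(start, fragments, overlap=4, max_steps=1000):
    """Extend `start` word by word until the next fragment is ambiguous.

    Hypothetical helper: a fragment fits when its first `overlap` words
    equal the last `overlap` words of the text built so far.
    """
    pool = [f.split() for f in fragments]
    text = start.split()
    for _ in range(max_steps):  # guard against cycling on repeats
        tail = text[-overlap:]
        options = {tuple(f[overlap:]) for f in pool
                   if f[:overlap] == tail and len(f) > overlap}
        if len(options) != 1:
            # Zero options: dead end. Two or more: ambiguous repeat.
            return " ".join(text), sorted(" ".join(o) for o in options)
        text.extend(options.pop())
    return " ".join(text), []


fragments = [
    "It was the best of", "was the best of times,", "the best of times, it",
    "best of times, it was", "of times, it was the", "times, it was the worst",
    "times, it was the age", "it was the worst of", "was the worst of times,",
    "the worst of times, it", "worst of times, it was", "it was the age of",
    "was the age of wisdom,", "was the age of foolishness,",
    "the age of wisdom, it", "age of wisdom, it was",
    "of wisdom, it was the", "wisdom, it was the age",
]

text, choices = greedy_extend("It was the best of", fragments)
print(text)     # It was the best of times, it was the
print(choices)  # ['age', 'worst']
```

The run stops at "it was the" with two candidate next words, which is the [worst/age] choice shown on the slide.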

de Bruijn Graph Construction
D_k = (V, E)
V = All length-k subfragments (k < l)
E = Directed edges between consecutive subfragments
Nodes overlap by k-1 words
Original fragment: It was the best of
Directed edge: It was the best -> was the best of
Locally constructed graph reveals the global sequence structure
Overlaps between sequences implicitly computed
de Bruijn, 1946; Idury and Waterman, 1995; Pevzner, Tang, Waterman, 2001
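The construction can be sketched in Python on word fragments, as in the slide's example. This is an illustrative sketch under assumptions: the name `de_bruijn_edges` and the dictionary-of-sets representation are choices made here, not something the slides specify.

```python
from collections import defaultdict


def de_bruijn_edges(fragments, k):
    """Map each length-k word tuple to the set of length-k tuples that follow it.

    Consecutive nodes taken from the same fragment overlap by k-1 words.
    """
    graph = defaultdict(set)
    for fragment in fragments:
        words = fragment.split()
        kmers = [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return graph


graph = de_bruijn_edges(["It was the best of"], k=4)
for left, successors in graph.items():
    for right in successors:
        print(" ".join(left), "->", " ".join(right))
# prints: It was the best -> was the best of
```

Because identical k-word nodes from different fragments collapse into one dictionary key, overlaps between fragments never need to be compared pairwise.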

de Bruijn Graph Assembly
It was the best -> was the best of -> the best of times, -> best of times, it -> of times, it was -> times, it was the
it was the worst -> was the worst of -> the worst of times, -> worst of times, it
it was the age -> was the age of -> the age of foolishness
the age of wisdom, -> age of wisdom, it -> of wisdom, it was -> wisdom, it was the
After graph construction, try to simplify the graph as much as possible

de Bruijn Graph Assembly
Simplified nodes: "It was the best of times, it" | "of times, it was the" | "it was the worst of times, it" | "it was the age of" | "the age of wisdom, it was the" | "the age of foolishness"
After graph construction, try to simplify the graph as much as possible

The full tale
it was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair
[Figure: simplified graph of the passage, with node labels "it was the", "age of", "wisdom", "foolishness", "best", "worst", "of times", "epoch of", "belief", "incredulity", "season of", "light", "darkness", "spring of hope", "winter of despair"]

N50 size
Def: 50% of the genome is in contigs at least as large as the N50 value
Example: 1 Mbp genome (1000 kbp); contig sizes in kbp: 300, 100, 45, 45, 30, 20, 15, 15, 10, ...
N50 size = 30 kbp (300k + 100k + 45k + 45k + 30k = 520k >= 500 kbp)
Note: N50 values are only meaningful to compare when the base genome size is the same in all cases
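The N50 arithmetic can be checked with a short Python sketch. The function name `n50` and its optional `genome_size` argument are assumptions for illustration. Passing the genome size follows the slide's example, where half of the 1 Mbp genome is the target; when it is omitted, the sketch falls back to half of the total contig length. Only the contig sizes listed on the slide are used, and the elided smaller contigs do not affect the result here.

```python
def n50(contig_lengths, genome_size=None):
    """Length of the contig at which the running total, largest first,
    reaches half of the reference size.

    The reference size is `genome_size` if given, else the sum of contigs.
    Returns 0 if the contigs never cover half of the reference size.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


contigs_kbp = [300, 100, 45, 45, 30, 20, 15, 15, 10]  # sizes from the slide
print(n50(contigs_kbp, genome_size=1000))  # prints 30
```

The metric measured against the genome size, as in this example, is often called NG50 elsewhere, which is one reason comparisons are only meaningful when the reference size is held fixed.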

Research Dimensions
1. New Biotechnology
Sequencing: Pacific Biosciences, Moleculo, Oxford Nanopore
Mapping: BioNanoGenomics, OpGen
Faster/Cheaper/Better assemblies
2. Algorithmics
Algorithms for assembling extremely large genomes
Improved error correction, scaffolding, haplotype phasing
Analyzing populations of genomes
3. Annotation & Comparative Genomics
Identifying functional elements
Cross species comparisons, models of evolution
Identifying mutations responsible for disease and other traits

Acknowledgements
Schatz Lab: Giuseppe Narzisi, Shoshana Marcus, James Gurtowski, Srividya Ramakrishnan, Hayan Lee, Rob Aboukhalil, Mitch Bekritsky, Charles Underwood, Tyler Gavin, Alejandro Wences, Greg Vurture, Eric Biggers, Aspyn Palatnick
CSHL: Hannon Lab, Gingeras Lab, Iossifov Lab, Levy Lab, Lippman Lab, Lyon Lab, Martienssen Lab, McCombie Lab, Ware Lab, Wigler Lab, IT Department
NBACC: Adam Phillippy, Sergey Koren

Thank You! http://schatzlab.cshl.edu @mike_schatz