SNP discovery from amphidiploid species and transferability across the Brassicaceae. Jacqueline Batley, University of Queensland, Australia (j.batley@uq.edu.au)
Outline: Objectives; Brassicas; Genome sequencing; SNP discovery; SNP validation; Cross-species transferability; Application; Future work
Objectives: develop a bioinformatics tool for SNP discovery and annotation; establish cost-effective discovery and validation of SNPs within the amphidiploid B. napus; assess the association of SNPs with genes for agronomic traits; assess the extent of LD within B. napus; assess the genetic diversity of important agronomic genes within cultivated Brassica spp. and wild relatives; establish a strategy for SNP discovery from other large and complex genomes.
Methodology: paired-end sequencing of the parents of mapping populations; SNP discovery; genotyping using GoldenGate and Infinium assays; genetic and physical mapping of SNPs; cross-species amplification in other Brassicaceae members
Brassicas
Diversity genomics: characterising genomic and phenotypic diversity in cultivated and wild plant species and their pathogens (Brassicaceae, Leptosphaeria maculans); investigating genetic variation in crops and wild relatives; investigating the evolution of plant-pathogen interactions; identifying novel genes and genetic markers for traits of interest, such as disease resistance
Genetic diversity: germplasm collections are valuable gene pools. Assessing genetic and genomic diversity within these collections makes it possible to: assign lines and populations to diverse groups; study the evolutionary history of wild relatives; verify pedigrees and fill gaps in incomplete pedigree or selection history; monitor changes in allele frequencies in cultivars or populations; help narrow the search for new alleles at loci of interest.
Domestication bottlenecks: B. napus (canola), B. juncea (mustard) and B. carinata are allopolyploids. Rare natural polyploids incorporate only limited genetic diversity from their progenitor diploids. The wide genetic diversity in the B. rapa, B. nigra and B. oleracea progenitors and in wild relatives offers options to enhance canola and mustard. A range of strategies is available to realise the genetic potential of the Brassicaceae.
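The progenitor relationships named on this slide can be written down as a small lookup. This is a minimal sketch; the genome letters (A, B, C) and the pairings follow the standard "triangle of U" description of Brassica, which the slide itself does not spell out, and the function name `genome_formula` is made up for illustration.

```python
# Genome letter conventionally assigned to each diploid progenitor
# (background knowledge, not stated on the slide).
DIPLOID_GENOMES = {"B. rapa": "A", "B. nigra": "B", "B. oleracea": "C"}

# Each allopolyploid combines the genomes of two diploid progenitors.
ALLOPOLYPLOID_PARENTS = {
    "B. napus": ("B. rapa", "B. oleracea"),
    "B. juncea": ("B. rapa", "B. nigra"),
    "B. carinata": ("B. nigra", "B. oleracea"),
}


def genome_formula(species: str) -> str:
    """Return the genome formula of an allopolyploid, e.g. 'AACC'."""
    letters = sorted(DIPLOID_GENOMES[p] for p in ALLOPOLYPLOID_PARENTS[species])
    # Each progenitor genome is present as a homologous pair.
    return "".join(letter * 2 for letter in letters)


for name in ALLOPOLYPLOID_PARENTS:
    print(name, genome_formula(name))
```

The same table is what makes inter-genomic (A versus C) differences in B. napus look like polymorphisms when reads from both subgenomes map to one reference.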
Sequence data: Illumina GAIIx and HiSeq data for 8 B. napus cultivars, 2 B. rapa cultivars, B. oleracea and 3 Brassicaceae; funding for 100+ Brassicas
Brassica genome sequencing: B. rapa ssp. pekinensis var. Chiifu; 10 chromosomes, ~550 Mbp; the Multinational Brassica genome sequencing committee originally agreed on a BAC-by-BAC sequencing approach; >100,000 BAC end sequences; >600 BACs sequenced; genome sequenced using Illumina GAIIx
B. rapa SNP discovery and genotyping: Illumina paired-end sequencing of the parents of mapping populations; SNP discovery; genotyping using GoldenGate; physical mapping; cross-species amplification in other Brassicaceae members
SNP validation
Genotyping: Illumina GoldenGate system; 384 SNPs; 2 B. rapa mapping populations; parents of B. napus mapping populations; selection of wild Brassicaceae
SNP Validation: GoldenGate oligo pool. SNP Pool 1 (strictest criteria): ~320 SNPs; SNP Pool 2 (less strict criteria): ~50 SNPs; SNP Pool 3 (lenient criteria): ~15 SNPs
SNP Validation: SNP Pool 1 (strictest filtering criteria): 94% conversion; SNP Pool 2 (less strict criteria): 80% conversion; SNP Pool 3 (lenient criteria): 30% conversion
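As a rough check on what these pool sizes and conversion rates imply for the whole oligo pool, the per-pool figures can be combined into a size-weighted overall rate. This is only a sketch: the pool sizes are the approximate values from the slides, and weighting by pool size is an assumption about how an overall figure would be derived, not something the slides report.

```python
# (pool name, approximate number of SNPs, reported conversion rate)
POOLS = [
    ("Pool 1 (strictest)", 320, 0.94),
    ("Pool 2 (less strict)", 50, 0.80),
    ("Pool 3 (lenient)", 15, 0.30),
]


def expected_converted(pools):
    """Expected number of SNP assays that convert, summed over pools."""
    return sum(size * rate for _, size, rate in pools)


def overall_conversion(pools):
    """Size-weighted conversion rate across all pools."""
    total = sum(size for _, size, _ in pools)
    return expected_converted(pools) / total


print(f"expected converted assays: {expected_converted(POOLS):.1f}")
print(f"overall conversion rate:   {overall_conversion(POOLS):.1%}")
```

Because the strictest pool dominates the assay count, the overall rate stays close to that pool's rate despite the low conversion of the lenient pool.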
SNP Genotyping
Genetic diversity: assess relationships within the Brassicaceae; correlate these with morphological and interspecific hybridisation data
Brassicaceae diversity
B. napus SNP discovery: custom algorithm developed for SNP discovery from Illumina data for amphidiploid species; distinguishes between inter- and intra-genomic SNPs
The SGSautoSNP algorithm
The reference is not considered in SNP discovery; it is only used to bring the reads together. SNPs are called from these reads, which is different from most other SNP callers.
1. Coverage must be at least 4.
2. SNP score must be at least 2. Example: SP1 = 6*A, AP1 = 1*G, M2P = 1*G gives SNP score = 2.
3. No conflict within a variety, i.e. all bases in each cultivar must be the same. For example, Junior with 3*A and 1*T is a conflict.
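The three rules above can be sketched in a few lines of Python. This is an illustration and not the SGSautoSNP implementation. It assumes the SNP score is the number of reads supporting the less-supported allele (which matches the worked example), that only biallelic sites are reported, and that the input is a hypothetical mapping from cultivar name to the bases its reads show at one position.

```python
from collections import Counter


def call_snp(reads_by_cultivar, min_coverage=4, min_score=2):
    """Apply the three slide rules at one position.

    reads_by_cultivar maps a cultivar name to a string of read bases.
    Returns (sorted alleles, score) or None if the site is rejected.
    """
    # Rule 1: total coverage across cultivars must reach the minimum.
    coverage = sum(len(bases) for bases in reads_by_cultivar.values())
    if coverage < min_coverage:
        return None

    allele_reads = Counter()
    for bases in reads_by_cultivar.values():
        observed = set(bases)
        # Rule 3: every read within one cultivar must show the same base.
        if len(observed) > 1:
            return None
        if observed:
            allele_reads[observed.pop()] += len(bases)

    # Assumption: only sites with exactly two alleles are reported.
    if len(allele_reads) != 2:
        return None

    # Rule 2 (assumed definition): score = reads behind the minor allele.
    score = min(allele_reads.values())
    if score < min_score:
        return None
    return sorted(allele_reads), score


print(call_snp({"SP1": "AAAAAA", "AP1": "G", "M2P": "G"}))
print(call_snp({"Junior": "AAAT", "SP1": "AAAA"}))
```

The first call reproduces the slide example (score 2); the second is rejected because Junior shows both A and T. The reference base never enters the decision, which mirrors the statement that the reference is only used to bring the reads together.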
Output visualisation
B. napus SNP discovery: custom algorithm developed for SNP discovery from Illumina data for amphidiploid species; distinguishes between inter- and intra-genomic SNPs. Example output:
XA_0011r 1252 1252 3 S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;
XA_0011r 1379 1379 5 S=T=2;M1=T=3;Sr=X=0;A=T=1;J=C=3;M2=X=0;Bn=X=0;E=C=2; C;T;
XA_0011r 2036 2036 4 S=G=1;M1=G=2;Sr=X=0;A=G=1;J=T=8;M2=T=3;Bn=X=0;E=T=6; T;G;
XA_0011r 4921 4921 2 S=X=0;M1=X=0;Sr=X=0;A=T=8;J=X=0;M2=X=0;Bn=X=0;E=C=2; C;T;
XA_0011r 5070 5070 4 S=X=0;M1=G=2;Sr=X=0;A=G=2;J=A=6;M2=X=0;Bn=X=0;E=X=0; A;G;
XA_0011r 5273 5273 3 S=C=4;M1=C=5;Sr=X=0;A=C=6;J=G=2;M2=X=0;Bn=X=0;E=G=1; C;G;
XA_0011r 5442 5442 8 S=T=1;M1=X=0;Sr=X=0;A=T=7;J=C=5;M2=X=0;Bn=C=1;E=C=3; C;T;
XA_0011r 5512 5512 7 S=G=3;M1=G=3;Sr=X=0;A=G=5;J=A=4;M2=X=0;Bn=A=2;E=A=1; A;G;
XA_0011r 5976 5976 11 S=T=8;M1=T=1;Sr=X=0;A=T=2;J=C=6;M2=X=0;Bn=C=2;E=C=3; C;T;
XA_0011r 5992 5992 10 S=A=9;M1=A=1;Sr=X=0;A=A=3;J=G=5;M2=X=0;Bn=G=2;E=G=3; A;G;
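The output rows above can be read with a short parser. The field meanings used here are inferred from the rows themselves and are assumptions: contig, start, end, SNP score, then per-cultivar `name=base=read count` entries (with `X=0` meaning no call), then the two alleles. The function name `parse_snp_row` is made up for illustration.

```python
from collections import Counter


def parse_snp_row(row: str) -> dict:
    """Parse one whitespace-separated output row into a dictionary."""
    contig, start, end, score, calls, alleles = row.split()

    per_cultivar = {}
    for field in calls.strip(";").split(";"):
        cultivar, base, depth = field.split("=")
        if base != "X":  # 'X' is taken to mean "no reads for this cultivar"
            per_cultivar[cultivar] = (base, int(depth))

    # Total reads behind each allele, summed over cultivars.
    allele_reads = Counter()
    for base, depth in per_cultivar.values():
        allele_reads[base] += depth

    return {
        "contig": contig,
        "start": int(start),
        "end": int(end),
        "score": int(score),
        "calls": per_cultivar,
        "alleles": alleles.strip(";").split(";"),
        "allele_reads": dict(allele_reads),
    }


row = ("XA_0011r 1252 1252 3 "
       "S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;")
snp = parse_snp_row(row)
print(snp["allele_reads"], snp["score"])
```

In the rows shown, the reported score equals the read count of the less-supported allele (here T with 3 reads against G with 9), which is consistent with the score example given for the algorithm.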
B. napus SNP discovery
Base change   Type           Number (table 1)   Number (table 2)
A>G           transition     105045             24207
C>T           transition     105513             24375
A>C           transversion   42480              10158
A>T           transversion   49287              12254
C>G           transversion   29828              6621
G>T           transversion   42217              9918
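One summary that follows directly from these counts is the transition/transversion (Ts/Tv) ratio. The sketch below computes it for both sets of numbers; the names `first_table` and `second_table` are placeholders, since the slides do not label what distinguishes the two sets.

```python
# Base-change classes that are transitions; the other four are transversions.
TRANSITIONS = {"A>G", "C>T"}


def ts_tv_ratio(counts: dict) -> float:
    """Ratio of transition count to transversion count."""
    ts = sum(n for change, n in counts.items() if change in TRANSITIONS)
    tv = sum(n for change, n in counts.items() if change not in TRANSITIONS)
    return ts / tv


first_table = {"A>G": 105045, "C>T": 105513, "A>C": 42480,
               "A>T": 49287, "C>G": 29828, "G>T": 42217}
second_table = {"A>G": 24207, "C>T": 24375, "A>C": 10158,
                "A>T": 12254, "C>G": 6621, "G>T": 9918}

for name, table in [("first", first_table), ("second", second_table)]:
    print(f"{name} table: {sum(table.values())} SNPs, "
          f"Ts/Tv = {ts_tv_ratio(table):.2f}")
```

Both sets give a ratio a little above 1 (about 1.29 and 1.25), so transitions outnumber transversions even though there are four transversion classes to two transition classes.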
B. napus SNP density (figure: SNP density plot; y-axis range 0 to 30, x-axis range 0 to 700,000)
B. napus SNP validation: 24/25 SNPs (96%) correctly predicted, as validated by PCR and sequencing; 20/22 SNPs (91%) correctly predicted, as validated by GoldenGate; the SNPs spanned a range of sequence coverage and confidence scores
Gene discovery: finding the genes for the traits; integration of genetic data with genomic data; mapping of QTL regions to genomic data...; annotation
Gene discovery - application (figure: genetic map, 10 cM, aligned to a physical map, 1 Mbp, of scaffolds; markers shown: OI09 A06, Na12 E11, BRAS023, BRMS040, CB10439, CB10278, BRMS036, BRMS075, Na12 A02, BRMS005, KBRH143H15, RA2 A05)
Scaffold and Marker Assembly Chromosome Marker Scaffold A7
CMap3D: Duran et al. (2010) Bioinformatics 26: 273-274
Identification of Candidate Blackleg Resistance Genes
TNL (gene number)   Scaffold
1-6                 3
7-10                12
11-14               3
15-16               12
17-21               19
TNL6 Sequence and Protein Alignment (figure: alignment of B. rapa, B. napus 1 and B. napus 2)
Gene   Mutation    Species              Predicted   Number of reads   Sequence verified
TNL1   18,240      Reference: B. rapa   G           N/A
                   B. napus 1           G           3                 G
                   B. napus 2           C           1                 C
TNL5   5,208,963   Reference: B. rapa   C           N/A
                   B. napus 1           C           1                 C
                   B. napus 2           T           4                 T
TNL5   5,209,056   Reference: B. rapa   A           N/A
                   B. napus 1           G           1                 G
                   B. napus 2           A           5                 A
TNL5   5,209,772   Reference: B. rapa   A           N/A
                   B. napus 1           A           1                 A
                   B. napus 2           T           6                 T
TNL5   5,207,023   Reference: B. rapa   G           N/A
                   B. napus 1           T           4                 T
                   B. napus 2           G           1                 G
TNL6   5,891,882   Reference: B. rapa   T           N/A
                   B. napus 1           T           4                 T
                   B. napus 2           C           3                 C
Among the protein differences, a change in charge was the most common.
Gene discovery Primer PCR Gene/EST genomic sequence Known (Arabidopsis) Unknown (Brassica)
http://flora.acpfg.com.au/tagdb/ Marshall, D.J., et al. (2010) Plant Methods 6:19
TAGdb output
Sym genes: Brassicas cannot form symbiotic associations with rhizobia or mycorrhizae, BUT contain homologues of many genes involved in these processes. What is the diversity of, and selection pressure on, these genes across the Brassicaceae? What are these proteins doing? General pathogen/microbial perception and response? e.g. LjNUP85, LjNUP133. TAGdb results: e.g. NFR1, NFR5; 9 Arabidopsis homologues; e.g. LjPOLLUX. Ferguson et al., 2010
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway 1). CDS similarity: BrNSP1 and BoNSP1 vs MtNSP1 = 57%; AtNSP1 vs MtNSP1 = 58%; BrNSP1 vs AtNSP1 = 83.8%; BoNSP1 vs AtNSP1 = 83.7%; BrNSP1 vs BoNSP1 = 98%. Ferguson et al., 2010
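The slides do not say how the CDS similarity percentages were calculated. One common way to get a figure of this kind is percent identity over the aligned columns of a pairwise alignment, sketched below. The choice to skip gapped columns and the toy sequences are assumptions for illustration only; they are not real NSP1 data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment (equal length,
    '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are excluded
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical).
print(percent_identity("ATGC-A", "ATGGTA"))
```

Other conventions, such as counting gaps as mismatches or dividing by the shorter sequence length, give different percentages, so values are only comparable when the same convention is used throughout.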
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1) High conservation in the GRAS domain. Residues important for NSP1 function in Lotus japonicus are conserved in the Brassicaceae. Ferguson et al., 2010
Sequencing SYM genes in Brassicas: NSP2 (Nodulation Signalling Pathway 2). CDS similarity: BrNSP2 vs BoNSP2 = 98%; BrNSP2 vs AtNSP2 = 78.2%; BoNSP2 vs AtNSP2 = 78.5%; BrNSP2 and BoNSP2 vs MtNSP2 = 55%. Ferguson et al., 2010
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1) Alanine residue important for NSP1-NSP2 interaction in Lotus japonicus is not conserved in the Brassicaceae, but conserved in rice. Rice NSP1 and NSP2 are functional in nodulation in transgenic Lotus japonicus. Ferguson et al., 2010
Sequencing SYM genes in Brassicas: POLLUX. One copy on each of the A and C genomes: BrPOLLUX (A) and BoPOLLUX (C) are 98% similar. Ferguson et al., 2010
The BrPOLLUX CDS is 69.4% similar to the LjPOLLUX CDS (BoPOLLUX = 69%, AtPOLLUX = 61%). BrPOLLUX and AtPOLLUX are 85.6% similar; BoPOLLUX and AtPOLLUX are 85.4% similar. POLLUX is currently being sequenced in other Brassicaceae members. The least similarity is in the N-terminal transit peptide.
POLLUX transmembrane prediction (Geneious Pro, Biomatters) is consistent with cation channel function.
Future work: SNP identification and genotyping of cultivated and wild Brassicaceae; large-scale SNP discovery and genotyping for fine mapping and LD studies; identify which Brassicaceae to sequence; use next-generation sequencing data, molecular markers and morphological variation to study diversity across Brassica species and wild relatives
Summary: Next-generation sequencing data are suitable for gene, promoter and SNP discovery in non-sequenced and orphan species. SNPs can be applied to gene discovery and evolution studies in crop species and wild relatives. High-throughput genotyping can be used for fine mapping and LD studies.
Acknowledgements: Emma Campbell, Christina Delay, Megan McKenzie, Reece Tolleneare, Joanne McLanders, Manuel Zander, Alice Hayward, Paul Berkman, Chris Duran, Kaitao Lai, Michal Lorenc, Sahana Manoli, Adam Skarshewski, Lars Smits, Jiri Stiller, David Edwards, Bob Redden, Harsh Raman, Xiaowu Wang