Vignette to Package impute.r


Yvonne M. Badke
Department of Animal Science
Michigan State University
East Lansing, MI, USA
badkeyvo@msu.edu

Juan P. Steibel
Departments of Animal Science, Fisheries and Wildlife
Michigan State University
East Lansing, MI, USA

October 25, 2012
Version

1 Introduction

impute.r is an R [7] package developed to reproduce the imputation accuracy calculations presented in Badke et al. [1]. The package is built as an extension to the R package synbreed [8]. We expanded the functionality of synbreed to include genotype imputation and phasing using a reference panel of haplotypes, and the use of subsets of tagSNP to impute all non-typed SNP. impute.r includes the functions necessary to obtain and utilize the input/output of the BEAGLE software [2, 3], and to perform the imputation using various options of the BEAGLE phasing algorithm. In addition, impute.r contains three functions that generate input for the FESTA program [6] from a gpData (package synbreed) object containing phased haplotypes. Functions for FESTA program formatting are detailed in a separate vignette: Conversion of data into FESTA format. Finally, the impute.r package includes functions to compute the accuracy measures reported in Badke et al. [1]. This guide provides users with step-by-step instructions to obtain the graphical output included in the publication [1].

2 Input formats, recoding, and quality editing

2.1 Input formats

Genotypes used in Badke et al. [1] were obtained from DNA samples of 889 Yorkshire sires and 96 animals in sire/dam/offspring trios, genotyped for all SNP (M=62,163) on the Illumina PorcineSNP60 Genotyping

BeadChip (Illumina Inc.) at a commercial laboratory (GeneSeek, a Neogen Company, Lincoln, NE). These genotypes were available in a long table format with one row per SNP/sample combination and two columns containing the observed alleles in character coding (A/G/T/C). We reformatted the original long-format data into a gpData object (york_gpdata) that is provided as part of the impute.r package.

The following data objects are released as part of the impute.r package:

york_gpdata
A gpData object containing the SNP genotypes of 889 unrelated Yorkshire sires and 96 animals in sire/dam/offspring trios. Besides the raw genotypes in a data frame with SNP in columns and individuals in rows (geno), the object contains a map, a phenotype file, and a pedigree.

acc_table
A data frame containing accuracies for three methods of tagSNP selection. It is used in the section on accuracy of imputation by tagSNP density and selection method.

ref_size
A data frame containing SNP-wise imputation accuracies for all SNP on SSC14 for increasing reference panel size, as well as the minor allele frequency of each SNP. The data frame is used in section 4 to illustrate how to derive two Figures from Badke et al. [1].

Before installing impute.r, the packages epicalc, coda, and synbreed must be installed. This can be achieved using the following code:

    install.packages("epicalc")
    install.packages("synbreed")
    install.packages("coda")

The york_gpdata object contains the following data frames (for further detail on how to create a gpData object please refer to [8]):

geno is a data frame with samples organized in rows (identified by row names) and SNP organized in columns (identified by column names). The genotype entries can be either in numerical format, as counts of the minor allele, or in character format.

pheno is a data frame with samples organized in rows and traits organized in columns. To use the impute function, pheno must contain at least one column (sample) with entries trio or random. The entry identifies whether the sample is part of a trio and should be phased as such, or whether it is not part of a trio and should be phased as unrelated to all other samples.

map is a data frame with one row for each marker and two columns (named chr and pos). The first column identifies the chromosome (numeric or character but not factor) and the second column the

position on the chromosome in centimorgans or the physical distance relative to the reference sequence in base pairs. Unique row names indicate the marker names, which should match the marker names in geno.

pedigree is an object of the class pedigree in synbreed that can be obtained using the function create.pedigree. create.pedigree requires a vector of sample IDs, a vector identifying the first parent, a vector identifying the second parent, and a vector containing the sex of each animal. The user can further specify whether create.pedigree should infer the generation of each animal or use a supplied vector identifying the generation of each sample, and whether create.pedigree should add ancestors to the pedigree that did not occur in the sample vector.

2.2 Quality editing

Animals for this study were previously cleaned such that the input only contains data for animals with genotypes available for more than 90% of the SNP, leaving 889 sires and 96 trio animals. The york_gpdata object was created using create.gpData from synbreed [8]. Please refer to the manual and vignette of synbreed for more detail on how to use this function. The provided york_gpdata contains uncleaned genotypes, with only those SNP removed that have not been called in any study sample. To further process the data we use codeGeno to clean the data and recode it into numeric format:

    library(impute.r)
    library(epicalc)
    library(synbreed)
    library(coda)
    data(york_gpdata)
    # a genotype is heterozygous if its two allele characters (positions 1 and 3) differ
    which.heter <- function(x) { substr(x, 1, 1) != substr(x, 3, 3) }
    # applying codeGeno for recoding / quality editing
    york_cleaned <- codeGeno(york_gpdata, impute=FALSE, replace.value=NULL,
                             maf=0.05, nmiss=0.10, label.heter=which.heter,
                             keep.identical=TRUE, verbose=TRUE, print.report=FALSE)

The gpData object york_cleaned now contains only SNP with genotypes available for more than 90% of samples and with minor allele frequency (MAF) larger than 5%, and the alleles have been recoded into numeric counts of the B allele (0, 1, 2).
synbreed identifies the B allele based on MAF. If data are assembled from several sources, it is therefore not advisable to create separate gpData objects, because small differences in MAF between the sources could lead to opposite recoding. Using this input we can apply the function impute, as explained below, to perform all phasing and genotype imputation necessary to reproduce the results presented in Badke et al. [1].
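To make the effect of the maf and nmiss thresholds more concrete, the following base R sketch applies the same kind of filtering and recoding to a toy character genotype matrix. It is not the synbreed implementation: the toy data, the "A/B" genotype strings, the helper names count_allele and recode_snp, and the exact handling of values that fall on a threshold are assumptions made for illustration only.

```r
# Toy genotypes (hypothetical data): rows = samples, columns = SNP,
# coded as "A/A", "A/B", "B/B"; NA marks a missing genotype.
geno <- matrix(c("A/A", "A/B", "B/B", "A/B", "A/A", "A/B",   # snp1
                 "A/A", "A/A", "A/A", "A/A", "A/A", "A/A",   # snp2: monomorphic
                 "A/B", NA,    NA,    "B/B", "A/A", "A/B",   # snp3: 1/3 missing
                 "B/B", "B/B", "A/B", "B/B", "B/B", "A/B"),  # snp4: B is the major allele
               nrow = 6,
               dimnames = list(paste0("id", 1:6), paste0("snp", 1:4)))

# Count copies of one allele in each genotype string (characters 1 and 3).
count_allele <- function(x, allele) {
  (substr(x, 1, 1) == allele) + (substr(x, 3, 3) == allele)
}

# Recode one SNP as counts of its minor allele (0, 1, 2).
recode_snp <- function(x) {
  n_b <- count_allele(x, "B")
  freq_b <- mean(n_b, na.rm = TRUE) / 2
  if (is.na(freq_b)) return(n_b)        # all genotypes missing
  if (freq_b > 0.5) 2 - n_b else n_b    # flip so that the minor allele is counted
}

coded <- apply(geno, 2, recode_snp)
rownames(coded) <- rownames(geno)

# Per-SNP missing rate and minor allele frequency.
miss_rate <- colMeans(is.na(coded))
maf <- colMeans(coded, na.rm = TRUE) / 2

# Keep SNP with at most 10% missing genotypes and a MAF of at least 5%.
keep <- miss_rate <= 0.10 & maf >= 0.05
cleaned <- coded[, keep, drop = FALSE]
print(colnames(cleaned))
```

In this toy example snp2 is dropped for being monomorphic and snp3 for its missing rate. snp4 is flipped during recoding because B is its major allele, which is the kind of frequency-dependent recoding that can differ between data sources.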

3 Haplotype phasing and genotype imputation using impute

Four different phasing and genotype imputation scenarios were used in Badke et al. [1]. In this section we detail how the gpData object developed in section 2 can be used with the function impute to obtain the desired output for all four scenarios. First we introduce the function impute and its arguments, and then we show examples for all four scenarios.

3.1 The impute function

The impute function is structurally similar to the codeGeno function provided by synbreed [8]. However, codeGeno does not implement estimation of phase from sire/dam/offspring trios using the BEAGLE trio input option, and it does not facilitate the use of a reference panel of haplotypes. As a result, codeGeno can impute randomly missing genotypes for a set of samples/SNP (subsets can be obtained using discard.markers or discard.individuals), but it does not support the imputation of high density genotypes from a set of tagSNP. While using much of the original structure of codeGeno, we added these options to impute.

Usage of impute and specification of necessary arguments:

    impute(gpdata, all_animals=TRUE, animals=c(), all_snp=TRUE, snp=c(),
           beagle_method=c("trio", "unrelated", "pairs"), reference=FALSE,
           ref_panel=NULL, showbeagleoutput=FALSE, nsamples=4, niterations=10,
           mem=6000)

gpdata
a gpData object as detailed above containing the data frames geno, pheno, map, and pedigree. All these data frames are assembled as specified by synbreed, with one column in the pheno data frame identifying each sample as either random or trio.

all_animals
logical, should all samples in geno be imputed, default is TRUE.

animals
a vector containing the IDs of animals (as found in row names of pheno and geno) that should be imputed if all_animals=FALSE. If all_animals=TRUE any input to this vector will be ignored.

all_snp
logical, should all SNP in geno be imputed, default is TRUE.

snp

a vector containing the IDs of SNP (as found in the column names of geno) that should be used for imputation if all_snp=FALSE. If all_snp=TRUE any input to this vector will be ignored.

beagle_method
a character string indicating the BEAGLE method that should be used. impute accepts "trio", indicating that the trio procedure in BEAGLE should be used, "unrelated", indicating that no pedigree relation between animals should be assumed for imputation, and "pairs", indicating that the BEAGLE procedure to impute parent-offspring data should be used.

reference
logical, should a reference panel be used to impute SNP/samples in geno, default is FALSE.

ref_panel
if reference=TRUE this is a data frame of reference haplotypes. SNP are in the columns and identified by column names, and haplotypes are in the rows. This data frame is expected to contain the characters A and B to identify the alleles. Row names can be used to identify the individual the haplotype is sampled from, but they are not required.

showbeagleoutput
logical, should the BEAGLE output during the imputation be printed on the screen, default is FALSE.

nsamples
numeric, the number of haplotype pairs to sample for each individual during each iteration of the BEAGLE phasing algorithm. The default is nsamples=4 as specified in [2].

niterations
positive even integer giving the number of iterations of the phasing algorithm. If an odd integer is specified, the next even integer is used. The default is niterations=10 as specified in [2].

mem
numeric, the number of megabytes of memory available. The default is mem=6000, allowing BEAGLE to use a maximum of 6 GB of RAM.

Phasing of a reference panel of haplotypes from a trio design

BEAGLE [3] has a special option allowing the user to provide genotypes from sire/dam/offspring trios for phasing. The resulting sire/dam haplotypes are suitable as a reference panel of haplotypes for imputation based on low density SNP panels.
The provided family file can be used to identify those animals that are

part of a sire/dam/offspring trio and to provide a vector containing the IDs of these animals as input for the animals argument of the impute function:

    trio <- rownames(york_cleaned$geno)[as.data.frame(york_cleaned$pheno)$sample == "trio"]
    scen1 <- impute(york_cleaned, all_animals=FALSE, animals=trio, all_snp=TRUE,
                    beagle_method="trio", reference=FALSE, showbeagleoutput=TRUE)

The output of this application of the impute function is a list containing: 1) scen1$gpimputed, a gpData object including imputed allelic dosages of all SNP/sample combinations in the geno data frame, and 2) scen1$ref, a data frame with SNP in the columns and haplotypes of the sires and dams in the input data in the rows. The data frame of haplotypes has two rows per sample. The second object returned by this function (scen1$ref) can be used as reference panel for future imputations as ref_panel=scen1$ref.

Imputation of randomly missing genotypes and phasing of unrelated individuals

When there is no previous reference panel and samples are not presented in trios, BEAGLE can still estimate phase and impute missing data [3].
The following code applies the impute function to such a case for all animals labeled as randomly sampled from the sire population:

    sires <- rownames(york_cleaned$geno)[as.data.frame(york_cleaned$pheno)$sample == "random"]
    scen2 <- impute(york_cleaned, all_animals=FALSE, animals=sires, all_snp=TRUE,
                    beagle_method="unrelated", reference=FALSE, showbeagleoutput=TRUE)

The output of this imputation has the same structure as the output described above, except that the haplotypes were obtained from unrelated individuals.

Imputation of randomly missing genotypes and phasing of unrelated individuals using a reference panel of haplotypes

The result of this application of impute is similar to the one in the previous section, except that in this case the haplotypes from the first phasing run (scen1) are used as reference panel for imputation:

    sires <- rownames(york_cleaned$geno)[as.data.frame(york_cleaned$pheno)$sample == "random"]
    scen3 <- impute(york_cleaned, all_animals=FALSE, animals=sires, all_snp=TRUE,
                    beagle_method="unrelated", reference=TRUE, ref_panel=scen1$ref,
                    showbeagleoutput=TRUE)
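The reference panel passed as ref_panel is described above as a data frame with SNP in the columns, two haplotype rows per sample, and alleles coded as A and B. The sketch below builds a toy panel of that shape and collapses it into allelic dosages of the B allele. The toy data, the row naming scheme, the helper name haplotypes_to_dosage, and the assumption that the two haplotypes of a sample occupy consecutive rows are illustrative and are not taken from the package.

```r
# Toy reference panel (hypothetical data): 3 samples x 2 haplotypes = 6 rows,
# 4 SNP in the columns, alleles coded "A" and "B".
ref_panel <- data.frame(
  snp1 = c("A", "B", "A", "A", "B", "B"),
  snp2 = c("A", "A", "B", "A", "B", "B"),
  snp3 = c("B", "B", "A", "B", "A", "A"),
  snp4 = c("A", "B", "B", "B", "A", "B"),
  row.names = c("sire1.1", "sire1.2", "dam1.1", "dam1.2", "sire2.1", "sire2.2"),
  stringsAsFactors = FALSE
)

# Collapse pairs of haplotype rows into one dosage row per sample.
# Assumes rows 1-2 belong to the first sample, rows 3-4 to the second, and so on.
haplotypes_to_dosage <- function(haps) {
  stopifnot(nrow(haps) %% 2 == 0)
  is_b <- as.matrix(haps) == "B"
  first <- seq(1, nrow(haps), by = 2)
  dosage <- is_b[first, , drop = FALSE] + is_b[first + 1, , drop = FALSE]
  # strip the ".1" haplotype suffix to obtain sample IDs
  rownames(dosage) <- sub("\\.1$", "", rownames(haps)[first])
  dosage
}

dosage <- haplotypes_to_dosage(ref_panel)
print(dosage)
```

A check of this kind can be useful before supplying an externally assembled data frame as ref_panel, since an odd number of rows or alleles other than A and B would indicate a panel that does not match the expected layout.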

Imputation of unrelated individuals genotyped for a subset of SNP (tagSNP) using a reference panel of high density haplotypes

In this case impute uses a list of tagSNP in a dataset and a reference panel of high density haplotypes derived from high density genotypes (scen1).

    data(tagsnp)
    # scen1[[2]] is the reference panel, i.e. the same object as scen1$ref
    scen4 <- impute(york_cleaned, all_animals=FALSE, animals=paste(sires), all_snp=FALSE,
                    snp=tagsnp, beagle_method="unrelated", reference=TRUE,
                    ref_panel=scen1[[2]], showbeagleoutput=TRUE)

The resulting gpData object contains the data frame geno with the imputed allelic dosage, which can be used to estimate the accuracy of imputation through comparison with the input data.

3.2 Estimation of accuracy of imputed genotypes

Accuracy of imputation can be measured as 1) the proportion of correctly imputed alleles, 2) the correlation between observed and imputed allelic dosage, or 3) the proportion of correctly imputed alleles adjusted for MAF. The proportion of correctly imputed alleles can be obtained either from the difference between the observed allelic dosage and the inferred allelic dosage, or from the difference between the observed allelic dosage and the posterior expectation of the allelic dosage:

$$IA = 1 - \frac{\sum_{i=1}^{M} \sum_{j=1}^{N_i} \left| g_{ij} - \hat{g}_{ij} \right|}{2 \sum_{i=1}^{M} N_i} \tag{1}$$

where $g_{ij}$ is the observed allelic dosage of the $i$th SNP in the $j$th individual, $\hat{g}_{ij}$ is the corresponding posterior expected/inferred allelic dosage obtained from BEAGLE output, $M$ is the total number of imputed SNP, and $N_i$ is the number of individuals with called genotypes for the $i$th SNP. However, recent research has pointed out that imputation accuracy quantified as the proportion of correctly imputed alleles is biased by the MAF of the imputed SNP [4, 5]. To obtain a measure of imputation accuracy that is unbiased by MAF we used the correlation between observed and imputed allelic dosage [1].
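As a small worked illustration of the two accuracy measures, the base R sketch below computes the proportion of correctly imputed alleles of equation (1) and the SNP-wise squared correlation on toy dosage matrices. It does not use accuracy_summary; the toy matrices and the object names (observed, imputed, ia_total, ia_snp, r2_snp) are assumptions made for the example.

```r
# Toy data (hypothetical): rows = individuals, columns = SNP.
# NA marks an observed genotype that was not called.
observed <- matrix(c(0, 1, 2, 1,
                     0, 0, 1, NA,
                     2, 1, 0, 1),
                   nrow = 4,
                   dimnames = list(paste0("id", 1:4), paste0("snp", 1:3)))
imputed <- matrix(c(0, 1, 1, 1,
                    0, 0, 1, 0,
                    2, 2, 0, 1),
                  nrow = 4, dimnames = dimnames(observed))

called <- !is.na(observed)           # genotypes that count towards N_i
abs_diff <- abs(observed - imputed)  # |g_ij - g-hat_ij|, NA where not called

# Equation (1): overall proportion of correctly imputed alleles.
ia_total <- 1 - sum(abs_diff, na.rm = TRUE) / (2 * sum(called))

# The same measure computed separately for each SNP.
ia_snp <- 1 - colSums(abs_diff, na.rm = TRUE) / (2 * colSums(called))

# SNP-wise squared correlation between observed and imputed dosage.
r2_snp <- sapply(colnames(observed), function(s) {
  cor(observed[, s], imputed[, s], use = "complete.obs")^2
})

print(ia_total)
print(ia_snp)
print(r2_snp)
```

In this toy example a single wrong allele among four individuals costs snp1 one eighth of its IA but one third of its squared correlation, which shows that the two measures can rank the same imputation result quite differently.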
To estimate imputation accuracy in the following examples we used the cleaned gpData object york_cleaned (section 2.2) and ran the example on SSC18 using a previously devised list of tagSNP:

    # discard markers in gpdata that are not on chr 18
    # (note that this overwrites the raw york_gpdata object loaded earlier)
    york_gpdata <- discard.markers(york_cleaned,
                     which=rownames(york_cleaned$map)[york_cleaned$map$chr != "18"])
    # keep only the tagsnp that are present on the retained chromosome
    idx <- tagsnp %in% colnames(york_gpdata$geno)
    tagsnp <- tagsnp[idx]
    # making a reference panel of trios

    trios <- impute(york_gpdata, all_animals=FALSE, animals=trio, all_snp=TRUE,
                    beagle_method="trio", reference=FALSE, showbeagleoutput=TRUE)
    # imputing from the tagsnp for all sires
    imp <- impute(york_gpdata, all_animals=FALSE, animals=sires, all_snp=FALSE,
                  snp=tagsnp, beagle_method="unrelated", reference=TRUE,
                  ref_panel=trios$ref, showbeagleoutput=TRUE)
    # discarding the observed trio individuals prior to estimating accuracy
    obs_sires <- discard.individuals(york_gpdata, which=trio)
    # applying the accuracy estimation function using the observed genotypes
    # and the imputed genotypes
    acc_out <- accuracy_summary(gpobserved=obs_sires, gpimputed=imp$gpimputed,
                                tagsnp=tagsnp, HPD=0.95)

The function accuracy_summary returns average accuracy, SNP-specific accuracy, and individual-specific accuracy, as well as several summary measures of imputation accuracy. The first object returned by accuracy_summary is summary_acc_ia, which is a data frame of summary measures of imputation accuracy estimated as the proportion of correctly imputed alleles.

    acc_out$summary_acc_ia
    # columns: total, Sample, SNP
    # rows: Min, ..., Max, mean, HPD-lowerbound, HPD-upperbound

The second object returned by accuracy_summary is summary_acc_r2, which is a data frame of summary measures of imputation accuracy estimated as the correlation between observed and imputed allelic dosage.

    dim(acc_out$summary_acc_r2)
    # [1] 8 2
    acc_out$summary_acc_r2
    # columns: Sample, SNP
    # rows: Min, ..., Max, mean, HPD-lowerbound, HPD-upperbound

The third object returned by accuracy_summary is individual_acc, which is a data frame of individual imputation accuracy measured as both the proportion of correctly imputed alleles and the correlation between observed and imputed allelic dosage.

    dim(acc_out$individual_acc)

    head(acc_out$individual_acc)
    # columns: SampleID, IA, R2
    # rows are named by sample ID (York_Sid_1, York_Sid_10, York_Sid_100, ...)
    summary(acc_out$individual_acc[,2:3])
    # Min., 1st Qu., Median, Mean, 3rd Qu., Max. of IA and R2

The fourth object returned by accuracy_summary is snp_acc, which is a data frame of SNP imputation accuracy measured as both the proportion of correctly imputed alleles and the correlation between observed and imputed allelic dosage.

    dim(acc_out$snp_acc)
    head(acc_out$snp_acc)
    # columns: SNP, IA, R2
    # rows are named by SNP ID
    summary(acc_out$snp_acc[,2:3])
    # Min., 1st Qu., Median, Mean, 3rd Qu., Max. of IA and R2

The fifth object returned by accuracy_summary is snp_measures, which is a data frame of SNP MAF estimated from the observed allele frequencies and the scaled chromosomal location of each SNP.

    dim(acc_out$snp_measures)
    head(acc_out$snp_measures)
    # columns: SNP, MAF, scaled_position
    # rows are named by SNP ID
    summary(acc_out$snp_measures[,2])
    # Min., 1st Qu., Median, Mean, 3rd Qu., Max. of MAF

The sixth object returned by accuracy_summary is acc_mat, which is a data frame of the proportion of

correctly imputed alleles for each genotype. In the printed output the rows are labeled with sample IDs and the columns with SNP IDs, so rows correspond to individuals and columns to SNP.

    dim(acc_out$acc_mat)
    acc_out$acc_mat[1:5,1:5]
    # column names are SNP IDs, row names are sample IDs (York_Sid_...)

4 Visualization of imputation accuracy

In this section we provide code to obtain the figures published in Badke et al. [1].

4.1 Accuracy of Imputation by the scaled chromosomal location of imputed SNP

To investigate whether SNP-wise imputation accuracy differs as a function of chromosomal location, we plot the estimated accuracy against the scaled location of each SNP. This plot can be built from the output of the accuracy_summary function that we applied in 3.2. The object acc_out contains the results of accuracy_summary. In addition, we added the weighted mean average (a loess smoother) and the overall average accuracy to the plot. The graphical output can be seen in Figure 1.

    # opening the pdf to which the plot will be written
    pdf("accuracy_by_density.pdf")
    # obtaining the loess smoother to plot the weighted mean average
    pred <- loess(acc_out$snp_acc[,2] ~ acc_out$snp_measures[,3])
    # open a plot window of the right dimensions
    # the accuracy/scaled location are taken from the acc_out object rendered in 3.2
    plot(acc_out$snp_acc[,2] ~ acc_out$snp_measures[,3],
         main="Accuracy of Imputation by the scaled chromosomal location of imputed SNP",
         xlab="scaled chromosome position", ylab="imputation accuracy",
         ylim=c(0,1))
    # insert a horizontal line representing the average accuracy
    abline(h=mean(acc_out$snp_acc[,2]), col="green")
    # inserting the weighted mean average estimated using a loess smoother
    points(pred$x, pred$fitted, col="red", pch=18)
    dev.off()

Figure 1: Accuracy of Imputation by the scaled chromosomal location of imputed SNP (x-axis: scaled chromosome position, y-axis: imputation accuracy)

4.2 Accuracy of imputation by MAF of the SNP

Figure 2 contains a plot of SNP-wise imputation accuracy as a function of MAF, estimated as the squared correlation between observed and imputed allelic dosage.
Estimates of accuracy and MAF used in this plot can be obtained from the objects available in acc_out obtained in section 3.2. In addition, we added the weighted mean average accuracy to the plot to assess whether there is an obvious pattern of accuracy across minor allele frequencies.

    # obtain color coding by density - darker color=more data density in that area
    colors <- densCols(acc_out$snp_acc[,3] ~ acc_out$snp_measures[,2])
    pdf("accuracy_by_maf.pdf")
    # obtaining the loess smoother to plot the weighted mean average
    pred <- loess(acc_out$snp_acc[,3] ~ acc_out$snp_measures[,2])
    # open a plot window of the right dimensions
    # the accuracy and MAF are taken from the acc_out object rendered in 3.2
    plot(acc_out$snp_acc[,3] ~ acc_out$snp_measures[,2],
         main="Accuracy of Imputation by SNP MAF",
         xlab="MAF", ylab=expression("Accuracy R"^2),
         ylim=c(0,1), pch=20, col=colors)
    # inserting the weighted mean average estimated using a loess smoother
    points(pred$x, pred$fitted, col="red", pch=18)
    dev.off()

Figure 2: Accuracy of Imputation by SNP MAF (x-axis: MAF, y-axis: Accuracy R^2)

4.3 Accuracy of Imputation by tagSNP density and selection method

In the paper accompanying this package several methods for tagSNP selection were compared across a

variety of tagSNP densities [1]. Since only one density of tagSNP was explored in the example above, we have provided a small table with imputation accuracy for several densities of tagSNP estimated for all three methods of tagSNP selection, with the corresponding graphical output shown in Figure 3:

    # Accuracy of imputation by tagsnp density
    data("acc_table")
    head(acc_table)
    # columns: Number.of.SNP, r2.threshold, FESTA, BEAGLE, n_even, evenly.spaced
    pdf("accuracy_by_tagsnpdensity.pdf")
    tab <- acc_table
    # open an empty plot with the correct dimensions
    plot(0, type="n", main="Accuracy by density", xlab="number of tagsnp",
         ylab="Accuracy of Imputation", ylim=c(0,1), xlim=c(0,max(tab[,1])+50))
    # add points for the results of statistical tagsnp selection (column FESTA)
    points(tab[,3] ~ tab[,1], type="p", col="black", pch=19)
    # add points for the results of evenly spaced tagsnp (columns evenly.spaced, n_even)
    points(tab[,6] ~ tab[,5], type="p", col="red", pch=15)
    # add points for the results of predictive tagsnp selection (column BEAGLE)
    points(tab[,4] ~ tab[,1], type="p", col="darkgreen", pch=17)
    # add a legend
    legend(x="bottomright", pch=c(19, 15, 17), bty="n",
           legend=c("statistical selection", "evenly spaced", "predictive selection"),
           col=c("black","red","darkgreen"))
    dev.off()

Figure 3: Accuracy of Imputation by tagSNP density and selection method (x-axis: number of tagsnp, y-axis: Accuracy of Imputation)

This code can be adjusted to obtain figures similar to Figures 1 and 2 in Badke et al. [1].

4.4 Accuracy of imputation by reference panel size

Badke et al. [1] also investigated the effect of increasing the number of reference haplotypes. To obtain

a larger reference panel the available 889 Yorkshire sires were split into 200 validation animals and 689 animals that were added stepwise to increase the number of reference haplotypes. Since the example detailed in 3.2 only included one imputation, we provided a file containing imputation accuracy for a random sample of 2000 SNP for a variety of reference panels (accuracy_by_ref_size.txt).

    data(ref_size)
    # extracting the number of reference animals from column names
    # (ignore.case covers both an "x" and an "X" prefix; two haplotypes per animal)
    n_ref <- as.numeric(sub("x", "", colnames(ref_size[,2:7]), ignore.case=TRUE))*2
    # average accuracy for reference panel sizes - each column corresponds to a panel size
    avg_acc <- colMeans(ref_size[,2:7])
    pdf("accuracy_by_refsize.pdf")
    # plot accuracy by number of reference haplotypes
    plot(avg_acc ~ n_ref, type="p", main="Accuracy by Reference panel size",
         ylab=expression("Accuracy R"^2), xlab="number of reference haplotypes",
         xlim=c(0,max(n_ref)), ylim=c(0,1), pch=20)
    dev.off()

Figure 4: Accuracy of Imputation by reference panel size (x-axis: number of reference haplotypes, y-axis: Accuracy R^2)

4.5 Supplementary Figures 1 & 2

The supplementary Figures 1 and 2 provided by Badke et al. [1] show the weighted mean average accuracy as a function of MAF and of chromosomal location for a variety of reference panel sizes. They illustrate how increasing the number of reference haplotypes affects overall SNP accuracy, and especially the imputation accuracy of SNP with previously below-average accuracy. Example code to obtain a figure of that type is given below, with the graphical output shown in Figure 5.

    pdf("accuracy_by_refsize_weighted.pdf")
    # specifying colors for all sizes of the reference panel
    cols <- c("black", "red", "blue", "orange", "magenta", "darkgreen")
    # open an empty plot of the correct dimensions
    plot(0, type="n", main="Accuracy of Imputation by increasing reference panel size",
         ylab=expression("Accuracy R"^2), xlab="scaled chromosomal location",
         ylim=c(0,1), xlim=c(0,1))

    # adding one smoothed curve per reference panel size (columns 2 to 7 of ref_size)
    for (i in 1:length(n_ref)) {
      # estimating the weighted mean average using a loess smoother
      pred <- loess(ref_size[,i+1] ~ ref_size[,8])
      # adding the points
      points(pred$x, pred$fitted, col=cols[i], pch=18, cex=0.25)
    }
    # including a legend to the plot
    legend(x="bottomright", pch=18, bty="n",
           legend=paste(n_ref, " reference haplotypes", sep=""), col=cols)
    dev.off()

Figure 5: Accuracy of Imputation by increasing reference panel size (x-axis: scaled chromosomal location, y-axis: Accuracy R^2, one curve per number of reference haplotypes)

References

[1] Yvonne M Badke, Ronald O Bates, Catherine W Ernst, Clint Schwab, Justin Fix, and Juan P Steibel. TagSNP selection and imputation accuracy using a reduced size haplotype panel in swine. Submitted.

[2] Brian L Browning. Documentation of BEAGLE 3.3.1.

[3] Brian L Browning and Sharon R Browning. A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals. Am J Hum Genet, 84(2), January.

[4] B. J. Hayes, P. J. Bowman, H. D. Daetwyler, J. W. Kijas, and J. H. J. van der Werf. Accuracy of genotype imputation in sheep breeds. Anim Genet, 43(1):72-80, February.

[5] John M. Hickey, Jose Crossa, Raman Babu, and Gustavo de los Campos. Factors Affecting the Accuracy of Genotype Imputation in Populations from Several Maize Breeding Programs. Crop Science, 52(2):654.

[6] Z S Qin, S Gopalakrishnan, and G R Abecasis. An efficient comprehensive search algorithm for tagSNP selection using linkage disequilibrium criteria. Bioinformatics, 22(2), January.

[7] The R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria.

[8] Valentin Wimmer, Theresa Albrecht, Hans-Jürgen Auinger, and Chris-Carolin Schön. synbreed: A framework for the analysis of genomic prediction data using R. Bioinformatics, 28(15):2086-7.
