Schematic diagram of the construction, imputation accuracy evaluation, implementation platform and applications of the pig haplotype reference panel (PHARP). A: Data resources and processing steps used to construct the PHARP. B: Imputation accuracy estimation of PHARP on multiple test datasets. C: Imputation platform development. D: Applications of PHARP in GWASs, GS and other potential studies such as eQTL mapping and TWASs.
Imputation accuracy under different scenarios. A: Mimicking three popular commercial pig chips (50K, 60K, and 80K) in three datasets by masking all variants except those on the chips (only autosomes were used); the held-out genotypes were treated as ‘real’ when calculating the CR and r² values. B: Boxplot of the imputation accuracy estimated by mimicking target panels with different SNP densities on chromosome 1 using test datasets 1, 2 and 3. C: Boxplot of the imputation accuracy estimated by mimicking 50K chip genotypes from dataset 1 using reference panels of different sizes, constructed by randomly extracting samples from 1006 individuals (repeated 5 times). D: Mimicking the 50K chip genotypes from datasets 1 and 2 using reference panels constructed by extracting samples according to pig breed (LW, Large White, n = 114; DU, Duroc, n = 85). E: Imputation accuracy in different MAF bins ((0, 0.02], (0.02, 0.05], (0.05, 0.1], (0.1, 0.2], (0.2, 0.3], (0.4, 0.5]) estimated by mimicking the 50K chip genotypes using dataset 1. F: Imputation accuracy estimated from dataset 4 using our reference panel and that from Animal-ImputeDB. Dataset 1: Large White pig breed, LW, n = 81; dataset 2: Duroc pig breed, DU, n = 299; dataset 3: Jiaxinghei pig breed, JXH, n = 54; dataset 4: Duroc pig breed, n = 20, genotyped by both a 50K chip and ELC.
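As a rough illustration of the masking evaluation described in panel A, the two accuracy measures can be sketched as follows. This is a minimal sketch, not the authors' pipeline: it assumes that CR denotes the concordance rate (the fraction of masked genotypes imputed exactly) and that r² is the squared Pearson correlation between true and imputed genotypes coded as 0/1/2 allele counts. The function names and the toy genotype vectors are hypothetical.

```python
from statistics import mean


def concordance_rate(true_gt, imputed_gt):
    """Fraction of held-out genotypes (coded 0/1/2) reproduced exactly.

    Assumption: this is what 'CR' refers to in the caption.
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation between true and imputed genotypes.

    Returns NaN when either vector has no variance (e.g. a monomorphic site).
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_t, mean_i = mean(true_gt), mean(imputed_gt)
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_i)


# Hypothetical toy data: one masked variant across eight animals.
true_gt = [0, 1, 2, 1, 0, 2, 1, 0]      # held-out 'real' genotypes
imputed_gt = [0, 1, 2, 2, 0, 2, 1, 0]   # genotypes returned by imputation

print(concordance_rate(true_gt, imputed_gt))  # 0.875 (7 of 8 genotypes match)
print(squared_correlation(true_gt, imputed_gt))
```

In practice these values would be computed per variant or per sample and then summarized, for example within MAF bins as in panel E; the aggregation level used by the authors is not stated in the caption.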
Association signals for growth phenotypes before and after imputation. Association test statistics on the −log₁₀(P-value) scale (y-axis) are plotted against SNP position (x-axis) for backfat thickness at 180 days of age (A), from Zhang et al., and at 100 kg (B), from Fu et al. To simplify the plot, only variants with a P-value less than 1.08×10⁻⁴ are shown, and they are colored according to the annotated genes. Genes labeled in black were reported in the original paper, and genes labeled in blue are novel genes detected after imputation. Examples of potential causal variants (marked by blue asterisks) in the SNRPC (C), GRM4 (D) and PACSIN1 (E) genes. Each dot represents a variant; its color indicates its LD (r²) with the chip SNP (marked by a blue diamond) or with the variant with the lowest P-value (marked by a black circle). The two horizontal lines mark the P-value thresholds of 2.05×10⁻⁶ and 1.08×10⁻⁴ (A), and of 6.46×10⁻⁷ and 1.86×10⁻⁵ (B).