
2024-02-02 by @chenglonglcl1-9PcS_ & @to-chinese. Model: gpt-3.5-turbo-16k
chenglonglcl1-9PcS_

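Translate anything I enter, in any language, into Chinese; if I enter Chinese, polish it for me. Do not answer any of my questions or requests; your only job is to translate and polish into Chinese.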

chenglonglcl1-9PcS_

5 Materials and Methods
Candidate SNP sites were drawn from three sources. The roadmap for developing and verifying the 10K cGPS liquid chip for the Hainan black goat is shown in Fig. 7. The chip was established as described below.
5.1 Animals and DNA Samples
A total of 104 DNA samples were used to verify the 10K liquid chip (Additional file 14: Table S14). The tissue or DNA samples came from 22 Yunshang black goats, 17 Guizhou black goats, 9 small-tailed Han sheep (DNA samples preserved by Chen Si, Hainan University), and 56 Hainan black goats from different regions of Hainan (including 16 resequenced DNA samples). Apart from the DNA samples already preserved in our laboratory, DNA was extracted from freshly collected peripheral venous blood or ear tissue using a genomic DNA extraction kit (Tiangen Biochemical Technology Co., Beijing, China) and stored at −20°C. DNA quality was assessed with a micro-spectrophotometer (IMPLEN GMBH Co., Germany) and by 1% (w/v) agarose gel electrophoresis. The quality standards were as follows: total DNA (RNA-free) ≥1.0 μg (Qubit quantification), concentration ≥20 ng/μL, volume >50 μL, 1.8 ≤ OD260/OD280 ≤ 2.0, and OD260/OD230 ≥1.8. On the gel, the main band of each sample was clear, with no or only slight degradation.
Blood and ear tissue samples were collected under veterinary supervision in accordance with the guidelines for experimental animals issued by the Ministry of Science and Technology (Beijing, China). The study was also approved by the Animal Ethics Committee of the Institute of Animal Science. Neither anesthesia nor euthanasia was used, and no clinical disease caused by sampling was observed in the animals.
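The quality standards listed above can be expressed as a simple pass/fail check. The following is only a minimal sketch: the `DnaSample` class, its field names, and `passes_qc` are hypothetical names introduced for illustration, and the gel-based judgment of band integrity is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    # Hypothetical record of the measurements named in the text.
    total_ug: float        # total DNA in micrograms (Qubit)
    conc_ng_per_ul: float  # concentration in ng/uL
    volume_ul: float       # volume in uL
    od260_280: float       # OD260/OD280 ratio
    od260_230: float       # OD260/OD230 ratio


def passes_qc(s: DnaSample) -> bool:
    """Return True if the sample meets every numeric threshold in the text."""
    return (
        s.total_ug >= 1.0
        and s.conc_ng_per_ul >= 20.0
        and s.volume_ul > 50.0          # strictly greater than 50 uL
        and 1.8 <= s.od260_280 <= 2.0
        and s.od260_230 >= 1.8
    )
```

For instance, a sample with exactly 50 μL fails the check because the volume criterion is a strict inequality.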
5.2 Whole-genome resequencing data in goats and SNP calling
Whole-genome resequencing (WGS) detects large numbers of SNPs through sequence alignment, and the detected variants can be used to develop liquid chip (cGPS) sites. To obtain whole-genome resequencing data, seven representative goat breeds were selected: six Chinese local breeds (15 Longlin goats, 5 Leizhou goats, 16 Hainan black goats, 16 Dazu black goats, 15 Alxa cashmere goats, and 10 Jining grey goats) and one foreign breed (10 Boer goats). The data for the 16 Hainan black goats came from our previous sequencing and had been uploaded to GenBank (accession number PRJNA754269) [21]. The whole-genome data for the remaining 71 goats were publicly available and were downloaded from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/).
To obtain high-quality SNPs for chip design, SNP calling proceeded as follows. Sequencing data were filtered with fastp (v0.20.0) [22] and aligned to the goat reference genome (ARS1) with the Burrows-Wheeler Aligner (v0.7.12-r1039) [23]. Picard (v1.107) was used to sort the SAM files, convert them to BAM files, and remove PCR duplicates [24]. GATK [25] was used to call and filter SNPs. The filtering criteria were as follows. (1) Fisher strand bias (FS) ≤60. (2) HaplotypeScore ≤13.0. (3) Mapping quality (MQ) ≥40. (4) Quality by depth (QD) ≥2. (5) ReadPosRankSum ≥−8.0. (6) MQRankSum >−12.5.
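The six filtering criteria can be illustrated as a predicate over a variant's INFO annotations. This is a sketch rather than the authors' GATK command: `CRITERIA` and `passes_hard_filters` are hypothetical names, the annotations are assumed to be already parsed into a dictionary of floats, and treating a missing annotation as passing is an assumption made here, not something the text specifies.

```python
import operator

# Annotation name -> (comparison that must hold, threshold), as listed in the text.
CRITERIA = {
    "FS": (operator.le, 60.0),
    "HaplotypeScore": (operator.le, 13.0),
    "MQ": (operator.ge, 40.0),
    "QD": (operator.ge, 2.0),
    "ReadPosRankSum": (operator.ge, -8.0),
    "MQRankSum": (operator.gt, -12.5),  # strict inequality in the text
}


def passes_hard_filters(info: dict) -> bool:
    """Return True if every annotation present in `info` meets its threshold."""
    for key, (compare, threshold) in CRITERIA.items():
        value = info.get(key)
        if value is None:
            # Assumption: an absent annotation does not fail the site.
            continue
        if not compare(value, threshold):
            return False
    return True
```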
5.3 Analysis of population genetic variations
To construct the Hainan black goat and non-Hainan black goat reference populations, population genetic and phylogenetic analyses were performed on the 87 goat population samples and on all samples, respectively. First, PLINK [26] was used to filter SNPs by the following criteria. (1) SNPs with more than 10% missing data were removed. (2) SNPs with a minor allele frequency (MAF) <0.05 were removed. After the filtered SNP sites were converted into linear sequences, a Neighbor-Joining (NJ) tree was constructed in MEGA-X [27] (Kimura 2-parameter model, 1000 bootstrap replicates). Finally, RStudio (v4.0.5) was used to refine the appearance of the phylogenetic tree.
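To make the two PLINK criteria concrete, the sketch below computes the missing rate and MAF of one site from a list of genotypes. It assumes genotypes are coded as the count of alternate alleles (0, 1, or 2) with `None` for a missing call; `site_stats` and `keep_site` are hypothetical helper names and do not correspond to PLINK functions.

```python
def site_stats(genotypes):
    """Return (missing_rate, maf) for one biallelic site.

    `genotypes` holds alternate-allele counts (0, 1, 2) or None for missing.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate, maf


def keep_site(genotypes, max_missing=0.10, min_maf=0.05):
    """Apply the two criteria: missing data <= 10% and MAF >= 0.05."""
    missing_rate, maf = site_stats(genotypes)
    return missing_rate <= max_missing and maf >= min_maf
```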
5.4 Selection of candidate SNP sites from whole genome resequencing
The genetic differentiation index (Fst) measures population differentiation and genetic distance and is suited to comparing diversity among subpopulations. The larger the index, the greater the differentiation [28]. VCFtools (v0.1.13) [29] was used to calculate Fst at each SNP site between the Hainan black goat and non-Hainan black goat reference populations. SNP sites with Fst >0.5 were retained, because sites with high Fst can distinguish the genotypes of the two populations.
Highly polymorphic sites were screened as follows. VCFtools (v0.1.13) was used to convert the original VCF file from resequencing SNP calling into PLINK format. PLINK was then used to select sites that met the following conditions. (1) MAF ≥0.05. (2) Missing rate ≤0.1. (3) Average depth (AV_Deep) ≥2. The --hardy option in PLINK was then used to calculate the heterozygosity of each SNP site, and highly polymorphic candidate SNP sites with observed heterozygosity (O(HET)) ≤0.3 were retained. Highly polymorphic sites can be used for genotype analysis across goat populations.
5.5 Selection of candidate SNP sites from GGVD
The Goat Genome Variation Database (GGVD, http://animal.nwsuaf.edu.cn/GoatVar) is dedicated to variation, selection signatures, and introgression regions [30] in modern and ancient goat genomes and contains abundant goat genome variation data. The populations covered include Bezoars, African goats, African dairy goats, European goats, Australian goats, Southwest Asian goats, South Asian goats, East Asian goats, Tibetan goats, Toggenburg goats, Saanen goats, Longlin goats, Leizhou goats, Cashmere goats, Beetal goats, and Bezoars vs. domestic goats. The goat variation data file, ARS1_SNPs.anno.tab.tar.gz (1.6G), was downloaded from GGVD. SNP sites with MAF higher than 0.05 were selected, giving 2,514 candidate SNPs. In addition, because Hainan black goats have good disease resistance, SNP sites in immune genes were specifically searched in GGVD, covering 34 genes of interest such as IL6, TNF, IL1B, IL10, and IFNG. All of these immune genes carry many SNP sites, but GGVD has no variation data for the Hainan black goat. Given the close genetic relationship between the Leizhou goat and the Hainan black goat [22], we focused on SNP sites found in the Leizhou goat when selecting candidate SNP sites in immune genes.
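The idea behind per-site Fst can be shown with a simplified two-population estimate, Fst = (HT − HS) / HT, computed from allele frequencies. Note the swap: VCFtools reports the Weir and Cockerham estimator, which also corrects for sample size, so the values from this sketch will not match VCFtools output exactly. The function names and the equal weighting of the two populations are assumptions made for illustration.

```python
def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for allele frequency p."""
    return 2 * p * (1 - p)


def simple_fst(p1: float, p2: float) -> float:
    """Simplified Fst between two populations from allele frequencies.

    HS is the mean within-population heterozygosity and HT the
    heterozygosity of the pooled (equally weighted) population.
    """
    hs = (expected_het(p1) + expected_het(p2)) / 2
    ht = expected_het((p1 + p2) / 2)
    if ht == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (ht - hs) / ht


def is_high_fst(p1: float, p2: float, threshold: float = 0.5) -> bool:
    """Screen used in the text: keep sites with Fst > 0.5."""
    return simple_fst(p1, p2) > threshold
```

Under this simplified estimate, allele frequencies of 0.9 and 0.1 in the two populations give an Fst of about 0.64 and pass the screen, while equal frequencies give 0.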
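The heterozygosity screen can be sketched in the same spirit. Observed heterozygosity is taken here as the fraction of called genotypes that are heterozygous; genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing, and `observed_het` and `passes_het_screen` are hypothetical names, not PLINK functions.

```python
def observed_het(genotypes) -> float:
    """Fraction of called genotypes that are heterozygous (coded as 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    return sum(1 for g in called if g == 1) / len(called)


def passes_het_screen(genotypes, max_het: float = 0.3) -> bool:
    """Screen from the text: observed heterozygosity <= 0.3."""
    return observed_het(genotypes) <= max_het
```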
5.6 Selection of candidate SNP sites related to important traits from literature sources
The literature was searched in PubMed (https://pubmed.ncbi.nlm.nih.gov/, accessed before 21 July 2021) and CNKI (https://www.cnki.net/, accessed before 21 July 2021). The search terms on both websites were "goat SNP" and "sheep SNP". SNPs associated with goat or sheep traits were identified by browsing the titles and abstracts of the retrieved articles. After the articles were read carefully, the position of each SNP in the reference genome was recorded. The 101 bp sequence comprising the SNP site plus 50 bp upstream and 50 bp downstream was then retrieved from NCBI (https://www.ncbi.nlm.nih.gov/, accessed before 21 July 2021) or Ensembl (http://www.ensembl.org/, accessed before 21 July 2021). This sequence was aligned to the latest ARS1 assembly, and the SNP site was relocated to its ARS1 position. In this way we obtained information on a large number of trait-associated SNP sites, including the flanking sequence, the genomic position, the reference and alternate alleles, and other details. Huazhi Biotechnology Co., Ltd verified whether the SNP sites and their flanking sequences mapped accurately to ARS1 and finalized the candidate trait-associated SNP sites for chip development.
5.7 SNP site screening principle and cGPS liquid chip design method
To meet the requirements of chip development, the collected SNP sites were screened. First, duplicate SNP sites were removed, and candidate SNP sites derived from the resequencing data were selected preferentially according to the following criteria. (1) MAF >0.1. (2) Missing rate <0.1. (3) AV_Deep ≥2. (4) Heterozygosity rate ≤0.3. (5) The SNP was the only variant type at the site. Second, highly polymorphic candidate SNP sites from GGVD were used to fill gaps so that they were evenly distributed among the SNP sites selected from the resequencing data. Finally, the candidate SNP sites from the literature and the SNP sites in the immune genes of interest were added.
Probes were designed within the flanking sequence of each SNP site, which comprised the site plus 100 bp upstream and 100 bp downstream (201 bp in total). The probes also had to meet the following criteria. (1) Probe length was generally 100 bp (Fig. 8A). (2) GC content of the probe was 20%-80%. (3) The probe was single copy. (4) The number of SNPs in the probe coverage area was small. (5) Dimer and hairpin structures formed by the probe were within a reasonable range. (6) SNP sites were evenly distributed across chromosomes. Because some of the 10,677 SNP sites lie close together, sites no more than 100 bp apart can share a probe. For example, the two SNP sites chr1:34235967 and chr1:34236021 were 54 bp apart and shared a probe (Fig. 8B).
The captured interval sequences are analyzed mainly by second-generation sequencing. To complete SNP calling and genotype analysis, the reads must be aligned to a defined region. The region is defined as follows: take each SNP locus as the center and extend 100-200 bp upstream and downstream to form the capture interval. For example, the capture interval for chr1:316747 was chr1:316597-316897. For two adjacent SNP sites whose capture intervals overlap, the intervals can be combined into a new interval. For example, chr19:19216460 and chr19:19216471 shared one capture interval, chr19:19216320-19216631.
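Two of the probe criteria lend themselves to a short illustration: the GC-content window and the sharing of one probe by nearby sites. The sketch below is not the vendor's design algorithm. `gc_content` and `group_sites_for_probes` are hypothetical names, and the greedy rule (a site joins the current group if it lies within 100 bp of the group's first site) is an assumption about how "no more than 100 bp apart" might be applied.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a probe sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_in_range(seq: str, low: float = 0.20, high: float = 0.80) -> bool:
    """Criterion (2): GC content between 20% and 80%."""
    return low <= gc_content(seq) <= high


def group_sites_for_probes(positions, max_span: int = 100):
    """Greedily group positions on one chromosome that could share a probe.

    A position joins the current group when it is at most `max_span` bp
    from the first position of that group; otherwise it starts a new group.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][0] <= max_span:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups
```

With the example from the text, positions 34235967 and 34236021 on chr1 are 54 bp apart and fall into the same group.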
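The interval construction can be sketched as follows. A fixed flank of 150 bp is assumed here because it reproduces the chr1:316747 example (316597-316897); the text allows 100-200 bp, and the chr19 interval it reports (19216320-19216631) is consistent with flanks of 140 bp and 160 bp, so a fixed flank will not reproduce that example exactly. Function names are hypothetical, and overlapping intervals are combined by taking their union.

```python
def capture_interval(pos: int, flank: int = 150):
    """Return (start, end) extending `flank` bp on each side of a SNP."""
    return (max(1, pos - flank), pos + flank)


def merge_overlapping(intervals):
    """Combine overlapping (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def capture_intervals_for_sites(positions, flank: int = 150):
    """Build one capture interval per SNP, then merge the overlapping ones."""
    return merge_overlapping(capture_interval(p, flank) for p in positions)
```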
5.8 Principle and application process of cGPS liquid chip
Genotyping by Pinpoint Sequencing of liquid captured targets (cGPS) is a medium- to high-density (5K-100K target intervals) targeted-sequencing genotyping technology developed independently by Huazhi Biotechnology Co., Ltd. Specific probes are designed for different target regions of the genome based on an optimized thermodynamic stability model. The synthesized probes capture and enrich multiple target sequences at different genomic locations by liquid-phase hybridization. After library construction and high-throughput sequencing, the genotypes of all SNP/InDel sites in the target regions are obtained.
Once a cGPS liquid chip has been developed, large numbers of samples can be tested. The main steps are DNA extraction, library construction and quality control, liquid-phase hybridization and enrichment of the target intervals, second-generation sequencing, and bioinformatics analysis, which together yield the variant analysis of the target intervals in each tested sample [18, 31].
5.9 Verification of liquid chip
To test the usability of the 10K cGPS liquid chip for the Hainan black goat, we collected DNA samples from several southern Chinese goat breeds and from one sheep breed phenotypically similar to the Hainan black goat. After quality control, 104 DNA samples were used for chip verification. To obtain genotype data for the target genomic regions, Huazhi Biotechnology Co., Ltd carried out standardized operating procedures, including library construction, target region capture, Illumina sequencing, and bioinformatics analysis.
First, the SNP detection rate of the chip was verified to test whether each specific probe could locate its target interval and accurately detect SNP sites. Bioinformatics analysis of the chip sequencing data yields, for all samples on the cGPS liquid chip, the total number of SNP sites, the number of polymorphic sites, the detection rate, the missing rate, and the heterozygosity rate.
Second, the genotyping accuracy of the chip was evaluated through consistency and repeatability verification. Consistency requires comparing the SNP genotypes from the cGPS liquid chip with those from resequencing [32]. We therefore selected 15 Hainan black goat samples for cGPS liquid chip detection and obtained their SNP genotypes from the chip, and used the previously obtained resequencing data to derive the resequencing genotypes. The consistency between the two sets of genotypes (chip and resequencing) was then evaluated for each individual. In parallel, four DNA samples were randomly selected and each was run in duplicate; the results of the two independent runs of each sample were compared to verify the repeatability of the chip. A locus that was missing (NA) in either of the two results was excluded from consistency or repeatability verification. A total of 108 detection results were obtained in this step, comprising 104 samples for consistency verification and 4 samples for repeatability verification. From the detection data, MAF, Fst, and other per-locus indicators were the main ones selected for the above analysis.
Subsequently, the clustering ability of the chip was verified, focusing on whether the chip could distinguish the Hainan black goat from other breeds. DNA samples from 104 animals of known breed were genotyped with the cGPS liquid chip. From the genotyping data we removed SNPs with a call rate <90% or MAF <0.05 and applied a Hardy-Weinberg equilibrium (HWE) threshold of p < 10⁻⁶ [33]. Using the filtered genotypes, we performed cluster analysis in MEGA-X and drew the phylogenetic tree with iTOL (v4) [34]. In addition, Python (v2) and smartpca were used to obtain the eigenvectors and eigenvalues, and RStudio (v4.0.5) was used to plot the principal components.
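Both the consistency check (chip vs. resequencing) and the repeatability check (two runs of the same sample) reduce to comparing two genotype lists locus by locus while skipping loci that are missing in either list. The sketch below assumes genotypes are two-letter strings such as "AG" with `None` for NA, and that allele order within a genotype is irrelevant; the function name `concordance` is hypothetical.

```python
def concordance(calls_a, calls_b) -> float:
    """Fraction of loci with identical genotypes in two call sets.

    Loci missing (None) in either set are excluded, as described in the
    text. Returns NaN when no locus can be compared.
    """
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        # Sort the alleles so that "AG" and "GA" count as the same genotype.
        if sorted(a) == sorted(b):
            matches += 1
    return matches / compared if compared else float("nan")
```

The same function serves both checks: pass chip and resequencing calls for consistency, or the two duplicate runs for repeatability.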

to-chinese

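5 Materials and Methods
Candidate SNP sites come from three sources. The development and verification roadmap of the 10K cGPS liquid chip for the Hainan black goat is shown in Figure 7. The method for establishing the 10K cGPS liquid chip is as follows.
5.1 Animals and DNA samples
A total of 104 goat DNA samples were used to verify the 10K liquid chip (Additional file 14: Table S14). The tissue or DNA samples of the 104 goats came from 22 Yunshang black goats, 17 Guizhou black goats, 9 small-tailed Han sheep (DNA samples preserved by Chen Si of Hainan University), and 56 black goats from different regions of Hainan (including 16 resequencing DNA samples). Apart from the DNA samples previously preserved in our laboratory, the remaining DNA samples were extracted from freshly collected peripheral venous blood or ear tissue using a genomic DNA extraction kit from Tiangen Biochemical Technology Co. (Beijing, China) and stored at -20°C. DNA quality was checked with a micro-spectrophotometer (IMPLEN GMBH Co., Germany) and 1% (w/v) agarose gel electrophoresis. The DNA quality standards were as follows: total DNA (RNA-free) ≥1.0 μg (Qubit quantification), concentration ≥20 ng/μL, volume >50 μL, 1.8 ≤ OD 260/OD 280 ≤ 2.0, and OD 260/OD 230 ≥1.8. In electrophoresis, the main band of the sample was clear, with no or slight degradation.
The goat blood and ear tissue samples used in this study were collected under the supervision of veterinarians and complied with the guidelines for experimental animals issued by the Ministry of Science and Technology of China (Beijing, China). This work was also approved by the Animal Ethics Committee of the Institute of Animal Science. No anesthesia or euthanasia was used. No clinical disease was observed in the goats during sampling.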
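5.2 Whole-genome resequencing data of goats and SNP calling
Whole-genome resequencing (WGS) can detect a large amount of SNP information through sequence alignment. Liquid chip (cGPS) sites can be developed from the detected variation. To obtain whole-genome resequencing data for goats, seven representative goat breeds were selected, including 6 Chinese local breeds (15 Longlin goats, 5 Leizhou goats, 16 Hainan black goats, 16 Dazu black goats, 15 Alxa goats, 10 Jining grey goats) and 1 foreign local breed (10 Boer goats). The data for the 16 Hainan black goats came from our previous sequencing results and were uploaded to GenBank (accession number PRJNA754269) [21]. The whole-genome data for the remaining 71 goat samples came from publicly available data downloaded from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/).
To obtain high-quality SNPs for chip design, SNP calling was carried out as follows. Sequencing data were filtered with fastp (v0.20.0) [22] and then aligned to the goat reference genome (ARS1) with the Burrows-Wheeler Aligner (v0.7.12-r1039) [23]. Picard (v1.107) was used to sort the sam files, convert them to bam files, and remove PCR duplicates [24]. GATK [25] was used to detect and filter SNPs. Based on the genomic data, the filtering criteria were as follows: (1) Fisher test of strand bias (FS) ≤60. (2) Haplotype score ≤13.0. (3) Mapping quality (MQ) ≥40. (4) Quality depth (QD) ≥2. (5) Read position rank sum (ReadPosRankSum) ≥-8.0. (6) MQRankSum >-12.5.
5.3 Analysis of population genetic variation
To construct the Hainan black goat and non-Hainan black goat reference populations, population genetic and phylogenetic analyses were performed on the 87 goat population samples and on all samples, respectively. First, PLINK [26] was used to filter SNPs by the following criteria: (1) Remove SNPs with >10% missing data points. (2) Remove SNPs with a minor allele frequency (MAF) <0.05. After the filtered SNP sites were converted into linear sequence information, a neighbor-joining tree was built with MEGA-X [27] (Kimura 2-parameter model, 1000 bootstrap replicates). Finally, Rstudio (v4.0.5) was used to beautify the phylogenetic tree.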
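5.4 Selection of candidate SNP sites from whole-genome resequencing
The genetic differentiation index (Fst) is a measure of population differentiation and genetic distance and is suitable for comparing diversity among subpopulations. The larger the differentiation index, the greater the difference [28]. VCFtools (v0.1.13) [29] was used to calculate the Fst value of each SNP site between the Hainan black goat and non-Hainan black goat reference populations. SNP sites with Fst >0.5 were selected. Sites with high Fst values can be used to distinguish the genotypes of the Hainan black goat and non-Hainan black goat populations.
The screening of highly polymorphic sites is described below. VCFtools-0.1.13 was used to convert the original VCF file from resequencing SNP calling into a plink-format file. PLINK was then used to select sites meeting the following conditions: (1) MAF ≥0.05. (2) Missing rate ≤0.1. (3) Average depth (AV_ Deep) ≥2. The hardy parameter in PLINK was then used to calculate the heterozygosity of the SNP sites. Highly polymorphic candidate SNP sites with observed heterozygosity (O HET) ≤0.3 were selected. Highly polymorphic sites can be used for genotype analysis of different goat populations.
5.5 Selection of candidate SNP sites from GGVD
The Goat Genome Variation Database (GGVD, http://animal.nwsuaf.edu.cn/GoatVar) is devoted to variation, selection signatures, and introgression regions [30] in modern and ancient goat genomes. It contains a wealth of goat genome variation information. The goats include Bezoars, African goats, African dairy goats, European goats, Australian goats, Southwest Asian goats, South Asian goats, East Asian goats, Tibetan goats, Toggenburg goats, Saanen goats, Longlin goats, Leizhou goats, Cashmere goats, Beetal goats, and Bezoars vs. domestic goats. The goat variation data ARS1_SNPs.anno.tab.tar.gz (1.6G) were downloaded from GGVD, and SNP sites with MAF higher than 0.05 were selected, giving 2,514 candidate SNPs in total. In addition, Hainan black goats have good disease resistance. SNP sites in immune genes were therefore specifically searched in GGVD, including 34 genes of interest such as IL6, TNF, IL1B, IL10, and IFNG. All of the immune genes carry many SNP sites. However, GGVD has no variation data for the Hainan black goat. Because of the close genetic relationship between the Leizhou goat and the Hainan black goat [22], we focused on SNP sites in the Leizhou goat when selecting candidate SNP sites in immune genes.
5.6 Selection of candidate SNP sites related to important traits from literature sources
The literature was searched in PubMed (https://pubmed.ncbi.nlm.nih.gov/, up to 21 July 2021) and CNKI (https://www.cnki.net/, up to 21 July 2021). Goat SNP and sheep SNP were searched on both websites. SNPs related to goat or sheep traits were identified by browsing the titles and abstracts of the articles. After the literature was read carefully, the position of each SNP site in the reference genome was recorded. The 101 bp sequence containing 50 bp upstream and 50 bp downstream of the SNP site was then searched on NCBI (https://www.ncbi.nlm.nih.gov/, up to 21 July 2021) or Ensembl (http://www.ensembl.org/, up to 21 July 2021). The sequence was then aligned to the latest ARS1. In the end, a large amount of trait-related SNP site information was obtained, including the flanking sequence of the SNP site, the genomic position of the SNP site, and the reference and mutant alleles of the SNP. Huazhi Biotechnology Co., Ltd verified whether the SNP sites and their flanking sequences corresponded accurately to ARS1 and finalized the candidate SNP sites for important traits relevant to chip development.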
5.7 SNP位点筛选原则和cGPS液体芯片设计方法
为满足芯片开发的要求,对收集的SNP位点进行了筛选。首先,去除了重复的SNP位点。优先选择来自重测序数据的候选SNP位点,具体如下所述。 (1)MAF>0.1。 (2)缺失率<0.1。 (3)AV_Deep≥2。 (4)杂合率≤0.3。 (5)SNP是唯一的变异类型。其次,填充了来自GGVD的高多态候选SNP位点,并均匀分布在来自重测序数据的筛选SNP位点中。最后,添加了来自文献的候选SNP位点和感兴趣的免疫基因上的SNP位点。
探针设计在SNP位点的周围序列内进行,该序列包含SNP位点的上游100 bp和下游100 bp(总共201 bp)。此外,设计的探针必须符合以下标准。 (1)探针长度通常为100 bp(图8A)。 (2)探针的GC含量为20%-80%。 (3)单拷贝。 (4)探针覆盖区域中的SNP数量较少。 (5)探针形成的二聚体和发卡结构处于合理范围内。 (6)SNP位点在染色体上均匀分布。由于某些10677个SNP位点之间的距离较短,距离不超过100 bp的位点可以共享一个探针。例如,两个SNP位点chr1:34235967和chr1:34236021之间的距离为54 bp,共享一个探针,如下所示(图8B)。
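The captured interval sequences were analyzed mainly by second-generation sequencing. To complete SNP calling and genotyping, the reads obtained had to be aligned to defined regions. The algorithm was as follows: with each SNP site as the center, the capture interval was extended 100-200 bp upstream and downstream. For example, the capture interval for chr1:316747 is chr1:316597-316897. For two adjacent SNP sites, the overlapping capture intervals can be treated as one new interval. For example, chr19:19216460 and chr19:19216471 share one capture interval, chr19:19216320-19216631.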
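The interval construction can be sketched as follows, assuming a fixed flank of 150 bp on each side. That value lies within the stated 100-200 bp range and reproduces the chr1:316747 example. The fixed flank and the function name are assumptions: the chr19 example in the text has asymmetric boundaries, so the real procedure evidently adjusts the flanks in a way this sketch does not model.

```python
from typing import List, Tuple


def capture_intervals(positions: List[int], flank: int = 150) -> List[Tuple[int, int]]:
    """Build capture intervals for SNP positions on one chromosome.

    Each SNP is extended by `flank` bp on both sides (assumed fixed value);
    overlapping intervals of neighbouring SNPs are merged into one.
    """
    intervals: List[Tuple[int, int]] = []
    for pos in sorted(positions):
        start, end = max(1, pos - flank), pos + flank
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            prev_start, prev_end = intervals[-1]
            intervals[-1] = (prev_start, max(prev_end, end))
        else:
            intervals.append((start, end))
    return intervals


single = capture_intervals([316747])               # chr1 example from the text
merged = capture_intervals([19216460, 19216471])   # two adjacent chr19 sites
```

For the single chr1 site this gives the interval 316597-316897 stated in the text, and the two adjacent chr19 sites collapse into a single interval.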
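5.8 Principle and application workflow of the cGPS liquid chip
Genotyping by pinpoint sequencing of liquid captured targets (cGPS) is a high-density (5K-100K target intervals) targeted sequencing genotyping technology developed independently by Huazhi Biotechnology Co., Ltd. Based on an optimized thermodynamic stability algorithm model, specific probes are designed for different target regions of the genome. The synthesized probes capture and enrich multiple target sequences at different genomic positions through liquid-phase hybridization. After library construction and high-throughput sequencing, the genotypes of all SNP/InDel sites in the target regions are obtained.
Once a cGPS liquid chip has been developed, large numbers of samples can be tested. The main steps are DNA extraction, library construction and quality control, liquid-phase hybridization and target interval enrichment, second-generation sequencing, and bioinformatics analysis. The result is a variation analysis of the target intervals in the tested samples [18, 31].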
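5.9 Verification of the liquid chip
To test the usability of the 10K cGPS liquid chip for Hainan black goat, we collected DNA samples from several southern Chinese goat breeds and from one sheep breed similar to the Hainan black goat. After quality control, DNA samples from 104 goats were used for chip verification. To obtain genotype data for the target genomic regions, Huazhi Biotechnology Co., Ltd. carried out a standardized workflow comprising library construction, target region capture, Illumina sequencing, and bioinformatics analysis.
First, to test whether each specific probe on the chip could locate its target interval and accurately detect SNP sites, the SNP site detection rate of the chip was verified. Bioinformatics analysis of the chip sequencing data provided the total number of SNP sites, the number of polymorphic sites, the detection rate, the missing rate, and the heterozygosity rate for all samples on the cGPS liquid chip.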

In the second step, the genotyping accuracy of the chip was evaluated through consistency and repeatability verification. Consistency verification requires comparing the genotyping results of SNP sites from the cGPS liquid chip with those from resequencing [32]. We therefore selected 15 Hainan black goat samples for cGPS liquid chip detection and obtained chip-based genotypes for the SNP sites. The previously obtained resequencing data were used to obtain resequencing-based genotypes for the same sites. The consistency of the two results (from the cGPS liquid chip and from resequencing) was then evaluated for each individual. In addition, four DNA samples were randomly selected and each was run in duplicate. The results of the two independent detections of each sample were compared to verify the repeatability of the chip. If a locus was missing (NA) in either of the two results, it was excluded from consistency or repeatability verification. A total of 108 detection results were obtained in this step, including 104 samples for consistency verification and 4 samples for repeatability verification. From the detection data, the MAF, Fst and other indicators of each locus were selected for the above analysis.
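The consistency and repeatability comparisons both reduce to the same calculation: the share of loci with identical genotypes in two result sets, skipping any locus that is missing in either. A minimal sketch follows. The genotype string format (`"A/G"`, with `None` for NA), the locus IDs, and the function names are assumptions for illustration.

```python
from typing import Dict, Optional


def normalize(genotype: str) -> str:
    """Make allele order irrelevant, so 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def concordance_rate(calls_a: Dict[str, Optional[str]],
                     calls_b: Dict[str, Optional[str]]) -> Optional[float]:
    """Fraction of jointly called loci with the same genotype.

    Loci missing (None) or absent in either result are excluded,
    mirroring the NA rule in the text. Returns None if nothing is comparable.
    """
    compared = 0
    matches = 0
    for locus, genotype_a in calls_a.items():
        genotype_b = calls_b.get(locus)
        if genotype_a is None or genotype_b is None:
            continue
        compared += 1
        if normalize(genotype_a) == normalize(genotype_b):
            matches += 1
    return matches / compared if compared else None


# Hypothetical calls for one individual: chip versus resequencing.
chip = {"snp1": "A/G", "snp2": "C/C", "snp3": None, "snp4": "T/T"}
reseq = {"snp1": "G/A", "snp2": "C/T", "snp3": "A/A", "snp4": "T/T"}
rate = concordance_rate(chip, reseq)
```

In this hypothetical example, `snp3` is skipped because the chip call is missing, and two of the three remaining loci agree. The same function applies to the duplicate runs used for repeatability.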
Subsequently, the clustering ability of the chip was verified, focusing on whether the chip can distinguish the Hainan black goat from other breeds. DNA samples from 104 goats of known breed were genotyped with the cGPS liquid chip. From the genotyping data, we removed SNPs with call rate <90% and MAF <0.05 to ensure that the analyzed SNPs were in Hardy-Weinberg equilibrium (HWE) (p<10⁻⁶) [33]. Using the filtered genotyping data, we performed cluster analysis with MEGA-X and drew the phylogenetic tree with iTOL (v4) [34]. In addition, Python (v2) and smartpca were used to obtain the eigenvectors and eigenvalues. Finally, Rstudio (v4.0.5) was used to draw the principal component plot.


2 Results
2.1 Results of whole-genome resequencing data analysis in goats
Genomic data for a total of 87 goats were obtained from our sequencing of Hainan black goats and from public databases for the other goat breeds. After filtering, the clean reads were aligned to the reference genome, and the average alignment rate, coverage rate and sequencing depth of each sample were analyzed. The results showed that the resequencing data were of good quality and could be used for further analysis. In total, 88,454,696 SNPs were identified in the 87 goats (Additional file 1: Table S1).
2.2 Candidate SNP sites from whole-genome resequencing
After filtering, 1,212,378 SNPs were used to construct the phylogenetic tree. The results (Fig. 2) showed that the Hainan black goat and non-Hainan black goat reference populations were successfully constructed, comprising 16 Hainan black goats and 71 other goats, respectively. The phylogenetic tree also showed that Hainan black goats were closely related to Leizhou goats. Next, the Fst values and polymorphism of all SNP sites in both reference populations were determined. From the resequencing data, 39,101 candidate SNP sites with high polymorphism in both the Hainan black goat and non-Hainan black goat populations were screened (Additional file 2: Table S2). There were 1,530 candidate SNP sites with Fst >0.5 between the two reference populations (Additional file 3: Table S3).
2.3 Candidate SNP sites from the GGVD database
A total of 2,514 highly polymorphic (MAF >0.05) SNP sites (Additional file 4: Table S4) and 125 immunogene SNP sites (Additional file 5: Table S5) were derived from GGVD. The immunogenes included IL6, TNF, IL1B, IL10, IFNG and other 34 genes of interest.
2.4 Candidate SNP sites from literature sources
Publications on SNPs associated with important traits in goats and sheep, including meat quality, reproduction, growth, production, disease resistance, and immunity, were searched and browsed in PubMed and CNKI. SNP site information from more than 270 Chinese and foreign publications was recorded. The flanking sequences of the SNP sites were then retrieved and aligned with ARS1 to relocate the sites on ARS1. After testing, a total of 2,035 candidate SNP sites related to important traits were determined (Additional file 6: Table S6).
2.5 Design results of 10K cGPS liquid chip for Hainan black goat
After duplicate sites were removed, the remaining SNP sites and their flanking sequences were converted into ARS1-based information (chromosome, physical location, sequence, and reference genome genotype), which was then used for probe design and synthesis. From the 45,588 candidate SNP sites, 10,677 qualified sites were selected according to the screening requirements and probe design results (Additional file 7: Table S7). The distribution map of the 10,677 sites in the reference genome (Fig. 3A) showed that the sites on the chip were largely evenly distributed across the autosomes. The sources of the SNP sites on the 10K cGPS liquid chip are shown in Fig. 3B. Sites from resequencing data accounted for approximately 70%, including 6,629 highly polymorphic sites and 803 sites with Fst greater than 0.5 between the Hainan black goat and non-Hainan black goat reference populations. Sites from GGVD accounted for about 11%, comprising 1,136 highly polymorphic sites and 75 sites in immunogenes of interest. A further 2,034 sites related to important traits came from the literature, accounting for about 19%. In all, 7,765 highly polymorphic SNP sites were included in the final panel, about 72.7% of the total. These highly polymorphic sites can be applied to genotype analysis of different populations. Of all the selected sites, 9,100 (about 85.2%) were found in GGVD.
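The panel composition reported above can be checked with a few lines of arithmetic. The sketch below only re-derives the shares from the site counts given in the text. The dictionary keys are labels chosen for illustration.

```python
# Site counts by source, taken from the paragraph above.
sources = {
    "resequencing": 6629 + 803,   # highly polymorphic + Fst > 0.5
    "GGVD": 1136 + 75,            # highly polymorphic + immunogene sites
    "literature": 2034,           # trait-associated sites
}

total_sites = sum(sources.values())  # 10677, the size of the final panel

# Share of each source in percent (unrounded).
shares = {name: 100 * count / total_sites for name, count in sources.items()}

# Highly polymorphic sites come from resequencing (6,629) and GGVD (1,136).
high_polymorphic = 6629 + 1136
high_polymorphic_share = 100 * high_polymorphic / total_sites

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"highly polymorphic: {high_polymorphic} sites, {high_polymorphic_share:.1f}%")
```

The three source counts sum to the 10,677 sites on the chip, and the derived shares agree with the approximate percentages stated in the text.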
Because some of the 10,677 SNP sites lie close together, sites no more than 100 bp apart can share a probe. To build the 10K cGPS liquid chip system for Hainan black goat, a total of 10,571 probes were designed, capturing 9,993 intervals. Annotation of the chip sites (Fig. 4A) showed that the largest share of SNPs (49.02%) was intergenic, 31.48% were in intron regions, and only 19.5% were located in other regions (Additional file 8: Table S8). Genes carrying SNPs annotated with moderate or high effect were then selected. Pathway enrichment analysis with the DAVID database [35] showed that the enriched GO and KEGG terms were mainly immune related (Fig. 4B and Fig. 4C). This liquid chip is therefore useful for finding potential immune-related SNPs in the Hainan black goat.
First, the site detection rate of the 10K cGPS liquid chip for Hainan black goat was verified. A total of 104 goat genomic DNA samples were tested. The call rate was 97.34%-99.93%, 84.5% of the SNP sites were polymorphic, and the heterozygosity rate was 3.08%-36.80%. The site detection rate of the chip was therefore very high and met the requirements (Additional file 9: Table S9).
To verify the consistency of the chip genotyping results, we genotyped 15 resequenced DNA samples with the chip and compared the genotyping results from the cGPS liquid chip with those from resequencing (Additional file 10: Table S10). The consistency rate ranged from 81.97% to 89.16%, with an average of 85.58% (Fig. 5B). The average depth of the resequencing samples was low (4.77), whereas the average depth of the cGPS liquid chip samples was 177.90 (Fig. 5A). Only 8.19% of resequencing sites had a depth above 10X, compared with 99.36% of cGPS sites (Fig. 5B). The difference in sequencing depth therefore introduced some errors into genotype determination.
To verify the repeatability of the 10K cGPS liquid chip for Hainan black goat, we selected four samples (GZHSY-10, sheet23, sheet30 and sheet9128) and compared the repeated detection results for each sample (Additional file 11: Table S11). The comparison of genotyping results within each sample gave a consistency rate between 99.66% and 99.82%, with an average of 99.75%, which indicated good repeatability of the chip (Table 1).
To verify the clustering ability of the 10K cGPS liquid chip for Hainan black goat, cluster analysis was performed on the test results of the 104 samples. The phylogenetic tree and PCA showed that Hainan black goats clustered separately from the other goat breeds and that the different breeds formed distinct clusters, so the chip largely achieved the clustering function (Fig. 6A-6B). Small-tailed Han sheep, a sheep breed, could also be distinguished by the chip. We also found that Hainan black goats from different regions of Hainan did not form separate clusters but were mixed with each other, which was related to the high genetic diversity of Hainan black goats. Notably, one Guizhou black goat and one Hainan black goat fell in the marginal clustering area of the Yunshang black goat, which was related to the way this new breed was developed. After 5 generations of research over 22 years, the Yunshang black goat was bred by comparing the genes of different goat breeds around the world. The local Yunling black goat was used as the female parent and the Egyptian Nubian black goat as the male parent [36]. It is the first new meat-type black goat breed in China developed by artificial breeding techniques. We therefore speculated that the genome of the Yunshang black goat may contain the dominant genotypes of the Guizhou black goat and the Hainan black goat.
Finally, we pooled the detection results of all 104 samples and the 4 repeated detections, 108 results in total. The genotyping results of the SNP sites detected by the chip were obtained (Additional file 12: Table S12). Across the 108 chip detection results, the MAF, missing rate, heterozygosity rate, and Fst value of all SNP sites were calculated (Additional file 13: Table S13). The MAF distribution showed that the largest number of SNP sites fell between 0.3 and 0.4 and the smallest between 0.05 and 0.1. The MAF of most SNP sites was higher than 0.01, and 92.67% of SNP sites met this requirement. However, 783 SNP sites had a MAF below 0.01 (Fig. 6C). It was therefore necessary to expand the sample size and adjust the SNP sites of the cGPS liquid chip.
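The MAF distribution summarized above amounts to counting sites per frequency bin. A minimal sketch follows. The bin edges are an assumption chosen to match the ranges mentioned in the text (below 0.01, 0.05-0.1, 0.3-0.4), and the input values are hypothetical, not data from the study.

```python
from bisect import bisect_right
from typing import List

# Assumed bin edges; each bin is [left, right), with 0.5 included in the last bin.
EDGES = [0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]


def maf_histogram(mafs: List[float]) -> List[int]:
    """Count SNP sites per MAF bin defined by EDGES."""
    counts = [0] * (len(EDGES) - 1)
    for maf in mafs:
        # bisect_right gives the number of edges <= maf; clamp so 0.5 stays in range.
        index = min(bisect_right(EDGES, maf) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


def share_meeting_threshold(mafs: List[float], threshold: float = 0.01) -> float:
    """Percentage of sites whose MAF is at least the threshold."""
    return 100 * sum(m >= threshold for m in mafs) / len(mafs)


# Hypothetical MAF values for five sites.
counts = maf_histogram([0.005, 0.07, 0.35, 0.38, 0.5])
```

Applied to the per-site MAF values in Additional file 13, the same two functions would give the bin counts plotted in Fig. 6C and the proportion of sites with MAF of at least 0.01.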


3 Discussion
Single nucleotide polymorphisms (SNPs) are widely used in genetic research and molecular breeding [37]. The SNP sites selected for the 10K cGPS liquid chip for Hainan black goat fall into three parts. The first part is 40,631 SNPs from the whole-genome resequencing of 7 representative goat breeds. Advances in whole-genome sequencing technology help to discover SNPs [38]. We selected goat breeds from different regions of China, including Hainan black goats, as well as one foreign goat breed. The SNP sites from whole-genome resequencing included 39,101 SNPs with high polymorphism in all goats and 1,530 SNPs with Fst >0.5 between the Hainan black goat and non-Hainan black goat populations. The 10K cGPS liquid chip is a customized SNP chip designed for the Hainan black goat. Sites with high Fst values can be used to distinguish the genotypes of Hainan black goats and non-Hainan black goats. Huanhuan Fan et al. measured the Fst values and heterozygosity of all SNP sites in reference populations of sika deer and red deer, and screened 1,000 SNP sites with high Fst values to form a 1K sika deer SNP chip [32]. Highly polymorphic sites can be applied to analyze the genotypes of different goat populations. When developing the Eucalyptus EUChip60K chip, Orzenil B Silva-Junior et al. retained SNPs that were polymorphic between and within species, including those fixed in one species but polymorphic relative to another species [39].
The second part is 2,639 SNPs from GGVD. Among the comprehensive databases that contain goat SNP information, dbSNP [40] and EVA [41] established a compatible global system to assign unique identifiers to all submitted genetic variations and to share variation data for multiple species. However, dbSNP now updates only human variation information. In contrast, GGVD is easier to use. The allele frequency data in GGVD facilitate population genetic research and molecular marker design in goat breeding projects [30]. Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is also a good choice. It provides not only genetic variation information for 13 animal species but also online genotype imputation, which will greatly promote research on animal genomic selection and genetic improvement [42]. Genetic variation in immunogenes may play an important role in susceptibility to a range of common diseases involving inflammatory reactions [43]. We therefore selected SNP sites in immunogenes. SNPs in TNF-α have been reported to affect the reproductive performance and immune function of dairy cows [44]. TLR2 plays an important role in the recognition of Gram-positive bacteria by the innate immune system, and TLR2 polymorphism in goats may be related to the elevated somatic cell count in milk caused by mastitis [45]. Because of the strong disease resistance of the Hainan black goat, we specifically searched GGVD for SNP sites in immunogenes that were of interest in our previous study. These sites will help the subsequent mining of disease resistance genes in the Hainan black goat.
The third part is 2,367 SNPs from the literature. Goats and sheep can be considered to have a common evolutionary origin [46]. We searched the literature for SNP sites associated with important traits in goats and sheep, such as meat quality, reproduction, growth, production, disease resistance and immunity. This may improve the results of genomic selective breeding. Genome-wide association study (GWAS) is a key technology for studying the genetic basis of complex traits and diseases through genotype-phenotype association [47]. Ranran Liu et al. [2] developed a 55K genotyping array and selected SNPs related to economic traits from the literature, which can potentially be applied to GWAS for traits of interest. Based on the genome sequencing data of cashmere goats, Xian Qiao et al. [15] added 858 SNPs from genes related to wool traits and designed a 66K SHS-based target enrichment SNP chip for cashmere goats, which was successfully used for association analysis of cashmere fiber traits. Another way to find trait-associated SNPs quickly is to search publicly available SNP and GWAS databases. GWAS Atlas is a manually curated resource of genome-wide variant-trait associations for various species, including cultivated plants and livestock (goats among them) [48]. The continuous development and improvement of AnimalQTLdb [49] also allows users to easily obtain QTL and SNP-gene association data for livestock species. Online databases make it quick to find SNP sites associated with traits. However, we believe that the content of these databases is itself based on published literature, so it may not be comprehensive and requires regular updates. Although searching the literature for trait-associated SNP sites is time-consuming and cumbersome, it allows us to track newly discovered SNP sites associated with important traits.
In contrast to traditional single-locus genotyping methods, cGPS is a targeted sequencing genotyping technology. It uses capture probes to select DNA regions of interest for high-depth sequencing analysis. Target-enriched SNP genotyping is a low-cost, high-efficiency method. Targeted sequencing can not only obtain large-scale SNPs at different densities but also provide more information on SNP variation, InDels and copy number variation [15]. This strategy of genotyping by targeted sequencing goes by many names, owing to the different methods of target enrichment and sequencing, such as SHS [31], GBTS [50], Target SNP-seq [51], and MRASeq [52]. Among them, cGPS is a medium- to high-density (5K-100K target intervals) targeted sequencing genotyping technology developed independently by Huazhi Biotechnology Co., Ltd., China.
To form the 10K cGPS liquid chip for Hainan black goat, we removed the duplicate SNP sites and screened 10,677 qualified SNP sites from all 45,588 candidate sites. In general, the physical or genetic distance between markers and the allele frequency are the main selection factors [53, 16]. Depending on the requirements of a given chip, high-impact or rare variations, as well as variations associated with important traits, can be given priority. In this study, we considered similar selection factors. As a result, the sites on the 10K cGPS liquid chip were largely evenly distributed across the autosomes. Only one region, on chromosome 6, showed a high-density distribution, which was a normal phenomenon. The sources of SNP sites on the 10K cGPS liquid chip are in line with our selection objective: most SNP sites came from resequencing data, followed by the literature, and the fewest from GGVD. Annotation of the SNP sites on the chip showed that they were mainly located in intergenic and intron regions. This was because these SNP markers were designed to cover the entire genome. It was also consistent with the annotation results of the SNPs in our whole-genome resequencing data. By comparison, most of the mSNPs (74.3%) in the 40K maize mSNP panel developed by Zifeng Guo et al. were intergenic, 15.3% were in introns, and 6.2% were from other regions. We annotated the SNP sites on the chip and further associated them with phenotypes. In the future, the chip can play an important role in gene mapping, GWAS, and molecular marker-assisted breeding of goats.
We verified the SNP site detection rate and the consistency and repeatability of the genotyping results of the 10K cGPS liquid chip. The detection rate was between 97.34% and 99.93%, and the repeatability rate was between 99.66% and 99.82%. The consistency rate between the cGPS liquid chip genotyping results and the resequencing genotyping results ranged from 81.97% to 89.16%. The detection rate and repeatability of the chip were good, but the consistency rate of the genotyping results was relatively low. We attribute this to the different sequencing depths, which caused some errors in genotype determination. Interestingly, one published study reported results similar to ours. The verification of the 200K SNP array developed by Kang Wei et al. [54] showed an average detection rate of 98.1%. The SNP repeatability of the repeated samples was 99.71% and 99.67%, respectively. The consistency rate of SNP genotyping between the SNP array and resequencing data was 64.14%-91.93%, with an average of 84.07%. To further verify the accuracy of the array, they randomly selected inconsistent SNPs and performed Sanger sequencing. The results showed that neither resequencing nor the SNP array could guarantee 100% correct results [54]. Subsequent mutual verification by different methods is therefore very important.
When breeds are difficult to distinguish by phenotype, they should be identified at the molecular level. A southern Chinese goat breed with a phenotype similar to that of the Hainan black goat and a sheep breed were selected to verify the clustering ability of the chip. The results showed that 84.5% of the SNP sites were polymorphic and that the heterozygosity rate was between 3.08% and 36.80%. This indicated that the 10K cGPS liquid chip can be used to determine the genetic variation of goat breeds in southern China. For comparison, the chicken 55K SNP genotyping array developed by Ranran Liu et al. showed that 76.7%-88.0% of SNPs were polymorphic in population verification [4]. The phylogenetic tree and PCA showed that Hainan black goat, Yunshang black goat, Guizhou black goat and Small-tailed Han sheep clustered at different positions, so the chip largely achieved the distinguishing function. The phylogenetic tree also showed that Hainan black goats from different regions of Hainan did not form separate clusters but were mixed together. The PCA results showed that the Hainan black goat populations were more dispersed, which was consistent with the phylogenetic tree. This was because Hainan black goats in different regions of Hainan had not been well bred by local farmers. In addition, the SNP genotyping data from the chip can help to identify pure Hainan black goat lineages, guide the hybridization and improvement of the Hainan black goat on a scientific basis, and contribute to the protection and development of goat germplasm resources. The Hainan black goat cGPS chip is the first chip developed for tropical goat germplasm resources in China. Tropical goats closely related to the Hainan black goat can also benefit from the chip.
Based on the results of the 10K cGPS chip for Hainan black goat, we calculated the MAF of all SNP sites and analyzed the potential causes of the few variants with low allele frequency. Variants with low allele frequency contain less information [16]. Among the 10,677 SNP sites, a small number had low allele frequency or no polymorphism at all. For these SNP sites, we considered that there may be errors in probe design or in the trait-related SNP sites taken from the literature. Most of the samples we verified were Hainan black goats. These variants may not exist in the samples we selected but may be present in other goat breeds. The design of the 10K cGPS liquid chip is flexible. If the SNP sites of the chip need to be modified, more samples will be required to verify the feasibility of the adjustment. In this way, the liquid chip can become more suitable for the study of the Hainan black goat and more conducive to the conservation of Hainan black goat germplasm resources.
4 Conclusions
In summary, we developed and verified the 10K cGPS liquid chip for Hainan black goat. For chip design, goat resequencing data, GGVD and the literature were used to obtain candidate sites. A total of 10,677 representative SNP sites were selected for probe design, covering 9,993 intervals and forming the 10K cGPS liquid chip. In chip verification, the site detection rate was 97.34%-99.93%, polymorphic SNP sites accounted for 84.5%, and the heterozygosity rate was 3.08%-36.80%. The sequencing depth of 99.36% of sites exceeded 10X, and the repeatability rate was 99.66%-99.82%. Because of the low depth of the resequencing sites, the average consistency rate between the chip genotyping results and the resequencing results was 85.58%, indicating that the genotyping results of the 10K cGPS liquid chip were the more reliable of the two. In addition, phylogenetic analysis showed that the chip had good clustering ability. The chip can evaluate the genetic diversity of goats and provide a material basis for goat disease resistance breeding. Moreover, it enables breed identification and genetic relationship analysis of the Hainan black goat, which lays a solid foundation for subsequent breeding research.

Avatar
Avatar

A server error has occurred

INTERNAL_SERVER_ERROR

chenglonglcl1-9PcS_

3 Discussion
Single nucleotide polymorphism (SNP) is widely used in genetic research and molecular breeding [37]. The selected SNP sites of the 10K cGPS liquid chip for Hainan black goat is divided into three parts. The first part is 40,631 SNPs from the whole genome resequencing of 7 representative goat breeds. Advances in whole genome sequencing technology help to discover SNPs [38]. We selected goat breeds from different regions of China, including Hainan black goats, as well as an abroad goat breed. The SNP sites from whole genome resequencing included 39,101 SNPs with high polymorphism in all goats and 1,530 SNPs with Fst >0.5 in the populations of Hainan black goat and non-Hainan black goat. The 10K cGPS liquid chip is a customized SNP chip designed for Hainan black goat. The sites with high Fst value can be used to distinguish the genotypes of Hainan black goat and non-Hainan black goat. Huanhuan Fan et al. measured the Fst values and the heterozygosity of all SNP sites in the reference populations of sika deer and red deer. And 1,000 SNP sites with high Fst values were screened to form a 1K sika deer SNP chip [32]. High polymorphism sites can be applied to analyze the genotype of different goat populations. When developing Eucalyptus EUChip60K chip, Orzenil B Silva-Junior et al. retained polymorphic SNPs between and within species, including those fixed in one specie but polymorphic relative to another species [39].
The second part is 2,639 SNPs from GGVD. Among the comprehensive databases that contain goat SNP information, dbSNP[40] and EVA[41] establish a compatible global system to assign unique identifiers for all submitted genetic variations and share the variation data of multiple species. However, dbSNP now only updates human variation information. In contrast, GGVD is more easier to use. The allele frequency data in GGVD will provide convenience for population genetic research and molecular marker design in goat breeding projects [30]. Besides, Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is also a good choice. It provides not only genetic variation information of 13 animal species, but also online genotype interpolation, which will greatly promote animal genome selection and genetic improvement research [42]. Genetic variation of immunogene may play an important role in the susceptibility of a series of common diseases with inflammatory reaction [43]. Therefore, we selected the SNP sites of immunogenes. It has been reported that the SNP of TNF-α affects the reproductive performance and immune function of dairy cows [44]. TLR2 plays an important role in the recognition of Gram-positive bacteria by innate immune system. The polymorphism of TLR2 in goats may be related to the elevated somatic cell count in milk caused by mastitis [45]. Due to the strong disease resistance of Hainan black goat, we specifically searched for the SNP sites of some immunogenes in GGVD, which were of interest in our previous study. These sites were helpful to the subsequent mining of disease resistance genes of Hainan black goat.
The third part is 2,367 SNPs from the literature. Goats and sheep can be considered to have a common evolutionary origin [46]. We searched SNP sites that associated with important traits such as meat quality, reproduction, growth, production, disease resistance and immunity in goats and sheep from the literature. This may improve the results of genomic selective breeding. Genome-wide association study (GWAS) is a key technology to study the genetic basis of complex traits and diseases through genotype-phenotype association [47]. Ranran Liu et al. [2] developed a 55K genotyping array and selected SNPs related to economic traits from the literature, which can be potentially applied to GWAS for traits of interest. Based on the genome sequencing data of cashmere goats, Xian Qiao et al. [15] added 858 SNPs of some genes that related to wool traits and designed a 66K SHS-based target enrichment SNP chip for cashmere goats. It was successfully used for association analysis of cashmere fiber traits. Another method that can quickly find trait-associated SNPs is to search in the publicly available databases containing SNP and GWAS. It is known that GWAS Atlas is a manually collated resource of genome-wide variant trait associations for various species, involving cultivated plants and livestock (including goats) [48]. The continuous development and improvement of the AnimalQTLdb[49] also allows users to easily obtain QTL and SNP-gene association data on livestock species. Online databases can quickly find SNP sites associated with traits. However, we believe that the content of these databases is also based on published literature, which may be not comprehensive and requires regular updates over time. Although it is time-consuming and cumbersome to find SNP sites associated with traits by searching the literature, we can track newly discovered SNP sites associated with important traits.
Unlike traditional single-locus genotyping methods, cGPS is a targeted sequencing genotyping technology. It uses capture probes to select DNA regions of interest for high-depth sequencing analysis. Target-enriched SNP genotyping is low in cost and high in efficiency. Targeted sequencing can not only obtain large-scale SNPs at different densities, but also provide more information on SNP variation, InDels and copy number variation [15]. This strategy of genotyping by targeted sequencing goes by many names, depending on the method of target enrichment and sequencing, such as SHS [31], GBTS [50], Target SNP-seq [51] and MRASeq [52]. Among them, cGPS is a medium- to high-density (5K-100K target intervals) targeted sequencing genotyping technology that was independently developed by Huazhi Biotechnology Co., Ltd., China.
To form the 10K cGPS liquid chip for Hainan black goat, we removed duplicate SNP sites and screened 10,677 qualified SNP sites from all 45,588 candidate sites. In general, the physical or genetic distance between markers and the allele frequency are the main selection factors [53, 16]. Depending on the requirements of a given chip, high-impact or rare variations, as well as variations associated with important traits, can be given priority. In this study, we considered similar selection factors. As a result, the sites on the 10K cGPS liquid chip were distributed approximately evenly across the autosomes. Only one region on chromosome 6 showed a high-density distribution, which was a normal phenomenon. The sources of the SNP sites on the 10K cGPS liquid chip are in line with our selection objective: most sites came from resequencing data, followed by the literature, and the fewest came from GGVD. Annotation of the SNP sites on the chip showed that they were mainly located in intergenic and intronic regions. This is because the SNP markers were designed to cover the entire genome, and it is also consistent with the annotation results of the SNPs in our whole-genome resequencing data. Similarly, most of the mSNPs (74.3%) in the 40K maize mSNP panel developed by Zifeng Guo et al. were intergenic, 15.3% were in introns, and 6.2% were from other regions. We annotated the SNP sites on the chip and further associated them with phenotypes. In the future, the chip can play an important role in gene mapping, GWAS, and molecular marker-assisted breeding of goats.
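The two general selection factors named above, marker spacing and allele frequency, can be illustrated with a minimal Python sketch. It is not the pipeline used to design the chip: the `Candidate` record, the `select_sites` function, and the thresholds `min_maf` and `min_gap` are hypothetical and chosen only for illustration. Sites flagged as `priority` (for example, trait-associated sites from the literature) bypass both filters, mirroring the idea that important variations can be given priority.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    chrom: str
    pos: int                # physical position in bp
    maf: float              # minor allele frequency in the reference population
    priority: bool = False  # hypothetical flag, e.g. a trait-associated site


def select_sites(candidates, min_maf=0.05, min_gap=1000):
    """Greedy selection: walk each chromosome in position order and keep a
    site if it is informative enough and far enough from the last kept site.
    Both thresholds are illustrative assumptions, not values from the study."""
    kept = []
    last_pos = {}  # chromosome -> position of the most recently kept site
    for c in sorted(candidates, key=lambda c: (c.chrom, c.pos)):
        if not c.priority and c.maf < min_maf:
            continue  # too rare to be informative
        prev = last_pos.get(c.chrom)
        if c.priority or prev is None or c.pos - prev >= min_gap:
            kept.append(c)
            last_pos[c.chrom] = c.pos
    return kept
```

A greedy pass of this kind favors even spacing, but it does not by itself guarantee uniform coverage; a real design would also account for probe quality and target interval size.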
We verified the SNP site detection rate, as well as the consistency and repeatability of the genotyping results, for the 10K cGPS liquid chip. The detection rate was between 97.34% and 99.93%, and the repetition rate was between 99.66% and 99.82%. The consistency rate between the cGPS liquid chip genotyping results and the resequencing genotyping results ranged from 81.97% to 89.16%. Thus, the detection rate and repeatability of the chip were good, but the consistency rate was relatively low. We attribute this to the difference in sequencing depth between the two methods, which introduces errors in genotype determination. Interestingly, one previous study reported similar results. Verification of the 200K SNP array developed by Kang Wei et al. [54] showed an average detection rate of 98.1%, and the SNP repeatability of the repeated samples was 99.71% and 99.67%, respectively. The consistency rate of SNP genotyping between the SNP array and resequencing data was 64.14%-91.93%, with an average of 84.07%. To further verify the accuracy of the array, they randomly selected inconsistent SNPs and performed Sanger sequencing. The results showed that neither resequencing nor the SNP array could guarantee 100% correct results [54]. Therefore, subsequent cross-validation by different methods is very important.
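For readers who wish to compute comparable metrics, the following Python sketch shows one plausible way to define a detection (call) rate and a consistency rate. The exact definitions used for the chip are not restated here, so the functions `call_rate` and `concordance`, the representation of a genotype as a two-letter string (or `None` when missing), and the choice to compare only sites called by both methods are all assumptions.

```python
def call_rate(calls):
    """Fraction of sites with a non-missing genotype.
    `calls` maps a site identifier to a genotype such as "AG", or None."""
    return sum(g is not None for g in calls.values()) / len(calls)


def concordance(a, b):
    """Fraction of identical genotypes among sites called in both sets.
    Genotypes are treated as unphased, so "AG" and "GA" count as equal."""
    shared = [s for s in a if a[s] is not None and b.get(s) is not None]
    if not shared:
        return float("nan")
    same = sum(sorted(a[s]) == sorted(b[s]) for s in shared)
    return same / len(shared)
```

The same `concordance` function can serve for a repetition rate (two runs of one sample on the chip) and for a consistency rate (chip versus resequencing), since both are pairwise genotype comparisons.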
When breeds are difficult to distinguish by phenotype, they should be identified at the molecular level. A southern Chinese goat breed with a phenotype similar to that of Hainan black goat and a sheep breed were selected to verify the clustering ability of the chip. The results showed that 84.5% of the SNP sites were polymorphic and that the heterozygosity rate was between 3.08% and 36.80%, indicating that the 10K cGPS liquid chip can be used to determine the genetic variation of goat breeds in southern China. By comparison, the chicken 55K SNP genotyping array developed by Ranran Liu et al. showed that 76.7%-88.0% of SNPs were polymorphic in population verification [4]. The phylogenetic tree and PCA showed that Hainan black goat, Yunshang black goat, Guizhou black goat and small-tailed Han sheep clustered at different positions, so the chip essentially achieved its distinguishing function. The phylogenetic tree also showed that Hainan black goats from different regions of Hainan did not cluster by region but were mixed together. The PCA results showed that the Hainan black goat populations were more dispersed, which was consistent with the phylogenetic tree. This is because Hainan black goats in different regions of Hainan have not been bred systematically by local farmers. In addition, the SNP genotyping data from the chip can help to identify pure Hainan black goat lineages, scientifically guide the hybridization and improvement of Hainan black goat, and contribute to the protection and development of goat germplasm resources. The Hainan black goat cGPS chip is the first chip developed for tropical goat germplasm resources in China. Tropical goats closely related to Hainan black goat can also benefit from the chip.
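The two summary statistics reported above can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the heterozygosity rate is computed per sample over called sites and that a site is polymorphic when more than one allele is observed across samples, neither of which is stated explicitly in the text. The function names and the genotype encoding (two-letter strings, `None` for missing calls) are hypothetical.

```python
def heterozygosity(sample_calls):
    """Share of heterozygous genotypes among one sample's called sites."""
    called = [g for g in sample_calls if g is not None]
    if not called:
        return float("nan")
    return sum(g[0] != g[1] for g in called) / len(called)


def polymorphic_fraction(matrix):
    """Share of sites at which more than one allele is observed.
    `matrix` is a list of per-sample genotype lists in the same site order."""
    n_sites = len(matrix[0])
    polymorphic = 0
    for j in range(n_sites):
        alleles = {a for row in matrix if row[j] is not None for a in row[j]}
        if len(alleles) > 1:
            polymorphic += 1
    return polymorphic / n_sites
```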
Based on the genotyping results of the 10K cGPS chip for Hainan black goat, we calculated the MAF of all SNP sites and analyzed the potential causes of the few low-frequency variants. Variants with low allele frequency contain less information [16]. Among the 10,677 SNP sites, a small number had a low allele frequency or showed no polymorphism at all. For these sites, errors may have arisen in probe design or in the trait-related SNP sites taken from the literature. In addition, most of the samples we verified were Hainan black goats, so these variants may be absent from the samples we selected but present in other goat breeds. The design of the 10K cGPS liquid chip is flexible. If the SNP sites on the chip need to be modified, more samples will be needed to verify the feasibility of the adjustment. In this way, the liquid chip can be made more suitable for the study of Hainan black goat and more conducive to the conservation of Hainan black goat germplasm resources.
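As a minimal sketch of the MAF calculation described above, the following Python code counts alleles at a site and flags sites below a threshold. It assumes biallelic sites, genotypes encoded as two-letter strings with `None` for missing calls, and a hypothetical cutoff of 0.05; none of these details is taken from the study.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF at one site, assuming the site is biallelic."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    if total == 0:
        return float("nan")  # no sample was called at this site
    if len(counts) < 2:
        return 0.0           # monomorphic in the genotyped samples
    return counts.most_common()[1][1] / total  # second most common allele


def low_maf_sites(site_genotypes, threshold=0.05):
    """Identifiers of sites whose MAF is below `threshold` (assumed cutoff).
    `site_genotypes` maps a site identifier to a list of genotypes."""
    return [
        site
        for site, genotypes in site_genotypes.items()
        if minor_allele_frequency(genotypes) < threshold
    ]
```

Sites flagged this way in a single breed are not necessarily uninformative; as noted above, the minor allele may simply be absent from the genotyped samples.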
4 Conclusions
In summary, we developed and verified the 10K cGPS liquid chip for Hainan black goat. For the design of the chip, goat resequencing data, GGVD and the literature were used to obtain candidate sites. A total of 10,677 representative SNP sites were selected for probe design, covering 9,993 intervals and forming the 10K cGPS liquid chip. Verification of the chip showed that the detection rate of the sites was 97.34%-99.93%, polymorphic SNP sites accounted for 84.5%, and the heterozygosity rate was 3.08%-36.80%. The sequencing depth was over 10X at more than 99.4% of sites, and the repetition rate was 99.66%-99.82%. The average consistency rate between the chip genotyping results and the resequencing results was 85.58%, which we attribute to the low sequencing depth at the resequenced sites; this suggests that the genotyping results of the 10K cGPS liquid chip were the more reliable of the two. In addition, phylogenetic analysis showed that the chip had good clustering ability. The chip can accurately evaluate the genetic diversity of goats and provide a material basis for goat disease resistance breeding. Moreover, it enables breed identification and genetic relationship analysis of Hainan black goat, which lays a solid foundation for subsequent breeding research.
