posted on 2021-07-01, 01:37 authored by RJ Edwards, MA Field, JM Ferguson, O Dudchenko, J Keilwagen, BD Rosen, GS Johnson, ES Rice, LD Hillier, JM Hammond, SG Towarnicki, A Omer, R Khan, K Skvortsova, O Bogdanovic, RA Zammit, EL Aiden, WC Warren, John Ballard
Background: Basenjis are considered an ancient dog breed of central African origins that still live and hunt with tribesmen in the African Congo. Nicknamed the barkless dog, Basenjis possess unique phylogeny, geographical origins and traits, making their genome structure of great interest. The increasing number of available canid reference genomes allows us to examine the impact that the choice of reference genome has with regard to reference genome quality and breed relatedness. Results: Here, we report two high-quality de novo Basenji genome assemblies: a female, China (CanFam_Bas), and a male, Wags. We conduct pairwise comparisons and report structural variations between assembled genomes of three dog breeds: Basenji (CanFam_Bas), Boxer (CanFam3.1) and German Shepherd Dog (GSD) (CanFam_GSD). CanFam_Bas is superior to CanFam3.1 in terms of genome contiguity and comparable overall to the high-quality CanFam_GSD assembly. By aligning short read data from 58 representative dog breeds to three reference genomes, we demonstrate how the choice of reference genome significantly impacts both read mapping and variant detection. Conclusions: The growing number of high-quality canid reference genomes means the choice of reference genome is an increasingly critical decision in subsequent canid variant analyses. The basal position of the Basenji makes it suitable for variant analysis for targeted applications of specific dog breeds. However, we believe more comprehensive analyses across the entire family of canids are better suited to a pangenome approach. Collectively, this work highlights the importance of the choice of reference genome in all variation studies.
Funding
This work was supported by the University of New South Wales/School of Biotechnology and Biomolecular Sciences Genomics Initiative and the Basenji Health Endowment Inc., Poynette, WI. The DNA Zoo initiative funded the Hi-C data collection and analyses. RJE is funded by ARC LP160100610 and ARC LP180100721. MF is funded by NHMRC APP5121190. ELA was supported by an NSF Physics Frontiers Center Award (PHY1427654), the Welch Foundation (Q-1866), a USDA Agriculture and Food Research Initiative Grant (2017-05741), and an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375).
History
Publication Date
2021-12-01
Journal
BMC Genomics
Volume
22
Issue
1
Article Number
ARTN 188
Pagination
19p.
Publisher
BMC
ISSN
1471-2164
Rights Statement
The Author reserves all moral rights over the deposited text and must be credited if any re-use occurs. Documents deposited in OPAL are the Open Access versions of outputs published elsewhere. Changes resulting from the publishing process may therefore not be reflected in this document. The final published version may be obtained via the publisher’s DOI. Please note that additional copyright and access restrictions may apply to the published version.