Abstract
A meta-analysis of genome-wide association studies (GWAS) identified eight loci that are associated with heart rate variability (HRV) in data from 53,174 individuals. However, functional follow-up experiments - aiming to identify and characterize causal genes in these loci - have not yet been performed. We developed an image- and CRISPR-Cas9-based pipeline to systematically characterize candidate genes for HRV in live zebrafish embryos and larvae. Nine zebrafish orthologues of six human candidate genes were targeted simultaneously in fertilized eggs from fish that transgenically express GFP in smooth muscle cells (Tg(acta2:GFP)), to visualize the beating heart using a fluorescence microscope. An automated analysis of repeated 30s recordings of 381 live zebrafish atria at 2 and 5 days post-fertilization highlighted genes that influence HRV (hcn4 and si:dkey-65j6.2), heart rate (rgs6 and hcn4), and the risk of sinoatrial pauses and arrests (hcn4). Hence, our screen confirmed the role of established genes for heart rate and rhythm (rgs6 and hcn4), and highlighted a novel gene implicated in HRV (si:dkey-65j6.2).
Introduction
Heart rate variability (HRV) reflects the inter-beat variation of heart rate. HRV is controlled by the sinoatrial node, which receives input from the autonomic nervous system. Autonomic imbalance has been associated with work stress and other modifiable or non-modifiable risk factors1, and is reflected in lower HRV. Lower HRV has been associated with higher cardiac morbidity and mortality2, and with a higher risk of all-cause mortality3. HRV can be quantified non-invasively using the RR interval of a 12-lead ECG, making HRV a useful clinical marker for perturbations of the autonomic nervous system. However, the genetic basis of HRV remains largely elusive.
Recently, we and others identified the first loci that are robustly associated with HRV, using a meta-analysis of genome-wide association studies (GWAS) with data from 53,174 participants4. Eleven of the 17 associated single nucleotide polymorphisms (SNPs) overlapped with loci that are also associated with resting heart rate5. The heart rate associated loci in turn had on aggregate been associated with altered cardiac conduction and risk of sick sinus syndrome5. In silico functional annotation of the five loci that are associated with both HRV6 and heart rate5 previously resulted in the prioritization of six candidate genes that are anticipated to be causal6. Functional follow-up experiments - ideally in vivo - are required to conclude if these genes are indeed causal, and examine if they influence HRV, heart rate, and/or cardiac conduction.
Mouse models are commonly used in cardiac research, but mice show substantial differences in cardiac rate and electrophysiology when compared with humans7. Inherently, these differences complicate extrapolation of results from mouse models to humans7. Additionally, rodents are not suitable for high-throughput, in vivo characterization of cardiac rhythm and rate. Such screens are essential to systematically characterize positional candidate genes in the large number of loci that have now been identified by GWAS for cardiac rhythm4, rate8 and conduction9. Hence, novel model systems that facilitate systematic, in vivo characterization of a large number of candidate genes are desirable.
In recent years, the zebrafish has become an important model system for genetic and drug screens for human disease10,11. Despite morphological differences to the human heart, zebrafish have a functional, beating heart at ∼24 hours post-fertilization, and the ECG of the two-chambered zebrafish heart is similar to that of humans12. Conditions affecting cardiac electrophysiology have previously been successfully modeled in zebrafish. For example, nkx2.5 has been demonstrated to be necessary for establishing HRV13, mutations in kcnh2 were used to model long QT syndrome14, trpm7 was shown to influence heart rate and risk for sinoatrial pauses15, and hcn4 knockdown modeled sick sinus syndrome in zebrafish analogous to the human condition16. Fluorescently labeled transgenes facilitate visualization of cell types and tissues of interest, which can now be accomplished in high-throughput thanks to advances in automated positioning of non-embedded, live zebrafish embryos17. The automated positioning system has already been used to identify and characterize small molecules for myelination18 and thyroid disease19. In addition, the zebrafish has a well-annotated genome, with orthologues of at least 71.4% of human genes20. These genes can be targeted efficiently and in a multiplexed manner thanks to recent advances in Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) systems21. All characteristics combined make zebrafish embryos and larvae an attractive model system to systematically characterize candidate genes for cardiac rhythm, rate and conduction.
The aim of this study was to objectively characterize the most promising candidate genes in GWAS-identified loci for HRV as well as heart rate for a role in cardiac rhythm, rate and function using a large-scale, image-based screen in zebrafish embryos and larvae.
Results
Descriptive results
At 2dpf, some embryos showed sinoatrial pauses (n=39, 10.3%, Supplementary Movie 1) and arrests (n=36, 9.5%, Supplementary Movie 2); cardiac edema (n=16, 4.2%); or uncontrolled atrial contractions (n=1, 0.2%, Supplementary Movie 3). Three larvae died between 2 and 5dpf (Supplementary Figure 1). At 5dpf, fewer larvae displayed sinoatrial pauses (n=9, 2.7%, one of which had a sinoatrial pause at 2dpf); sinoatrial arrests (n=3, 0.9%, none of which had a sinoatrial arrest at 2dpf); and cardiac edema (n=15, 4.5%, nine of which had an edema at 2dpf); while more larvae showed uncontrolled atrial contractions (n=9, 2.7%, none of which had uncontrolled atrial contraction at 2dpf); or an abnormal cardiac morphology and impaired cardiac contractility (n=6, 1.8%, Supplementary Movie 4).
After imaging at 5dpf, all larvae were sequenced at the nine CRISPR-Cas9 targeted sites (Figure 1, Table 1), as well as at three in silico predicted off-target sites (Supplementary Table 1). Transcript-specific dosages were calculated by weighting the number of mutated alleles by their predicted impact on protein function based on Ensembl’s variant effect predictor (VEP). No CRISPR-Cas9 induced mutations were identified at any of the predicted off-target sites (Supplementary Table 2). A total of 169 unique mutations were identified across the nine CRISPR-targeted sites, ranging from three unique mutations in hcn4l to 34 in si:dkey-65j6.2 (i.e. KIAA1755) (Supplementary Table 3). Frameshift mutations were most common (47.9%), followed by missense variants (25.4%), in-frame deletions (14.2%), and synonymous variants (5.9%). Eighty-seven, seventy-two and ten variants were predicted to have a high, moderate, and low impact on protein function, respectively (Supplementary Table 3). Mutant allele frequencies ranged from 4.1% for hcn4l to 93.4% for neo1b (Supplementary Table 4).
Genetic effects on cardiac rhythm and rate
Each additional mutated allele in rgs6 was associated with a lower heart rate at 2dpf, independently of HRV (Figure 2, Supplementary Table 5). Larvae with CRISPR-induced mutations in rgs6 were shorter at 2 and 5 dpf, and had a smaller dorsal area at 5dpf (Supplementary Figure 2, Supplementary Table 6). These results were mostly driven by heterozygous larvae, since only one larva carried two mutated rgs6 alleles (Supplementary Table 4).
At 2dpf, hcn4 influenced heart rate independently of HRV when larvae with two functionally knocked-out alleles were compared with wildtypes (Supplementary Table 7), and may affect HRV when adjusting for heart rate (Supplementary Table 5). Furthermore, hcn4-affected embryos had higher odds of a sinoatrial pause or arrest at 2dpf (Table 2, Supplementary Table 8). At 5dpf, hcn4 affected heart rate as well as HRV, with effect sizes attenuated but not abolished when adjusting for the other trait (Supplementary Tables 5, 7). The odds of a sinoatrial arrest at 5dpf were similar to those at 2dpf, but did not reach significance (Supplementary Table 8).
Each additional mutated allele in the main transcript of si:dkey-65j6.2 (i.e. KIAA1755) tended to be associated with a higher HRV at 2dpf as well as at 5dpf (Figure 2, Supplementary Table 5). Additionally, si:dkey-65j6.2-affected larvae tended to be smaller at 2 and 5dpf (Supplementary Figure 2, Supplementary Tables 6, 9). Larvae with mutations in quo (i.e. the other KIAA1755 orthologue) were larger at 2 and 5dpf (Supplementary Figure 2, Supplementary Tables 6, 9).
Larvae with two vs. zero functionally knocked-out syt10 alleles tended to have higher odds of sinoatrial pauses at 2 and 5dpf (Supplementary Tables 8, 10).
Larvae with mutations in neo1a tended to be larger at 2dpf (Supplementary Figure 2, Supplementary Tables 6, 9).
Discussion
Large-scale, in vivo follow-up studies of candidate genes in GWAS-identified loci remain sparse. Here we present an objective, image-based pipeline to systematically characterize candidate genes for cardiac rhythm, rate, and conduction-related disorders. Using a zebrafish model system, we confirmed the roles of genes previously implicated in heart rate and rhythm (rgs6 and hcn4), identified a novel gene influencing cardiac rhythm (si:dkey-65j6.2, i.e. KIAA1755), and observed an effect of previously unanticipated genes on early growth and development (rgs6, quo, si:dkey-65j6.2, neo1a). Additionally, we confirmed that knockdown of hcn4 results in sinoatrial pauses16. These results add weight to the notion that GWAS-identified common variants for complex traits can flag genes for which rare, detrimental mutations cause severe, early-onset disorders9,22,23. We show here that systematic characterization of candidate genes in GWAS-identified loci for complex traits using an image- and CRISPR-Cas9-based zebrafish model system can help identify clinically relevant causal genes, and aid confirmation of the role of such genes in whole-exome sequencing efforts in humans by reducing the multiple testing burden24.
Regulator of G protein signaling 6 (RGS6) plays a role in the parasympathetic regulation of heart rate28 and is a negative regulator of muscarinic signaling, thus decreasing HRV to prevent bradycardia. Common variants near RGS6 have been identified in GWAS for resting heart rate8,25,26, heart rate recovery after exercise27 and HRV4. An eQTL analysis using GTEx data showed that the minor T-allele in rs4899412 that is associated with lower HRV is also associated with a higher expression of RGS6 in esophagus and tibial nerve. This is directionally consistent with humans with loss-of-function variants in RGS6 showing higher heart rate variability29. Also in line with this, mice deficient in Rgs6 (Rgs6−/−) were previously characterized by bradycardia, higher HRV and susceptibility to atrioventricular block30. In our study, zebrafish embryos with CRISPR-induced mutations in rgs6 had a lower heart rate at 2dpf, independently of HRV. Hence, our results for heart rate are in line with studies in murine models. However, we did not detect an effect of mutations in rgs6 on HRV in our study, possibly due to the low mutant allele frequency of rgs6.
The locus harboring HCN4 has been identified in GWAS for HRV6, heart rate5 and atrial fibrillation31. HCN4 is arguably the best characterized gene in the context of cardiac rhythm. It belongs to the family of “funny” channels, aptly named for being activated upon hyperpolarization and for their non-selective permeability to Na+ and K+. It is expressed in the sinoatrial node32,33 and plays an important role in cardiac pacemaking34. The heart rate lowering agent ivabradine35 is an open channel blocker of HCN4, and reduces cardiovascular and all-cause mortality in heart failure patients36. Hcn4−/− mice die prenatally, and analyses of their hearts show no arrhythmias and a lower heart rate compared with wildtype and heterozygotes34. In line with this, humans who are heterozygous for an HCN4 loss-of-function mutation are typically characterized by bradycardia. Comparably, knockdown of hcn4 in zebrafish using morpholinos resulted in bradycardia and sinoatrial pauses16. We identified nine zebrafish embryos that were compound heterozygous for mutations that result in a functional knockout of hcn4, and one embryo that was compound heterozygous for such mutations in hcn4l. Mutations in both orthologues did not deviate from HWE (Supplementary Table 4). Four of the nine compound heterozygous larvae for hcn4, and four of the eight larvae that were heterozygous for mutations in hcn4 as well as hcn4l showed a sinoatrial arrest before the atrial recording was initiated at 2dpf. Zebrafish embryos can survive without a functional heart at this stage, thanks to adequate tissue oxygenation by diffusion37. This allowed us to witness genetically driven cardiac arrests that would have been lethal in embryos of other species. However, the direction of effect on heart rate that we observed is unexpected. It most likely reflects overcompensation by hcn4l, or another gene that has been induced by hcn4 knockout38.
We cannot rule out that some of the CRISPR-Cas9 induced mutations represent a gain-of-function either. Our gRNA binds in the first exon of hcn4, and cuts in the vicinity of the ion-transport domain, as annotated by protein families (PFAM)39. A gain-of-function variant was identified in humans that increased HCN4’s susceptibility to cAMP40. In line with this, overexpression of Hcn4 in mice resulted in lower HRV41. Hence, we may have induced an hcn4 gain-of-function mutation in a subset of the larvae.
Non-synonymous SNPs in KIAA1755 have been identified in GWAS for HRV6 and heart rate5. In our analysis, si:dkey-65j6.2-affected larvae have a higher HRV at 2 and 5dpf, largely independently of heart rate. In humans, the C-allele of the HRV-associated rs6123471 in the 3’ UTR of KIAA1755 is associated with a lower expression of KIAA1755 in brain and aorta (GTEx42), a higher HRV6, and a lower heart rate5. Hence, a higher HRV at 2 and 5dpf for each additional mutated allele in si:dkey-65j6.2 is directionally consistent with results in humans. Genetic effects on HRV and heart rate were mutually adjusted in our sensitivity analysis, which may have revealed the unbiased direction of effect for mutations in KIAA1755 on HRV. In other words: the GWAS-identified association of the KIAA1755 locus - and possibly other loci - with heart rate may have been driven by HRV. This would explain why the eleven HRV-associated loci that showed evidence of an association with heart rate all did so in the expected (i.e. opposite) direction from a phenotypic point of view6. KIAA1755 is a previously uncharacterized gene that shows a broad expression pattern, including different brain regions and the left atrial appendage (GTEx42). Future mechanistic studies are required to distill how KIAA1755 influences heart rhythm.
SYT10 belongs to the family of synaptotagmins: transmembrane proteins that are involved in regulation of membrane trafficking in neurons43, which is important for neurotransmitter release44. Variants in or near SYT10 have previously been associated with HRV6 as well as with heart rate5 and heart rate recovery27. Since SYT10 is mainly expressed in brain, tibial nerve and pituitary (GTEx42), the trend towards higher odds of sinoatrial pauses in larvae with CRISPR-induced mutations in syt10 may be mediated by altered neurotransmitter release.
In addition to the additive analysis, we were able to compare embryos and larvae with two functionally knocked-out alleles with wildtypes for six out of nine orthologues. These analyses largely confirmed the associations of the additive analysis, but also revealed novel associations. Screening the F0 or F2 generation instead would have had its own advantages and disadvantages. Wu and colleagues described a powerful approach to efficiently disrupt and analyze F0 larvae45. However, this approach requires sham-injected controls, and is not compatible with multiplexing of many gRNAs. Hence, we argue that non-conclusive results for a subset of candidate genes in the screen are not problematic when systematically characterizing a large number of candidate genes.
The large differences in mutant allele frequency across target genes can be attributed to several factors. First, the mosaic founders were in-crossed six times, while random mating was allowed. Hence, we did not ascertain which fish generated the offspring in each mating. Secondly, while all gRNAs were pre-tested, mutated alleles that did not affect the germline were not included in our screen. Thirdly, larvae that carry embryonic lethal variants were not included in the screen.
Four limitations of our study should be discussed. Firstly, acquiring 30s recordings implies that false negatives for sinoatrial pauses or arrests are inevitable. We decided to exclude the 54 embryos and eight larvae with a sinoatrial pause or arrest during positioning for imaging from the analysis, because the case status of these samples cannot be confirmed objectively; yet they are not appropriate controls either. This limitation will have resulted in conservative effect estimates for sinoatrial pauses and arrests. Secondly, CRISPR guide-RNAs with predicted off-target effects free from mismatches were avoided. However, two of the selected targets - i.e. for hcn4 and si:dkey-65j6.2 - had predicted exonic off-targets with three mismatches at the time we designed the gRNAs (Supplementary Table 1). Human orthologues of potential off-target genes dclk1 and galnt10 have previously been associated with heart rate variability-related traits46 (dclk1), as well as with carotid intima-media thickness47 and body mass index48–50 (galnt10). However, sequencing larvae at the three predicted off-target regions did not show any CRISPR-Cas9 induced mutations (Supplementary Table 2). In line with predictions by Varshney et al., off-target effects are thus unlikely to have influenced our results51. We cannot exclude the possibility of there being other, non-predicted off-target effects though. Thirdly, we in-crossed mosaic founder mutants (F0) and phenotypically screened and sequenced the F1 generation. For some genes, this yielded a small number of larvae with 0 or 2 mutated alleles (i.e. for rgs6, hcn4, and hcn4l), resulting in a lower statistical power to find true genetic effects - if present - for these genes. In spite of this limitation, we were able to detect significant effects of mutations in rgs6 and hcn4. 
Finally, we recorded the atrium only, to enable a higher frame rate, a higher resolution in time for HRV, and a higher statistical power to detect small genetic effects on HRV. As a result, any ventricular abnormalities were not registered, and uncontrolled atrial contractions may thus reflect atrial fibrillation, premature atrial contractions, high atrial rate, or atrial tachycardia.
Strengths of our study include its repeated measures design, which enabled us to capture genetic effects at different stages of early development in zebrafish. Furthermore, the throughput of the setup allowed us to examine the effect of multiple genes simultaneously, in an unprecedented sample size for in vivo genetic screens. Our results demonstrate that a large sample size is paramount to robustly detect genetic effects on complex traits in zebrafish embryos and larvae when screening the F1 generation, even for functional knockout mutations. Identifying CRISPR-Cas9-induced mutations allele-specifically using a custom-written algorithm helped us classify larvae as heterozygous and compound heterozygous, which in turn helped pinpoint the true effect of mutations in these genes. Also, our study is based on a high-throughput imaging approach with objective, automated quantification, as compared with the more manual annotation of previous studies5. Finally, for all genes that showed an effect on HRV, the direction of effect was consistent with eQTL associations in humans. This further emphasizes the strength of our model system and the robustness of our findings.
In conclusion, our large-scale imaging approach highlights that zebrafish embryos and larvae can be used for rapid and comprehensive follow-up of GWAS-prioritized candidate genes. This will likely increase our understanding of the underlying biology of heart rate and rhythm, and potentially yield novel drug targets.
Methods
Candidate gene selection
Candidate genes in GWAS-identified loci for HRV were identified as described in detail in Nolte et al.4. Of the 18 identified candidate genes, six were selected for experimental follow-up. This selection was based on overlap with findings from GWAS for heart rate (KIAA1755, SYT10, HCN4, GNG11)5, as well as with results from eQTL analyses in sinoatrial node and brain (RGS6). Additional candidate genes from the same or nearby loci were also selected for experimental follow-up, i.e. NEO1, which resides next to HCN4. Zebrafish orthologues of the human genes were identified using Ensembl, as well as using a comprehensive synteny search using Genomicus52 (Supplementary Table 1). Of the selected genes, GNG11, SYT10 and RGS6 have one orthologue in zebrafish, and KIAA1755, HCN4 and NEO1 each have two orthologues, resulting in a total of nine zebrafish orthologues for six human candidate genes (Table 1).
Mutagenesis
All nine zebrafish genes were targeted together using a multiplexed CRISPR-Cas9 approach described recently21. Briefly, guide-RNAs (gRNAs) were selected using ChopChop53 and CRISPRscan54 (Supplementary Table 1), based on their predicted efficiency, a moderate to high GC-content, location in an early exon, and absence of predicted off-target effects without mismatches. Oligonucleotides were designed as described21, consisting of a T7 or SP6 promoter sequence (for gRNAs starting with ‘GG’ or ‘GA’, respectively), a gene-specific gRNA-target sequence, and an overlap sequence to a generic gRNA. The gene-specific oligonucleotides were annealed to a generic 80 bp long oligonucleotide at 98°C for 2 mins, 50°C for 10 mins, and 72°C for 10 mins. The products were checked for correct length on a 2% agarose gel. The oligonucleotides were subsequently transcribed in vitro using the manufacturer’s instructions (TranscriptAid T7 high yield transcription kit / MEGAscript SP6 transcription kit, both ThermoFisher Scientific, Waltham, USA). The gRNAs were purified, after which the integrity of the purified gRNAs was examined on a 2% agarose gel. The zebrafish codon-optimized plasmid pT3TS-nls-zCas9-nls was used as a template to produce Cas9 mRNA55. The plasmid was linearized with XbaI, and then purified using the Qiaprep Spin Miniprep kit (Qiagen, Hilden, Germany). The DNA was transcribed using the mMESSAGE mMACHINE T3 Transcription Kit (ThermoFisher Scientific, Waltham, USA), followed by LiCl precipitation. The quality of the RNA was confirmed on a 1% agarose gel.
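The oligonucleotide layout described above (promoter, gene-specific target, overlap to the generic gRNA) can be illustrated with a minimal Python sketch. The promoter and overlap sequences below are commonly cited ones and are assumptions for illustration only; they are not taken from this text and should be checked against the cited protocol21. The function names are hypothetical.

```python
# Illustrative sequences (assumptions, not taken from the text).
T7_PROMOTER = "TAATACGACTCACTATA"
SP6_PROMOTER = "ATTTAGGTGACACTATA"
GENERIC_OVERLAP = "GTTTTAGAGCTAGAAATAGC"


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def design_oligo(target: str) -> str:
    """Assemble promoter + gene-specific target + overlap.

    Targets starting with 'GG' get the T7 promoter and targets starting
    with 'GA' get the SP6 promoter, as described in the text.
    """
    target = target.upper()
    if target.startswith("GG"):
        promoter = T7_PROMOTER
    elif target.startswith("GA"):
        promoter = SP6_PROMOTER
    else:
        raise ValueError("target must start with GG (T7) or GA (SP6)")
    return promoter + target + GENERIC_OVERLAP
```

A target that starts with neither ‘GG’ nor ‘GA’ is rejected rather than silently modified, so that the choice of polymerase stays explicit.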
Husbandry & microinjections
A zebrafish line with GFP-labelled α-smooth muscle cells Tg(acta2:GFP)56 was used to visualize the beating heart. To this end, eggs from an in-cross of Tg(acta2:GFP) fish were co-injected with a mix of Cas9 mRNA (final concentration 150 ng/µl) and all nine gRNAs (final concentration 25 ng/µl each) in a total volume of 2nL at the single-cell stage. CRISPR-Cas9 injected embryos were optically screened for the presence of Tg(acta2:GFP) at 2 days post fertilization (dpf), using an automated fluorescence microscope (EVOS FL Cell imaging system, ThermoFisher Scientific, Waltham, USA). Tg(acta2:GFP) carriers were retained and raised to adulthood in systems with circulating, filtered and temperature controlled water (Aquaneering, Inc, San Diego, CA). All procedures and husbandry were conducted in accordance with Swedish and European regulations, and have been approved by the Uppsala University Ethical Committee for Animal Research (C142/13 and C14/16).
Experimental procedure imaging
The mosaic founder mutants (F0 generation) were only used for reproduction. After crossing the founder mutants, F1 embryos were used for experiments. To reach the experimental sample size, founder mutants were incrossed six times. At all occasions, mating was entirely random. Eggs were collected after F0 fish were allowed to reproduce for 45 mins to minimize variation in developmental stage. Fertilized eggs were placed in an incubator at 28.5°C. At 1dpf, embryos were dechorionated using pronase (Roche Diagnostics, Mannheim, Germany).
At 2dpf, embryos were removed from the incubator and allowed to adapt to controlled room temperature (21.5°C) for 20 mins. Individual embryos were exposed to 100 µg/ml tricaine (MS-222, Sigma-Aldrich, Darmstadt, Germany) for 1 min before being aspirated, positioned in the field of view of a fluorescence microscope, and oriented dorsally using a Vertebrate Automated Screening Technology (VAST) BioImager (Union Biometrica Inc., Geel, Belgium). We subsequently acquired twelve whole-body images, one image every 30 degrees of rotation, using the camera of the VAST BioImager to quantify body length, dorsal and lateral surface area and volume, as well as the presence or absence of cardiac edema. The VAST BioImager then positioned and oriented the larva to visualize the beating atrium and triggered the upright Leica DM6000B fluorescence microscope to start imaging using a HCX APO L 40X/0.80 W objective and L5 ET, k filter system (Micromedic AB, Stockholm, Sweden). Images of the beating atrium were acquired for 30s at a frame rate of 152 frames/s using a DFC365 FX high-speed CCD camera (Micromedic AB, Stockholm, Sweden). After acquisition, the larvae were dispensed into a 96-well plate, rinsed from tricaine, and placed back into the incubator. The procedure was repeated at 5dpf (i.e. the larval stage), to allow genetic effects that influence HRV and heart rate differently at different stages of development to be captured57. After imaging at 5dpf, the larvae were once again dispensed into 96-well plates, sacrificed, and stored at −80°C for further processing.
Quantification of cardiac traits and body size
A custom-written MATLAB script was used to convert the images acquired by the CCD camera into quantitative traits. To acquire the heart rate, each frame of the sequence was correlated with a template frame. Because the heart beats periodically, the correlation values form a periodic graph, and detecting the peaks in this graph yields the heart rate. The template frame should represent one of the extreme states in the cardiac cycle, i.e. end-systole or end-diastole. To detect these frames, we examined the correlation between the first 100 frames. The pair of frames that showed the lowest correlation corresponded to the heart being in opposite states. One of these frames was chosen as the template58.
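The template-correlation idea can be sketched in a few lines of Python. This is not the MATLAB script used in the study. It is a minimal illustration that assumes each frame is a flattened list of pixel intensities, uses Pearson correlation as the similarity measure, and applies a simple midpoint threshold for peak detection. All function names and the synthetic test data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally sized, flattened frames."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def pick_template(frames, n_search=100):
    """Index of one frame of the least-correlated pair among the first frames."""
    head = frames[:n_search]
    lowest, template_idx = 2.0, 0
    for i in range(len(head)):
        for j in range(i + 1, len(head)):
            r = pearson(head[i], head[j])
            if r < lowest:
                lowest, template_idx = r, i
    return template_idx


def beat_times(frames, fps, n_search=100):
    """Times (s) at which the correlation with the template peaks."""
    template = frames[pick_template(frames, n_search)]
    trace = [pearson(frame, template) for frame in frames]
    threshold = (max(trace) + min(trace)) / 2  # assumption: midpoint cut-off
    peaks = [i for i in range(1, len(trace) - 1)
             if trace[i] > threshold
             and trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]]
    return [i / fps for i in peaks]


def synthetic_frames(n_frames=300, period=50):
    """Toy recording: a static background plus a periodically moving pattern."""
    background = [2, 2, 0, 0, 2, 2, 0, 0]
    moving = [1, 0, 1, 0, 1, 0, 1, 0]
    frames = []
    for t in range(n_frames):
        w = 0.5 * (1 + math.cos(2 * math.pi * t / period))
        frames.append([b + w * m for b, m in zip(background, moving)])
    return frames
```

With the toy data sampled at 100 frames/s, successive peaks returned by `beat_times` are one period (0.5 s) apart, regardless of which extreme state is picked as the template.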
This numeric information was subsequently used to quantify: 1) heart rate as the inverse of RR-interval; 2) the standard deviation of the normal-to-normal RR interval (SDNN); and 3) the root mean square of successive heart beat interval differences (RMSSD). Finally, a graph of pixel changes over time was generated across the 30s recording to help annotate the script’s performance. The files containing the inter-beat-intervals were used to objectively quantify sinoatrial pauses (i.e. the atrium stops beating for longer than 3x the median inter-beat-interval of the larva, Supplementary Movie 1) and sinoatrial arrests (i.e. the atrium stops beating for longer than 2s, Supplementary Movie 2) using a custom-written Stata script. The graphs of pixel changes over time were also used to identify larvae with other abnormalities in cardiac rhythm. Such abnormalities were annotated as: uncontrolled atrial contractions (Supplementary Movie 3); abnormal cardiac morphology (i.e. a tube-like atrium, Supplementary Movie 4); or impaired cardiac contractility (i.e. a vibrating rather than a contracting atrium, Supplementary Movie 4). These phenotypes were annotated independently by two investigators (B.v.d.H. and M.d.H.), resulting in an initial concordance rate >90%. Discrepancies in annotation were discussed and re-evaluated to reach consensus.
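The trait definitions above translate directly into code. The following Python sketch is an illustration, not the authors' MATLAB or Stata scripts. It assumes the inter-beat intervals are given in seconds, that heart rate is derived from the mean interval, and that SDNN uses the sample standard deviation. The text does not say whether pauses and arrests are mutually exclusive, so both flags are returned separately. Function names are hypothetical.

```python
import math
import statistics


def hrv_metrics(ibis):
    """Heart rate (beats/min), SDNN (s) and RMSSD (s) from inter-beat intervals (s)."""
    heart_rate = 60.0 / statistics.fmean(ibis)  # inverse of the mean interval
    sdnn = statistics.stdev(ibis)               # assumption: sample SD
    diffs = [b - a for a, b in zip(ibis, ibis[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"heart_rate": heart_rate, "sdnn": sdnn, "rmssd": rmssd}


def classify_rhythm(ibis, pause_factor=3.0, arrest_seconds=2.0):
    """Flag a sinoatrial pause (> 3x median interval) and an arrest (> 2 s)."""
    longest = max(ibis)
    return {
        "pause": longest > pause_factor * statistics.median(ibis),
        "arrest": longest > arrest_seconds,
    }
```

For example, intervals of 0.5, 0.5, 1.6 and 0.5 s are flagged as a pause (1.6 s exceeds three times the 0.5 s median) but not as an arrest.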
Bright-field images of the larvae were used to assess body length, dorsal and lateral surface area, and volume. Images were automatically segmented and quantified using a custom-written CellProfiler59 pipeline, followed by manual annotation for segmentation quality. Larvae with suboptimal segmentation quality due to the presence of an air-bubble, a bent body, an incomplete rotation within the capillary, partial capturing of the larva, or an over-estimation of size were replaced by images with a 180° difference in rotation, or excluded if that image was also sub-optimally segmented. The larva was excluded from the analysis for body volume if more than four of the 12 images had a bad segmentation. Imaging, image quantification and image quality control were all performed blinded to the sequencing results.
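The replacement rule for poorly segmented images can be written down compactly. The sketch below assumes the twelve images are indexed 0 to 11 in 30-degree steps, so that the image rotated by 180° is six positions away. The function name and return format are hypothetical and only illustrate the rule as described.

```python
def resolve_views(bad_views, n_views=12, max_bad_for_volume=4):
    """Map each view to the image used for it, or None if it must be excluded.

    bad_views: set of view indices with suboptimal segmentation.
    Returns (sources, volume_ok); volume_ok is False when more than
    max_bad_for_volume of the views were badly segmented.
    """
    half = n_views // 2
    sources = {}
    for view in range(n_views):
        if view not in bad_views:
            sources[view] = view
        else:
            opposite = (view + half) % n_views  # image with a 180 degree difference
            sources[view] = opposite if opposite not in bad_views else None
    volume_ok = len(bad_views) <= max_bad_for_volume
    return sources, volume_ok
```

A badly segmented view 1 is replaced by view 7; if view 7 is also bad, both are excluded.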
Quality control of phenotype data
A series of quality control steps was performed to ensure only high-quality data was included in the genetic association analysis (Supplementary Fig. 1). First, graphs indicating that one or more true beats were missed by the script were removed from the analysis a priori (Supplementary Fig. 1). Second, embryos and larvae showing uncontrolled atrial contractions, sinoatrial pauses or arrests, edemas, abnormal morphology and/or reduced contractility, and embryos/larvae annotated to have a sinoatrial pause while positioning before video acquisition were excluded from the analyses for HRV and heart rate (Supplementary Fig. 1). Third, genetic effects on sinoatrial pauses and arrests were examined at 2dpf and 5dpf. For the remainder of the cardiac abnormalities, too few cases were observed to justify a separate association analysis. Furthermore, embryos and larvae that showed a sinoatrial pause or arrest in between positioning and video acquisition, but not during the recording, were excluded from the analysis for sinoatrial pauses and arrests, since we cannot ascertain case status in the same rigorous manner for such samples, but they are not appropriate controls either.
Sample preparation for sequencing
After imaging at 5dpf, larvae were sacrificed and DNA was extracted by exposure to lysis buffer (10 mM Tris-HCl pH8, 50 mM KCl, 1 mM EDTA, 0.3% Tween 20, 0.3% Igepal) and proteinase K (Roche Diagnostics, Mannheim, Germany) for 2 h at 55°C, followed by 10 min at 95°C to deactivate the proteinase K. Gene-specific primers (150-300 bp) amplifying gRNA-targeted and putative off-target regions in dclk1b and both galnt10 orthologues (Supplementary Table 1) were obtained from ChopChop53 and Primer360, and Illumina adaptor-sequences were added. Additionally, we included 96 uninjected Tg(acta2:GFP) larvae for sequencing across all gene-specific regions to adjust for naturally occurring polymorphisms. The first PCR was conducted by denaturation at 98°C for 30s; amplification for 35 cycles at 98°C for 10s, 62°C for 30s and 72°C for 30s; followed by a final extension at 72°C for 2 mins. Amplified PCR products were cleaned using magnetic beads (Mag-Bind PCR Clean-up Kit, Omega Bio-tek Inc. Norcross, GA). The purified products were used as a template for the second PCR, in which Illumina Nextera DNA library sequences were attached to allow multiplexed sequencing of all CRISPR-targeted sites across 384 larvae in a single lane. The second PCR amplification was performed by denaturation at 98°C for 30s; amplification for 25 cycles at 98°C for 10s, 66°C for 30s and 72°C for 30s; followed by a final extension at 72°C for 2 mins. Products were then purified using magnetic beads. All liquid handling was performed using a Hamilton Nimbus robot equipped with a 96-head (Hamilton robotics, Kista, Sweden). Samples were pooled and sequenced in a single lane on a MiSeq (300 bp paired-end, Illumina Inc., San Diego, CA) at the National Genomics Infrastructure, Sweden.
Processing of sequencing data
A custom-written bioinformatics pipeline was developed in collaboration with the National Bioinformatics Infrastructure Sweden to prepare .fastq files for analysis. First, a custom-written script was used to de-multiplex the .fastq files by gene and well. PEAR61 was then used to merge paired-end reads, followed by removal of low-quality reads using FastX62. The reads were then mapped to the wildtype zebrafish genome (Zv11) using STAR63. Next, we converted files from .sam to .bam format using samtools64, after which variants - mostly indels and SNVs - were called allele-specifically using a custom-written variant calling algorithm in R (Danio rerio Identification of Variants by Haplotype - DIVaH). A summary of all unique sequences identified for the orthologues and their respective alignment report string (Concise Idiosyncratic Gapped Alignment Report (CIGAR)) is shown in Supplementary Table 2. All unique variants (Supplementary Table 3) located within ±30 bp of the CRISPR-targeted sites that were identified across the two alleles were subsequently pooled, and used for functional annotation using Ensembl's VEP65. Naturally occurring variants, based on sequencing of uninjected Tg(acta2:GFP) larvae, were excluded. Of note, each variant within a transcript is annotated individually, rather than on a per-transcript basis. In the absence of a continuous score provided by variant prediction algorithms, we attributed a score of 0.33, 0.66 and 1 to variants with a low, moderate and high predicted impact, respectively. Pilot experiments using hcn4 and heart rate suggested that this was a more powerful approach than not weighting.
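The variant selection step described above can be illustrated with a minimal sketch. It is not the DIVaH algorithm itself (which was written in R); the `Variant` record, the function name and the example coordinates are hypothetical and chosen only for illustration. The sketch keeps variants within ±30 bp of a cut site and drops those also seen in uninjected larvae.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified variant record (not the DIVaH output format)."""
    position: int  # coordinate of the variant on the reference
    ref: str       # reference allele
    alt: str       # alternative allele


def filter_variants(called, cut_site, natural, window=30):
    """Keep variants within +/- `window` bp of the cut site that were
    not observed in uninjected (control) larvae."""
    return [
        v for v in called
        if abs(v.position - cut_site) <= window and v not in natural
    ]


# Illustrative values only: one variant near the cut site, one too far
# away, and one that is also present in uninjected larvae.
called = [Variant(100, "A", "T"), Variant(140, "G", "C"), Variant(95, "C", "G")]
natural = {Variant(95, "C", "G")}
kept = filter_variants(called, cut_site=105, natural=natural)
```

In this example only the variant at position 100 is retained: position 140 lies 35 bp from the assumed cut site, and the variant at position 95 is treated as naturally occurring.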
Allele-specific scores were then calculated by retaining, for each allele, the variant with the highest predicted impact on protein function, and assigning it a score of 0 (no effect), 0.2 (modifier variant), 0.33 (low), 0.66 (moderate) or 1 (high). Transcript-specific dosage scores were subsequently calculated by summing the allele-specific scores for each embryo and target site. Since all transcripts within a target site were affected virtually identically, we only used the main transcript of each target site for the genetic association analysis.
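As a worked illustration of this scoring scheme, the sketch below maps impact categories to the scores listed above, keeps the highest score per allele, and sums the two alleles into a dosage between 0 and 2. The function names are hypothetical, and representing each allele as a list of impact labels (here using VEP's upper-case category names) is an assumption made for the example.

```python
# Scores per predicted impact category, as listed in the text.
IMPACT_SCORE = {"MODIFIER": 0.2, "LOW": 0.33, "MODERATE": 0.66, "HIGH": 1.0}


def allele_score(impacts):
    """Score of one allele: the highest-impact variant it carries.
    An allele without variants scores 0 (no effect)."""
    return max((IMPACT_SCORE[i] for i in impacts), default=0.0)


def dosage(allele_1, allele_2):
    """Dosage for one larva, transcript and target site: the sum of the
    two allele-specific scores (range 0 to 2)."""
    return allele_score(allele_1) + allele_score(allele_2)


# One allele carries a low- and a high-impact variant, the other a
# moderate-impact variant.
example = dosage(["LOW", "HIGH"], ["MODERATE"])
```

Here the first allele scores 1 (its high-impact variant dominates) and the second 0.66, giving a dosage of 1.66.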
Most larvae had successfully called sequences in all nine CRISPR-targeted sites. Larvae with more than three missing sequences were excluded from the genetic association analysis. For the remaining larvae, the mean dosage of the transcript was imputed for missing calls. For neo1b, calling failed in 78 larvae (Supplementary Table 4). The imputed mean dosage for the main transcript of neo1b was still included in the genetic association analysis, since the mutant allele frequency in successfully called larvae was high (0.934), and using an imputed dosage was anticipated to influence the results less than discarding a gene that had been targeted. However, the results for neo1b should be interpreted in light of its call rate. The mutant allele frequency was low for hcn4, hcn4l and rgs6. All three putative off-target regions (dclk1b and both galnt10 orthologues, Supplementary Table 1) were classified as wildtype across all larvae and had no CRISPR-Cas9-induced mutations within ±30 bp of the cut site (Supplementary Table 2).
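The exclusion and imputation rules can be sketched as follows. This is an illustration only: the function name and data layout (a dictionary of larvae, each holding a dictionary of per-site dosages with `None` for failed calls) are hypothetical, and the sketch assumes that the site mean is computed across the retained larvae with a successful call, which the text does not specify.

```python
def impute_mean_dosage(dosages, max_missing=3):
    """Drop larvae with more than `max_missing` failed calls, then replace
    remaining missing dosages (None) by the per-site mean.

    Assumption: the mean is taken over retained larvae with a called dosage.
    """
    kept = {
        larva: dict(sites)
        for larva, sites in dosages.items()
        if sum(v is None for v in sites.values()) <= max_missing
    }
    site_names = {site for sites in kept.values() for site in sites}
    for site in site_names:
        called = [s[site] for s in kept.values() if s.get(site) is not None]
        if not called:
            continue  # nothing to impute from; leave the gaps as they are
        mean = sum(called) / len(called)
        for sites in kept.values():
            if sites.get(site) is None:
                sites[site] = mean
    return kept


# Toy data with two sites; the threshold is lowered to 1 so the example
# stays small (the text uses a threshold of three across nine sites).
toy = {
    "a": {"hcn4": 2.0, "rgs6": None},
    "b": {"hcn4": 1.0, "rgs6": 1.0},
    "c": {"hcn4": None, "rgs6": 0.0},
    "d": {"hcn4": None, "rgs6": None},
}
imputed = impute_mean_dosage(toy, max_missing=1)
```

In the toy data, larva "d" is excluded, larva "a" receives the rgs6 mean of 0.5 and larva "c" receives the hcn4 mean of 1.5.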
Statistical analysis
The standard deviation of NN-intervals (SDNN) and the root mean square of successive differences (RMSSD) were strongly correlated at 2dpf (r2=0.78) and 5dpf (r2=0.83), so a composite endpoint 'HRV' was calculated as the average of SDNN and RMSSD. In larvae free from sinoatrial pauses and arrests, abnormal cardiac morphology, impaired cardiac contractility, and edema, HRV and heart rate at 2dpf (n=279) or 5dpf (n=293) were inverse-normally transformed to ensure a normal distribution, and to allow comparison of effect sizes across traits. Genetic effects on HRV and heart rate were subsequently examined using hierarchical linear models at 2dpf and 5dpf separately (Stata's xtmixed), mutually adjusted for the time of imaging (fixed factors), and with larvae nested in six batches (random factor with fixed slope). Embryos and larvae in which a sinoatrial pause and/or arrest was observed in between positioning and recording were excluded from the analysis. In a sensitivity analysis, we mutually adjusted for the other outcome (i.e. heart rate and HRV).
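The composite endpoint and the transformation can be sketched as below. Several details are assumptions, since the text does not specify them: SDNN is computed as the sample standard deviation, the composite is the plain average of SDNN and RMSSD in their original units, and the inverse-normal transformation is rank-based with the Blom offset (c = 3/8) and average ranks for ties. The interval values are illustrative, not data from the study.

```python
import math
import statistics
from statistics import NormalDist


def sdnn(intervals):
    """Standard deviation of inter-beat intervals (sample SD assumed)."""
    return statistics.stdev(intervals)


def rmssd(intervals):
    """Root mean square of successive differences between intervals."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def composite_hrv(intervals):
    """Composite HRV: the average of SDNN and RMSSD."""
    return (sdnn(intervals) + rmssd(intervals)) / 2


def inverse_normal_transform(values, c=3 / 8):
    """Rank-based inverse-normal transform (Blom offset assumed).
    Tied values receive the average of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [NormalDist().inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Illustrative inter-beat intervals for one larva (arbitrary time units).
intervals = [400, 410, 400, 410]
hrv = composite_hrv(intervals)

# Transforming a trait across larvae: the median value maps to zero.
z = inverse_normal_transform([3.0, 1.0, 2.0])
```

Because the transformed values depend only on ranks, HRV and heart rate end up on the same scale, which is what allows effect sizes to be compared across traits.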
In further analyses, genetic effects were examined for sinoatrial pause and arrest at 2dpf, as well as sinoatrial pause at 5dpf using a logistic regression analysis. Genetic effects on body size were also examined, using hierarchical linear models on inverse-normally transformed outcomes.
For dichotomous as well as continuous outcomes, genetic effects were examined using an additive model, with dosage scores for all target sites as independent exposures in the same model, i.e. mutually adjusted for effects of mutations in the other orthologues. Additionally, larvae with two functionally knocked-out alleles were compared with larvae free from CRISPR-induced mutations for six out of nine orthologues. Missing genotypes were imputed at the mean in this analysis. For each outcome, associations were examined for the main transcript of the orthologue. P-values <0.05 were considered to reflect statistically significant effects. All statistical analyses were performed using Stata MP 14.2.
Sources of funding
MdH is a fellow of the Swedish Heart-Lung Foundation (20170872) and is supported by project grants from the Swedish Heart-Lung Foundation (20140543, 20170678), the Swedish Research Council (2015-03657), and NIH/NIDDK (R01DK106236, R01DK107786, U01DK105554).
Disclosures
None.
Acknowledgments
The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SNIC b2015283. The authors would like to acknowledge support from Science for Life Laboratory, the National Genomics Infrastructure (NGI) and UPPMAX for providing assistance in massively parallel sequencing and computational infrastructure. Support from the National Bioinformatics Infrastructure Sweden (NBIS) is also gratefully acknowledged. Constructive discussions with the Genome Engineering Zebrafish (GEZ) facility are also acknowledged.