Abstract
Given the need for comparability across subjects and studies, the quality of registration to a standard space is crucial for the reliability of Magnetic Resonance Imaging (MRI), and in particular functional MRI (fMRI). Present small animal MRI workflows fall short in terms of quality and reliability, commonly utilizing high-level scripts optimized for human data (adapting data to the scripts rather than vice-versa), and relying on interactive operator quality control (QC), which is infrequent, open to bias, slow, and unreproducible. In this fully reproducible article we showcase a novel mouse-brain-optimized workflow (accessible via Bash and Python), and a standard space suited to harmonize data between analysis and operation. We present four separate metrics for automated QC, and a visualization method to aid operator inspection. Benchmarking this workflow against common legacy practices (which we detail and comment on) reveals that it performs more consistently, better preserves variance across subjects while minimizing variance across sessions, and improves volume conservation RMSE 2.8-fold and smoothness conservation RMSE 2.9-fold. The “SAMRI Generic” workflow sets a new standard for small animal MRI registration, ensuring robustness, comparability, and validity of region assignment.
Background
In order to make any generalizable statements regarding brain function and organization, an equivalence between brain areas across individuals needs to be established. This is done by spatial transformation of brain maps in a study to a population or standard reference template. This process, called registration, is performed as part of any neuroimaging workflow attempting to produce results which are both spatially resolved and generalizable across the population.
The computations required for registration are commonly performed at the very onset of the preprocessing workflow, though the actual image manipulation may only take place much later, once inter-subject comparison becomes needed. As a consequence of this peripheral positioning in the preprocessing sequence, as well as its general independence from experimental designs and hypotheses, registration is often relegated to default values and exempt from rigorous design efforts and QC.
Human brain imaging uniquely benefits from high-level functions (e.g. flirt and fnirt from the FSL package[1], or antsIntroduction.sh from the ANTs package[2]), optimized for the size and spatial features of the human brain. The availability and widespread use of such functions mitigates issues which would otherwise arise from a lack of QC. In mouse brain imaging, however, registration is frequently performed using the selfsame high-level functions from human brain imaging — rendered usable for mouse brain data by adjusting the nature of the data to fit the priors and optimized parameters of the functions, rather than vice-versa.
This general approach compromises data veracity, limits the degree to which processing can be optimized for mouse brain applications, and thus represents a notable hurdle for the methodological improvement of mouse brain imaging. Furthermore, such solutions are implemented ad hoc and are not thoroughly documented anywhere in the field. We thus explicitly describe current practices, in an effort to not only propose better solutions, but do so in a falsifiable manner which provides adequate detail for both the novel and the legacy methods.
Manipulations
The foremost data manipulation procedure in present-day mouse MRI is the adjustment of voxel dimensions. These dimensions are represented in the Neuroimaging Informatics Technology Initiative format (NIfTI) header [3] by affine transformation parameters, which map data matrix coordinates to geometrically meaningful spatial coordinates. Manipulations of the affine parameters are performed in order to make the data represent not the physiological mouse brain dimensions, but volumes corresponding to what human-optimized brain extraction, bias correction, and registration interfaces expect (commonly constituting a 10-fold increase in each spatial dimension).
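As a minimal sketch of the voxel-size manipulation described above, assuming a 4×4 NIfTI-style affine in millimetres held as a NumPy array, the header-only tenfold inflation can be written as follows. `inflate_voxels` is a hypothetical helper, not part of SAMRI or of any NIfTI library, and leaving the translation column unchanged is our assumption.

```python
import numpy as np

def inflate_voxels(affine, factor=10.0):
    """Hypothetical helper: scale the spatial columns of a 4x4 voxel-to-world affine.

    The data matrix is untouched; only the header now claims that each voxel is
    `factor` times larger along every axis. The translation column is left
    as-is here, which is one of several possible conventions.
    """
    affine = np.asarray(affine, dtype=float)
    return affine @ np.diag([factor, factor, factor, 1.0])

# 200 µm isotropic mouse data, expressed in millimetres.
mouse_affine = np.diag([0.2, 0.2, 0.2, 1.0])
legacy_affine = inflate_voxels(mouse_affine)
voxel_sizes = np.linalg.norm(legacy_affine[:3, :3], axis=0)   # [2.0, 2.0, 2.0] mm
volume_ratio = np.linalg.det(legacy_affine[:3, :3]) / np.linalg.det(mouse_affine[:3, :3])
```

Because only the header changes, each 200 µm voxel now claims to be 2 mm wide, and the apparent brain volume grows roughly 1000-fold.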
Another notable data manipulation procedure consists of adjusting the data matrix content itself, so that human-prior-based brain extraction will produce acceptable results. While conceptually superior solutions adapting parameters and priors to animal data are available [4, 5] and might remove the need for data adaptation at this step, rudimentary solutions remain popular. Many consist of applying an ad hoc percentile threshold to clear non-brain or distal brain tissue by intensity, leaving a more spherical brain for the human masking function to operate on. Notably, both the function adaptations for animal data and the animal data matrix adaptations for use with human brain extraction functions are known to wholly or partly remove the olfactory bulbs, which is why the choice is sometimes made to forgo brain extraction altogether.
Often, the orientation of the scan is seen as problematic, and is consequently deleted. This consists of resetting the S-form affine in the NIfTI header to zeroes, and is intended to mitigate a scanner-produced data orientation which is incorrect with respect to the target template. While it is true that the scanner affine reported for mouse data may be non-standard (the confusion is two-fold: mice lie prone with the coronal plane progressing axially, whereas higher primates lie supine with the horizontal plane progressing axially), affines of mouse brain templates may be non-standard as well. A related manipulation is dimension swapping, which changes the order of the NIfTI data matrix rather than the affine. Thus, correct or automatically redressable affine parameters are occasionally deleted, and data is reordered beyond easy recovery, in order to correspond to a malformed template.
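The difference between a non-standard but readable affine and a deleted one can be illustrated with a simplified orientation reader. This is a sketch assuming plain NumPy arrays; `axis_codes` is a hypothetical helper, and for real data nibabel's `aff2axcodes` handles oblique acquisitions more carefully.

```python
import numpy as np

# (negative, positive) direction labels of the RAS+ world axes used by NIfTI.
_LABELS = (("L", "R"), ("P", "A"), ("I", "S"))

def axis_codes(affine):
    """Return the world direction each voxel axis increases towards, e.g. ('R', 'A', 'S').

    Simplified: each voxel axis is assigned the world axis with the largest
    absolute component in its affine column. Returns None when the affine
    carries no orientation at all.
    """
    rotzoom = np.asarray(affine, dtype=float)[:3, :3]
    if not np.any(rotzoom):
        return None  # e.g. an S-form reset to zeroes: nothing left to read
    codes = []
    for column in rotzoom.T:
        world_axis = int(np.argmax(np.abs(column)))
        positive = bool(column[world_axis] > 0)
        codes.append(_LABELS[world_axis][int(positive)])
    return tuple(codes)

ras = axis_codes(np.diag([0.2, 0.2, 0.2, 1.0]))      # ('R', 'A', 'S')
rps = axis_codes(np.diag([0.2, -0.2, 0.2, 1.0]))     # ('R', 'P', 'S')
deleted = axis_codes(np.zeros((4, 4)))               # None
```

An RPS affine is unusual but fully informative and can be reoriented automatically; an affine reset to zeroes carries no orientation to recover.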
Templates
As the above eminently demonstrates, the template is a key component of a registration workflow. Templates used for mouse brain MRI registration are heterogeneous and include histological, as well as ex vivo MRI templates, scanned either inside the intact skull or after physical brain extraction.
Histological templates benefit from higher resolution and access to molecular imaging data in the same coordinate space. Such templates are not produced in volumetric sampling analogous to MRI, and are often not assigned a meaningful affine after conversion to NIfTI. Histological contrast may only poorly correlate with any MR contrast, making registration less reliable, or necessitating the use of similarity metrics which impose additional restrictions. Not least of all, histological templates may be severely deformed and may lack distal parts of the brain such as the olfactory bulbs due to the extraction and sampling process. Data registered to them may be particularly difficult to use for navigation in the intact mouse brain, e.g. during data acquisition or stereotactic surgery.
Ex vivo templates based on extracted brains share most of the deformation issues present in histological templates; they are, however, available in MR contrasts, making registration far easier. They suffer from a lower resolution, and need to have any histological data relevant for downstream analysis first registered to them. Ex vivo templates based on intact mouse heads provide both MR contrast and brains largely free of deformation and supporting whole brain registration.
Challenges
The foremost challenges in mouse MRI registration consist of eliminating data-degrading workarounds, reducing reliance on high-level interfaces with inappropriate optimizations, and reducing the number of standard space templates. Information loss (e.g. pertaining to both the affine and the data matrix) during preprocessing is a particularly besetting issue, since the loss of data at the onset of a neuroimaging workflow will persist in all downstream steps and preclude numerous modes of analysis (as depicted in fig. 1b).
The Optimized Workflow
The complexity of MRI processing workflows should be manageable to prospective users with only cursory programming experience. However, workflow transparency, sustainability, and reproducibility should not be compromised for trivial features. We thus abide by the following design guidelines: (1) each workflow is represented by a high-level function, whose parameters correspond to operator-understandable concepts, detailing operations performed, rather than computational implementations; (2) workflow functions are highly parameterized but include workable defaults, so that users can adapt a workflow to a significant extent without editing the constituent code; (3) graphical or interactive interfaces are avoided, as they impede reproducibility, encumber the dependency graph, and reduce the sustainability of the project.
The language of choice for workflow handling is Python, owing to its Free and Open Source (FOSS) dependency stack, readability, wealth of available libraries, ease of package management, and its large and dynamic developer community. While workflow functions are written in Python, we also provide automatically generated Command Line Interfaces (CLIs) for use directly with Bash. These autogenerated CLIs ensure that features become available in Bash and Python synchronously, and workflows behave identically regardless of the invocation interface.
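The idea of deriving a Bash interface from a Python function signature can be sketched with the standard library alone. This is an illustration under our own assumptions (`cli_from_function`, `run_cli`, and the `generic_registration` stand-in are hypothetical names), not the mechanism SAMRI actually uses.

```python
import argparse
import inspect

def cli_from_function(function):
    """Build an argparse parser mirroring a function's signature.

    Parameters without defaults become positional arguments; parameters with
    defaults become --options typed after the default value, and booleans
    become switches that flip the default.
    """
    parser = argparse.ArgumentParser(prog=function.__name__,
                                     description=inspect.getdoc(function))
    for name, param in inspect.signature(function).parameters.items():
        option = name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            parser.add_argument(name)
        elif isinstance(param.default, bool):
            flag = ("--no-" if param.default else "--") + option
            parser.add_argument(flag, dest=name, default=param.default,
                                action="store_false" if param.default else "store_true")
        else:
            kind = str if param.default is None else type(param.default)
            parser.add_argument("--" + option, dest=name, type=kind,
                                default=param.default)
    return parser

def run_cli(function, argv=None):
    """Parse argv (or sys.argv[1:] when None) and call the function with the result."""
    arguments = cli_from_function(function).parse_args(argv)
    return function(**vars(arguments))

def generic_registration(bids_base, template="mouse", slice_timing=True, n_jobs=1):
    """Hypothetical stand-in for a workflow function; it only echoes its inputs."""
    return bids_base, template, slice_timing, n_jobs

result = run_cli(generic_registration, ["~/data", "--n-jobs", "4", "--no-slice-timing"])
# ('~/data', 'mouse', False, 4)
```

Because the parser is rebuilt from the signature, a new keyword parameter added to the Python function appears as a command-line option without further changes, which is the synchrony the paragraph above describes.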
Technologies
Internally, the workflow functions make use of the Nipype [6] package, which provides high-level workflow management and execution features. Via this package, functions provided by any other package can be encapsulated in a node (complete with error reporting and isolated re-execution support) and integrated into a directed workflow graph. Parallelization can also be managed via a number of execution plugins, allowing excellent scalability. Most importantly, Nipype can generate graph descriptor language (DOT) summaries, as well as visual workflow representations suitable for operator inspection, graph theoretical analysis, and programmatic comparison between workflow variants.
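The directed-graph idea can be sketched without Nipype, using only the standard library's `graphlib` (Python 3.9 or later). The node names below are a hypothetical, simplified rendering of the “Generic” workflow steps described in this section; Nipype itself builds, executes, and exports such graphs (e.g. via `Workflow.write_graph`).

```python
from graphlib import TopologicalSorter

# Hypothetical node names; each key lists the nodes whose outputs it needs.
GENERIC_STEPS = {
    "dummy_scan_correction": set(),
    "slice_timing_correction": {"dummy_scan_correction"},
    "functional_temporal_mean": {"slice_timing_correction"},
    "functional_bias_correction": {"functional_temporal_mean"},
    "structural_bias_correction": set(),
    "structural_to_template": {"structural_bias_correction"},
    "functional_to_structural": {"functional_bias_correction",
                                 "structural_bias_correction"},
    "merge_transforms": {"structural_to_template", "functional_to_structural"},
    "warp_functional": {"merge_transforms", "slice_timing_correction"},
    "warp_structural": {"structural_to_template"},
}

def execution_order(graph):
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())

def to_dot(graph, name="generic"):
    """Render the dependency graph in the DOT language, one edge per dependency."""
    lines = [f"digraph {name} {{"]
    for node in sorted(graph):
        lines.append(f'  "{node}";')
        for dependency in sorted(graph[node]):
            lines.append(f'  "{dependency}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)

order = execution_order(GENERIC_STEPS)
dot = to_dot(GENERIC_STEPS)
```

Independent branches (here the structural and functional bias corrections) have no ordering constraint between them, which is what allows an execution plugin to run them in parallel.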
Via Nipype, we utilize basic MRI preprocessing functions from the FSL package [1] and registration functions from the ANTs package [2]. While there is theoretically no limit to the number of external packages usable with Nipype, we constrain our choice as much as possible in order to minimize the dependency graph. The choice of the ANTs package (in addition to FSL, which also provides registration functions) owes to the package’s functions being more highly parameterized. This feature allows us to avoid maladaptive optimization choices, and instead fine-tune the registration to the overarching characteristics of the brain type at hand. Additionally, we have implemented a number of functions in our workflow directly, e.g. to read BIDS [7] inputs and to manage dummy scans.
Given the aforementioned guiding principles, and the hitherto listed technologies, we have constructed two registration workflows: The “Legacy” workflow (fig. 1c), which exhibits the common practices detailed in the Background section; and our novel “Generic” workflow (fig. 1d). Both workflows start by performing dummy scan correction on the functional MRI data and the stimulation events file, based on BIDS metadata, automatically parsed from Bruker ParaVision metadata. The “Legacy” workflow subsequently applies a tenfold multiplication to the voxel size (making the brain size more human-like), and deletes the orientation information from the affine. Further, the dimensions are swapped so that the data matrix matches the RPS (left→Right, anterior→Posterior, inferior→Superior) orientation of the “Legacy” template (see fig. 2b). Following these data manipulation steps, a temporal mean is computed, and an empirically determined signal threshold (10 % of the 98th percentile) is applied. Subsequently, the bias field is corrected using the fast function of the FSL package, and parts of the image are masked using the bet function from FSL. The image is then warped into the template space using the antsIntroduction.sh function of the ANTs package. Lastly, the affine variants are harmonized. The “Generic” workflow follows up on dummy scan correction with slice timing correction, computes the temporal mean of the functional scan (to obtain a more representative contrast for the whole time course), and applies a bias field correction to the temporal mean — using the N4BiasFieldCorrection function of the ANTs package, with spatial parameters adapted to the mouse brain. 
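The Legacy signal threshold (10 % of the 98th percentile, applied to the temporal mean) can be sketched as follows, assuming the temporal mean is already available as a NumPy array. Whether voxels exactly at the cutoff are kept is our assumption, and `legacy_threshold` is a hypothetical name.

```python
import numpy as np

def legacy_threshold(mean_image, fraction=0.10, percentile=98.0):
    """Zero all voxels below `fraction` of the image's `percentile`-th intensity.

    Intended for the temporal mean (e.g. timecourse.mean(axis=-1) for a 4D
    array with time last), before bias field correction and brain masking.
    """
    mean_image = np.asarray(mean_image, dtype=float)
    cutoff = fraction * np.percentile(mean_image, percentile)
    return np.where(mean_image >= cutoff, mean_image, 0.0)

mean_image = np.arange(1.0, 101.0)            # toy "temporal mean" with intensities 1..100
kept = np.count_nonzero(legacy_threshold(mean_image))
# The 98th percentile is 98.02, so the cutoff is ~9.8 and values 1..9 are zeroed.
```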
Analogous operations are performed on the structural scan, following which the structural scan is registered to the reference template, and the functional scan temporal mean is registered to the structural scan — using the antsRegistration function of the ANTs package, with spatial parameters adapted to the mouse brain. The structural-to-template and functional-to-structural transformation matrices are then merged, and applied in one warp computation step to the functional data — while the structural data is warped solely based on the structural-to-template transformation matrix.
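Merging the two transforms before warping means the functional data is interpolated once rather than twice. For the affine parts this amounts to a matrix product, sketched below with NumPy under the assumption that both transforms are 4×4 homogeneous matrices. `compose` and `translation` are hypothetical helpers, and the transform values are invented; ANTs additionally handles nonlinear warp fields, which a single matrix product cannot represent.

```python
import numpy as np

def compose(*transforms):
    """Compose 4x4 homogeneous transforms; the rightmost one is applied first."""
    result = np.eye(4)
    for transform in transforms:
        result = result @ np.asarray(transform, dtype=float)
    return result

def translation(dx, dy, dz):
    """4x4 homogeneous matrix shifting points by (dx, dy, dz)."""
    matrix = np.eye(4)
    matrix[:3, 3] = [dx, dy, dz]
    return matrix

func_to_struct = translation(0.1, -0.2, 0.05)            # hypothetical rigid shift
struct_to_template = np.diag([1.05, 0.98, 1.02, 1.0])    # hypothetical affine scaling
func_to_template = compose(struct_to_template, func_to_struct)

point = np.array([1.0, 2.0, 3.0, 1.0])
two_steps = struct_to_template @ (func_to_struct @ point)
one_step = func_to_template @ point                       # same coordinates, one resampling
```

The structural scan, by contrast, only needs `struct_to_template`, matching the description above.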
For Quality Control we distribute as part of this publication additional workflows using the NumPy [8], SciPy [9], pandas [10], and matplotlib packages [11], as well as Seaborn [12] for plotting, and Statsmodels [13] for top-level statistics, using the HC3 heteroscedasticity consistent covariance matrix [14]. Specifically, distribution densities for plots are drawn using the Scott bandwidth density estimator [15].
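For readers unfamiliar with HC3, the estimator can be written out in a few lines of NumPy. This is a didactic sketch (`ols_hc3` is a hypothetical name), not the analysis code behind the reported statistics; Statsmodels exposes the same covariance type via `fit(cov_type="HC3")` on an OLS model.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS coefficients with HC3 heteroscedasticity-consistent standard errors.

    HC3 rescales each squared residual by 1 / (1 - h_ii)**2, where h_ii is the
    leverage of observation i (the diagonal of the hat matrix); this makes the
    estimate more conservative for high-leverage observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    residuals = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    weights = (residuals / (1.0 - leverage)) ** 2
    # Sandwich estimator: bread @ meat @ bread, with meat = X' diag(weights) X.
    covariance = xtx_inv @ (X.T * weights) @ X @ xtx_inv
    return beta, np.sqrt(np.diag(covariance))

# Toy regression with an intercept and one predictor.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + np.tile([0.1, -0.1], 5)
beta, standard_errors = ols_hc3(X, y)
```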
Distribution
As registration is a crucial step of a larger data analysis process (rather than an analysis process in its own right), the workflows are best distributed as part of a full stack (i.e. from raw data to statistic summaries) workflow package. We include the aforementioned Generic and Legacy workflows in the SAMRI (Small Animal Magnetic Resonance Imaging) data analysis package [16] of the ETH/UZH Institute for Biomedical Engineering.
Template Package
The suitability of a registration workflow as a standard is contingent on the quality of the template being used. Particularly the size and orientation of the template may pose constraints on the workflow. For example, an unrealistically inflated template size mandates correspondingly adjusted parameters for all functions which deal with the data in its affine space. Additionally, if the template axis orientation deviates by more than 45° from the image to be registered, or if an axis is flipped, the global maximum of the first (rigid) registration steps may not be correctly determined, and the image would then be skewed and nonlinearly deformed to match the template at an incorrect orientation. Consequently, template quality needs to be ascertained, and a workflow-compliant default should be provided.
Our recommended template (fig. 2a) is derived from the DSURQE template of the Toronto Hospital for Sick Children Mouse Imaging Center [17]. The geometric origin of this template is shifted to match the Bregma landmark, and thus provide integration with histological atlases and surgical procedures, which commonly use Bregma as a reference. The template is in the canonical orientation of the NIfTI format, RAS (left→Right, posterior→Anterior, inferior→Superior), and has a coronal slice positioning reflective of both the typical animal head position in MR scanners and in stereotactic surgery frames. The template is provided at 40 µm and 200 µm isotropic resolutions, and all of its associated mask and label files are identified with the prefix dsurqec in the template packages.
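Shifting the geometric origin to a landmark only changes the translation column of the affine. The sketch below assumes a 4×4 NumPy affine; `set_origin` and the Bregma voxel index are hypothetical placeholders, not the coordinates actually used for the dsurqec template.

```python
import numpy as np

def set_origin(affine, origin_voxel):
    """Return a copy of `affine` translated so that `origin_voxel` maps to world (0, 0, 0).

    Solves rotzoom @ ijk + t = 0 for the translation t; rotation and voxel
    sizes are unchanged, so the data matrix needs no resampling.
    """
    affine = np.array(affine, dtype=float)   # np.array copies the input
    affine[:3, 3] = -(affine[:3, :3] @ np.asarray(origin_voxel, dtype=float))
    return affine

template_affine = np.diag([0.2, 0.2, 0.2, 1.0])   # 200 µm RAS grid, in mm
bregma_voxel = (31, 60, 18)                       # hypothetical landmark index
bregma_affine = set_origin(template_affine, bregma_voxel)
bregma_world = bregma_affine @ np.array([*bregma_voxel, 1.0])   # [0, 0, 0, 1]
```

With the origin at Bregma, world coordinates read directly as stereotactic offsets in millimetres along the RAS axes.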
We bundle the aforementioned MR template with two additional histological templates, derived from the Australian Mouse Brain Mapping Consortium (AMBMC) [18], and the Allen Brain Institute (ABI) [19] templates. While these suffer from shortcomings listed under the Background section, we include the AMBMC template due to its extra long rostrocaudal coverage, and the ABI atlas due to its role as the reference atlas for numerous gene expression and projection maps. We reorient the AMBMC template from its original RPS orientation to the canonical RAS, and apply an RAS orientation to the orientation-less ABI template after converting it to NIfTI from its original NRRD format. These atlases are also made available at 40 µm and 200 µm isotropic resolutions, and the corresponding files are prefixed with ambmc and abi, respectively.
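Reorienting from RPS to RAS amounts to reversing the second (anterior-posterior) voxel axis while updating the affine so that every voxel keeps its world position. A NumPy sketch under that assumption follows; `flip_axis` is hypothetical, and nibabel's `as_closest_canonical` performs a general version of this for NIfTI images.

```python
import numpy as np

def flip_axis(data, affine, axis):
    """Reverse `data` along `axis` and update `affine` so world coordinates are unchanged.

    Voxel index j becomes (n - 1) - j, so the affine is right-multiplied by the
    homogeneous matrix implementing that index change.
    """
    data = np.asarray(data)
    n = data.shape[axis]
    flip = np.eye(4)
    flip[axis, axis] = -1.0
    flip[axis, 3] = n - 1
    return np.flip(data, axis=axis), np.asarray(affine, dtype=float) @ flip

rps_data = np.arange(24).reshape(2, 3, 4)
rps_affine = np.diag([0.2, -0.2, 0.2, 1.0])
ras_data, ras_affine = flip_axis(rps_data, rps_affine, axis=1)

# The voxel now at (1, 0, 2) held index (1, 2, 2) before; both map to the same world point.
same_value = ras_data[1, 0, 2] == rps_data[1, 2, 2]
same_place = np.allclose(ras_affine @ [1, 0, 2, 1], rps_affine @ [1, 2, 2, 1])
```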
Additionally, we provide templates in the historically prevalent but incorrect RPS orientation, and with the historically prevalent tenfold increase in voxel size. These templates are derived from the DSURQE and AMBMC templates, and are prefixed with ldsurqec and lambmc, respectively.
Lastly, due to data size considerations, we distribute 15 µm isotropic versions of all atlases available at this resolution (AMBMC and its legacy derivative, as well as ABI) in a separate package. The two packages we thus distribute are called mouse-brain-atlases and mouse-brain-atlasesHD. Up-to-date versions of these archives can be reproduced via a FOSS script collection which handles download, reorienting, and resampling, and was written and released for the purpose of this publication [20].
For the comparisons performed in this text, the dsurqec and ldsurqec template variations (containing the same data matrix, but matched to the orientation and size requirements of the functions in the fig. 1d and fig. 1c workflows, respectively) are referred to as the “Generic” template. Analogously, the ambmc and lambmc template variations are referred to as the “Legacy” template.
Testing Dataset
For the quality control of the workflows, a dataset with an effective size of 102 scans is used. Data from 11 adult animals is included, with each animal scanned on up to 5 sessions (repeated at 14 day intervals). Each session contains an anatomical scan and two functional scans — with Blood-Oxygen Level Dependent (BOLD) [21] and Cerebral Blood Volume (CBV) [22] contrast, respectively (for a total of 68 functional scans).
Anatomical scans are acquired via a TurboRARE sequence, with a RARE factor of 8, an echo time (TE) of 21 ms, an inter-echo spacing of 7 ms, and a repetition time (TR) of 2500 ms, sampled at a sagittal resolution of ∆x(ν) = 166.7 µm, a horizontal resolution of ∆y(φ) = 75 µm, and a coronal resolution of ∆z(t) = 650 µm (slice thickness of 500 µm). The functional BOLD and CBV scans are acquired with a flip angle of 60° and with TR/TE = 1000 ms/15 ms and TR/TE = 1000 ms/5.5 ms, respectively. Functional scans are sampled at ∆x(ν) = 312.5 µm, ∆y(φ) = 281.25 µm, and ∆z(t) = 650 µm (slice thickness of 500 µm). All aforementioned scans are acquired with a Bruker PharmaScan system (7 T, 16 cm bore), and an in-house T/R coil.
The measured animals were fitted with an optic fiber implant (l = 3.2 mm, d = 400 µm) targeting the Dorsal Raphe (DR) nucleus in the brain stem. The nucleus was rendered sensitive to optical stimulation by transgenic expression of Cre recombinase under the ePet promoter [23] and viral injection of rAAVs delivering a plasmid with Cre-conditional expression of Channelrhodopsin and YFP (pAAV-EF1a-double floxed-hChR2(H134R)-EYFP-WPRE-HGHpA, a gift from Karl Deisseroth; Addgene plasmid #20298).
The DR was stimulated via an Omicron LuxX 488-60 laser (488 nm) tuned to 30 mW at contact with the fiber implant, according to the protocol listed in table S1. The operation and stimulation procedure, as well as the general picture of the obtained activation, are consistent with previous results [24], and are not further commented on in this study.
Interactive Operator Inspection
We complement the automated whole-dataset evaluation metrics detailed at length in this article with convenience functions to ease and improve interactive operator inspection. These functions produce clean, well-paginated, and visually pleasing slice-by-slice views of the registered data, and emphasize one of two different quality assessments. The first view mode highlights single-session registration quality by plotting the registered data as a greyscale bitmap, and the target atlas as a coloured contour (figs. S1a to S1d). The second view mode highlights multisession registration coherence, by plotting the target template as a greyscale bitmap, and the individual session percentile contours in colour (fig. S2).
Reproducibility
The source code for this document and all data analysis shown herein (including registration and QC workflow execution) is published according to the RepSeP specifications [25]. Data analysis execution and document compilation have been tested repeatedly on numerous hardware platforms, with operating systems including Gentoo Linux and macOS, and as such we attest that all figures and statistics presented can be reproduced based solely on the raw data, dependency list, and analysis scripts which we distribute.
Evaluation
A major challenge of registration QC is that a perfect mapping from the measured image to the template is undefined. Similarity metrics are ill-suited for QC because they are used internally by registration functions, whose mode of operation is based on maximizing them. Extreme similarity score maximization is not a desired outcome. Particularly if nonlinear transformations are employed, this may result in image distortion which should be penalized in QC. Additionally, similarity metrics are not independent, so this issue cannot be circumvented by maximizing a subset of metrics and performing QC via the remainder. To address this challenge we developed four alternative evaluation metrics: volume conservation, smoothness conservation, variance analysis, and functional analysis. In order to mitigate possible differences arising from qualitative template features, we use these metrics in a multivariate analysis of both templates and workflows.
Volume Conservation
Volume conservation is based on the assumption that the total volume of the scanned segment of the brain should remain roughly constant after preprocessing. Beyond just size differences between the acquired data and the target template, a volume increase may indicate that the brain was stretched to fill in template brain space not covered by the scan, while a volume decrease might indicate that non-brain voxels were introduced into the template brain space.
In order to best analyze volume conservation, a Volume Conservation Factor (VCF) is computed for each processed scan, whereby volume conservation is highest for a VCF equal to 1. For the current implementation we estimate brain volume via the 66th voxel intensity percentile of the raw scan before any preprocessing. The arbitrary unit equivalent of this percentile threshold is recorded for each scan and applied to all preprocessing workflow results for that particular scan, to obtain VCF estimates according to eq. (1), where v is the voxel volume in the original space, v′ the voxel volume in the transformed space, n the number of voxels in the original space, m the number of voxels in the transformed space, s a voxel value sampled from the vector S containing all values in the original data, and s′ a voxel value sampled from the transformed data.
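Equation (1) is not reproduced in this text. From the variable definitions above, one consistent reading is VCF = (v′ · #{s′ ≥ τ}) / (v · #{s ≥ τ}), with τ the 66th percentile of S: the supra-threshold volume after preprocessing divided by that before. The NumPy sketch below implements that reading; the ≥ comparison and the name `volume_conservation_factor` are our assumptions.

```python
import numpy as np

def volume_conservation_factor(original, transformed, voxel_volume,
                               transformed_voxel_volume, percentile=66.0):
    """Supra-threshold volume after preprocessing divided by that before (1 = conserved).

    The threshold is the `percentile`-th intensity of the raw scan (in its
    arbitrary units) and is reused unchanged on the transformed data.
    """
    original = np.asarray(original, dtype=float)
    transformed = np.asarray(transformed, dtype=float)
    threshold = np.percentile(original, percentile)
    volume_before = voxel_volume * np.count_nonzero(original >= threshold)
    volume_after = transformed_voxel_volume * np.count_nonzero(transformed >= threshold)
    return volume_after / volume_before

# Toy check: upsampling 2x per axis yields 8x the voxels at 1/8 of the voxel volume.
raw = np.arange(27.0).reshape(3, 3, 3)
upsampled = raw.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
vcf = volume_conservation_factor(raw, upsampled, voxel_volume=1.0,
                                 transformed_voxel_volume=0.125)
```

Pure resampling leaves the VCF at 1, so deviations from 1 isolate genuine stretching or shrinking of the supra-threshold brain volume.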
As seen in fig. 3a, VCF is sensitive to the processing workflow (F(1, 268) = 191.1, p = 3.50 × 10⁻³³) and to the template (F(1, 268) = 530.6, p = 1.71 × 10⁻⁶⁵), but not to their interaction (F(1, 268) = 1.311, p = 0.25).
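The reported degrees of freedom are consistent with a 2 × 2 factorial model fitted to one value per functional scan and workflow-template combination. This is our reading of the design rather than a statement from the analysis code; the arithmetic is:

```python
functional_scans = 68         # BOLD and CBV scans in the testing dataset
combinations = 2 * 2          # {Generic, Legacy} workflow x {Generic, Legacy} template
observations = functional_scans * combinations       # 272
parameters = 4                # intercept, workflow, template, workflow:template
residual_df = observations - parameters              # 268, matching F(1, 268)
```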
The performance of the Generic SAMRI workflow (with the Generic template) is significantly different from that of the Legacy workflow (with the Legacy template), yielding a two-tailed p-value of 3.8 × 10⁻¹⁴. Additionally, the root mean squared error ratio strongly favours the Generic workflow (RMSE(Legacy)/RMSE(Generic) ≃ 2.8).
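The RMSE ratio can be sketched as follows, assuming each RMSE is taken over the per-scan conservation factors relative to the ideal value of 1 (our assumption; `rmse_ratio` is a hypothetical helper and the input values are invented for illustration).

```python
import math

def rmse_from_ideal(values, ideal=1.0):
    """Root mean squared deviation of conservation factors from their ideal value."""
    values = list(values)
    return math.sqrt(sum((value - ideal) ** 2 for value in values) / len(values))

def rmse_ratio(legacy, generic, ideal=1.0):
    """RMSE of the Legacy results over RMSE of the Generic results; > 1 favours Generic."""
    return rmse_from_ideal(legacy, ideal) / rmse_from_ideal(generic, ideal)

# Hypothetical VCF values, for illustration only.
ratio = rmse_ratio(legacy=[1.3, 0.7, 1.3, 0.7], generic=[1.1, 0.9, 1.1, 0.9])
```

Measuring error against the ideal value, rather than against each workflow's own mean, penalizes both systematic bias and scatter, so a workflow with a consistent but offset VCF still scores poorly.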
Descriptively, we observe that the Legacy level of the template variable introduces a notable volume loss (VCF of −0.34, 95%CI: −0.36 to −0.32), while the Legacy level of the preprocessing variable introduces a volume gain (VCF of 0.20, 95%CI: 0.18 to 0.23). Further, we note that there is a very strong variance increase in all conditions for the Legacy processing workflow (9.7-fold given the Legacy template, and 4.9-fold given the Generic template).
With respect to the data break-up by contrast (fig. 3b), we see no notable main effect for the contrast variable (VCF of −0.03, 95%CI: −0.06 to 0.00). We do, however, report a notable effect for the contrast-template interaction, with the Legacy workflow and CBV contrast interaction level introducing a volume loss (VCF of −0.12, 95%CI: −0.16 to −0.09).
Smoothness Conservation
A further aspect of preprocessing quality is the resulting image smoothness. Although controlled smoothing is a valuable preprocessing tool used to increase the signal-to-noise ratio (SNR), uncontrolled smoothness limits operator discretion in the trade-off between SNR and feature granularity. Uncontrolled smoothness can thus lead to an undocumented and implicit loss of spatial resolution, and is therefore associated with worse anatomical alignment [26]. We employ a Smoothness Conservation Factor (SCF), which normalizes the smoothness of the preprocessed images with respect to the smoothness of the original images. Our smoothness measure is the full-width at half-maximum (FWHM) of the signal amplitude spatial autocorrelation function (ACF [27]). Since fMRI data usually does not have a Gaussian-shaped spatial ACF, we use AFNI [28] to fit the model given in eq. (2) in order to compute the FWHM, where r is the distance between two amplitude distribution samples, a is the relative weight of the Gaussian term in the model, b is the width of the Gaussian, and c is the decay of the mono-exponential term [29].
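Equation (2) is not reproduced in this text. AFNI's 3dFWHMx documents its mixed ACF model as ACF(r) = a·exp(−r²/(2b²)) + (1 − a)·exp(−r/c), which matches the parameter descriptions above. Assuming the FWHM is twice the radius at which this model falls to 0.5, it can be computed from fitted parameters by bisection, as sketched below; the fit itself is left to AFNI, the ratio form of SCF is our reading of "normalizes", and all function names are hypothetical.

```python
import math

def acf_model(r, a, b, c):
    """Mixed Gaussian plus mono-exponential ACF model, equal to 1 at r = 0."""
    return a * math.exp(-r * r / (2.0 * b * b)) + (1.0 - a) * math.exp(-r / c)

def acf_fwhm(a, b, c, tolerance=1e-9):
    """Twice the radius at which the model ACF drops to 0.5.

    For 0 <= a <= 1 and b, c > 0 the model decreases monotonically from 1,
    so bisection on a bracketing interval finds the unique half-maximum radius.
    """
    lo, hi = 0.0, max(b, c)
    while acf_model(hi, a, b, c) > 0.5:   # widen until the bracket contains r_half
        hi *= 2.0
    while hi - lo > tolerance:
        mid = 0.5 * (lo + hi)
        if acf_model(mid, a, b, c) > 0.5:
            lo = mid
        else:
            hi = mid
    return lo + hi   # 2 * midpoint of the final bracket

def smoothness_conservation_factor(fwhm_processed, fwhm_original):
    """Assumed form of SCF: processed FWHM relative to original FWHM (1 = conserved)."""
    return fwhm_processed / fwhm_original
```

For a = 1 the model is a pure Gaussian and the result reduces to the familiar 2b·√(2 ln 2); for a = 0 it is a pure exponential with FWHM 2c·ln 2.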
With respect to the data shown in fig. 4a, we note that SCF is sensitive to the template (F(1, 268) = 72.77, p = 1.09 × 10⁻¹⁵), the processing workflow (F(1, 268) = 485.5, p = 4.17 × 10⁻⁶²), and the interaction of the factors (F(1, 268) = 10.66, p = 0.0012).
The performance of the Generic SAMRI workflow (with the Generic template) is significantly different from that of the Legacy workflow (with the Legacy template), yielding a two-tailed p-value of 9.9 × 10⁻²². In this comparison, the root mean squared error ratio favours the Generic workflow (RMSE(Legacy)/RMSE(Generic) ≃ 2.9).
Descriptively, we observe that the Legacy level of the template variable introduces a smoothness reduction (SCF of −0.14, 95%CI: −0.16 to −0.12), while the Legacy level of the preprocessing variable introduces a smoothness gain (SCF of 0.37, 95%CI: 0.35 to 0.39). Further, we note that there is a strong variance increase for the Legacy processing workflow (4.06-fold given the Legacy template and 4.11-fold given the Generic template).
Given the break-up by contrast shown in fig. 4b, we see no effect for the contrast variable (SCF of 0.04, 95%CI: 0.00 to 0.08). We do, however, report an effect for the contrast-template interaction, with the Legacy workflow and CBV contrast interaction level introducing an increase in smoothness (SCF of 0.05, 95%CI: 0.01 to 0.09).
Variance analysis
An additional way to assess preprocessing quality focuses on the robustness to variability resulting from repeat experimentation, and on whether this is attained without overfitting (i.e. without compromising physiologically meaningful variability). The core assumption of this analysis of variance is that adult mouse brains, in the absence of intervention, retain size, shape, and implant position during the 8-week study period. Consequently, when examining similarity scores of preprocessed scans with respect to the target template, more variation should be found across levels of the subject variable than across levels of the session variable. This comparison can be performed using a type 3 ANOVA, modelling both the subject and the session variables. For this assessment we select three metrics with maximal sensitivity to different features: Neighborhood Cross Correlation (CC, sensitive to localized correlation), Global Correlation (GC, sensitive to whole-image correlation), and Mutual Information (MI, sensitive to whole-image information similarity independently of a correlation assumption).
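The subject-versus-session comparison can be sketched as F-tests on an additive linear model, y ~ subject + session, in which each factor is tested by removing it from the full model; without an interaction term this coincides with type 3 tests. Assuming one similarity score per session (34 sessions in the testing dataset), the residual degrees of freedom would be 34 − 1 − 10 − 4 = 19, consistent with the F(10, 19) and F(4, 19) statistics reported for fig. 5. This NumPy sketch (hypothetical `additive_anova`) returns F statistics only; p-values follow from the F distribution, and in Statsmodels `anova_lm(model, typ=3)` yields the full table.

```python
import numpy as np

def _deviation_columns(levels):
    """Sum-to-zero (deviation) coding of one categorical factor."""
    levels = np.asarray(levels)
    categories = np.unique(levels)
    last = (levels == categories[-1]).astype(float)
    columns = [(levels == category).astype(float) - last for category in categories[:-1]]
    return np.column_stack(columns) if columns else np.empty((len(levels), 0))

def _residual_sum_of_squares(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return float(residual @ residual)

def additive_anova(y, subject, session):
    """F statistics for each factor of y ~ subject + session.

    Each factor is tested by dropping its columns from the full model while
    keeping the other factor. Returns {factor: (F, df_effect, df_residual)}.
    """
    y = np.asarray(y, dtype=float)
    blocks = {"subject": _deviation_columns(subject),
              "session": _deviation_columns(session)}
    intercept = np.ones((len(y), 1))
    full = np.hstack([intercept, *blocks.values()])
    rank_full = np.linalg.matrix_rank(full)
    rss_full = _residual_sum_of_squares(full, y)
    df_residual = len(y) - rank_full
    results = {}
    for name in blocks:
        reduced = np.hstack([intercept, *(b for other, b in blocks.items() if other != name)])
        df_effect = rank_full - np.linalg.matrix_rank(reduced)
        rss_reduced = _residual_sum_of_squares(reduced, y)
        f_stat = ((rss_reduced - rss_full) / df_effect) / (rss_full / df_residual)
        results[name] = (f_stat, int(df_effect), int(df_residual))
    return results

# Toy data: 4 subjects x 3 sessions, with a strong subject effect and tiny noise.
subjects = np.repeat(np.arange(4), 3)
sessions = np.tile(np.arange(3), 4)
scores = subjects + 0.01 * np.sin(np.arange(12.0))
table = additive_anova(scores, subjects, sessions)
```

A workflow that preserves anatomy while removing session-level noise should, as in this toy example, yield a larger F for subject than for session.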
Figure 5 renders the similarity metric scores for both the SAMRI Generic and Legacy workflows (considering only the matching workflow-template combinations). The Legacy workflow produces results which consistently show a higher F-statistic for the session than for the subject variable: CC (subject: F(10, 19) = 0.13, p = 1; session: F(4, 19) = 0.51, p = 0.73), GC (subject: F(10, 19) = 0.65, p = 0.75; session: F(4, 19) = 3.805, p = 0.02), and MI (subject: F(10, 19) = 0.95, p = 0.51; session: F(4, 19) = 3.919, p = 0.017). Notably, for the GC and MI metrics the effect of the session variable is significant, but not that of the subject variable.
The Generic SAMRI workflow shows the reverse trend, with F-statistics consistently higher for the subject variable than for the session variable: CC (subject: F(10, 19) = 3.368, p = 0.011; session: F(4, 19) = 2.095, p = 0.12), GC (subject: F(10, 19) = 2.119, p = 0.076; session: F(4, 19) = 1.775, p = 0.18), and MI (subject: F(10, 19) = 2.687, p = 0.031; session: F(4, 19) = 2.224, p = 0.1).
Functional Analysis
Functional analysis is a frequently used avenue for preprocessing QC. Its viability derives from the fact that the metric being maximized in the registration process is not the same as the output metric used for QC. This method is, however, primarily suited to examining workflow effects in light of higher-level applications, and less suited for widespread QC (as it is computationally intensive and only applicable to stimulus-evoked functional data).
As a first step we examine statistical power via the negative logarithm of first-level p-value maps (i.e. voxelwise statistical estimates of the probability that a voxel time course would, by chance alone, be at least as well correlated with the stimulation regressor as the measured time course). We compute the per-scan average of these values, which we term Mean Significance (MS), according to eq. (3), where n represents the number of statistical estimates in the scan, and p is a p-value.
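Equation (3) is not reproduced in this text. From the definitions, MS = −(1/n) Σᵢ log pᵢ over the n estimates of a scan. The base of the logarithm is not stated, so the sketch below assumes base 10 (a common choice for significance maps); `mean_significance` is a hypothetical name.

```python
import math

def mean_significance(p_values, base=10.0):
    """Mean of -log_base(p) across all voxelwise p-values of one scan.

    p-values of exactly 0 have no finite logarithm and must be clipped
    (or excluded) beforehand.
    """
    p_values = list(p_values)
    if not p_values:
        raise ValueError("at least one p-value is required")
    return -sum(math.log(p, base) for p in p_values) / len(p_values)

toy_scan = [0.1, 0.01, 0.001]        # three voxelwise p-values
ms = mean_significance(toy_scan)     # approximately 2.0
```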
As seen in fig. 6, MS is not sensitive to the processing workflow (F(1, 268) = 0.023, p = 0.88), but is sensitive to the template (F(1, 268) = 12.68, p = 0.00044) and to the interaction of both factors (F(1, 268) = 5.741, p = 0.017).
The SAMRI Generic workflow (with the Generic template) does not significantly differ from the Legacy workflow (with the Legacy template) in terms of MS, yielding a two-tailed p-value of 0.95.
Descriptively, we observe that the Legacy level of the template variable introduces a notable significance increase (MS of 1.11, 95%CI: 0.88 to 1.34), while the Legacy level of the preprocessing variable introduces no significant change (MS of −0.05, 95%CI: −0.28 to 0.18), and the interaction of the Legacy template and Legacy processing introduces a significance loss (MS of −1.06, 95%CI: −1.38 to −0.73). Furthermore, we again note a variance increase in all conditions for the Legacy processing workflow (3.3-fold given the Legacy template, and 2.8-fold given the Generic template).
With respect to the data break-up by contrast (fig. 6b), we see no notable main effect for the contrast variable (MS of 0.01, 95%CI: −0.84 to 0.85) and no notable effect for the contrast-template interaction (MS of 0.39, 95%CI: −0.03 to 0.80).
Overall statistical power is, however, independent of the mapping accuracy, and functional analysis effects can further be inspected by visualizing the statistic maps. For a succinct overview capturing both amplitude and directionality of the signal, we present second-level t-statistic maps depicting the CBV and BOLD omnibus contrasts (across all subjects and sessions) in fig. 7. Crucial to the examination of registration quality and its effects on functional readouts is the differential coverage. We note that the Legacy workflow induces coverage overflow, extending to the cerebellum (figs. 7c, 7d, 7g and 7h), as well as to more rostral areas when used in conjunction with the Legacy template (figs. 7d and 7h). Separately from the Legacy workflow, the Legacy template causes acquisition slice misalignment (figs. 7b, 7d, 7f and 7h). Positive activation of the Raphe system, most clearly disambiguated from the surrounding tissue in the BOLD contrast, is displaced very far caudally by the joint effects of the Legacy workflow and the Legacy template (fig. 7h). Processing with the Generic template and workflow (figs. 7a and 7e) shows no issues with statistic coverage alignment or overflow.
Discussion
The workflow and template design presented herein offer significant advantages: they reduce coverage overestimation and uncontrolled smoothness, and they ensure session-to-session consistency. This is most clearly highlighted by Volume Conservation (fig. 3), Smoothness Conservation (fig. 4), and Variance Analysis (fig. 5). In each of these, the joint usage of the SAMRI Generic workflow and template outperforms all other workflow-template combinations in the multivariate analysis. This spatial robustness is also revealed by a qualitative examination of higher-level functional maps (fig. 7), where only the combination of the Generic workflow and template provides accurate coverage for both BOLD and CBV modalities. These benefits come without compromising statistical power (fig. 6), and they hold for both CBV and BOLD contrasts (figs. 3b, 4b and 6b, where the Generic workflow-template combination is less or equally susceptible to the contrast variable). The performance of the Generic workflow is also more consistent across all metrics, as demonstrated by notable reductions in the standard deviation of VCF, SCF, and MS (figs. 3a, 4a and 6a).
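The session-to-session consistency discussed above rests on separating variability between subjects from variability between sessions of the same subject. The following minimal sketch splits a per-scan similarity score into these two components. It assumes that each subject has at least two sessions and that the score is any scalar registration-similarity value. The input layout and the function name are hypothetical. The goal stated in this article is to preserve between-subject variance while minimizing between-session variance.

```python
from statistics import mean, variance

def variance_components(scores):
    """Split a per-scan similarity score into subject- and session-level spread.

    `scores` maps subject ID -> list of per-session values (>= 2 subjects,
    >= 2 sessions each). Returns (between_subject, within_subject), where
    between_subject is the sample variance of the per-subject means and
    within_subject is the mean of the per-subject session variances.
    """
    subject_means = [mean(sessions) for sessions in scores.values()]
    between_subject = variance(subject_means)
    within_subject = mean(variance(sessions) for sessions in scores.values())
    return between_subject, within_subject

# Hypothetical scores: stable within each subject, different across subjects.
stable = {"mouse1": [1.0, 1.0], "mouse2": [3.0, 3.0]}
between, within = variance_components(stable)
```

A workflow that overfits to the template would tend to shrink both components. A workflow that is inconsistent across sessions would inflate the within-subject term.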
Closer inspection of the model, however, reveals that the strongest source of variability is not the processing factor but the template factor. The Legacy level of the template factor induces both a volume and a smoothness decrease beyond the original data values (figs. 3a and 4a). This clearly indicates a whole-volume effect. A target template smaller than the recorded brain causes the brain to contract during registration, which results in both volume and smoothness loss. This effect can also be observed qualitatively in fig. 7. We thus highlight the importance of an appropriate template choice. We strongly recommend use of the Generic template, because its scale better matches data acquired in adult mice.
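The whole-volume argument above can be checked with simple arithmetic. If registration uniformly scales a brain onto a smaller template, the linear scale factor is the cube root of the volume ratio. Spatial smoothness, expressed as FWHM in millimetres, then shrinks by that same linear factor. The sketch below rests on an idealized assumption of pure isotropic scaling and ignores any extra blur added by interpolation. The numbers are hypothetical.

```python
def isotropic_contraction(brain_volume_mm3, template_volume_mm3, fwhm_mm):
    """Idealized effect of uniformly scaling a brain to fit a smaller template.

    Assumes pure isotropic scaling and ignores interpolation blur.
    """
    scale = (template_volume_mm3 / brain_volume_mm3) ** (1.0 / 3.0)  # linear factor
    return {
        "linear_scale": scale,
        "volume_ratio": scale ** 3,          # volume shrinks with the cube of the scale
        "fwhm_after_mm": fwhm_mm * scale,    # smoothness in mm shrinks linearly
    }

# Hypothetical numbers: a template with 72.9% of the brain's volume.
effect = isotropic_contraction(1000.0, 729.0, fwhm_mm=0.5)
```

A linear mismatch of only 10% (scale 0.9) thus corresponds to a 27.1% volume loss and a 10% smoothness loss. Both quantities decrease together, as observed for the Legacy template.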
The volume conservation, smoothness conservation, and session-to-session consistency of the SAMRI Generic workflow and template combination are complemented by numerous design benefits (figs. 1 and 2). These include increased transparency and parameterization of the workflow (which the end user can more easily inspect, improve, or customize), accurate data headers, and spatial coordinates that are more meaningful for surgery and histology. We acknowledge that, although the SAMRI Generic workflow performs better by comparison, it does not attain a perfect score on any of the target metrics. The fully transparent nature of the workflow, however, is conducive to continued improvement beyond its current performance.
Quality Control
A major contribution of this work is the implementation of multiple metrics providing simple, powerful, and robust QC of registration performance (VCF, SCF, and Variance Analysis), together with the release of a dataset [30] suitable for such multifaceted benchmarking. The VCF and SCF provide good quantitative estimates of distortion prevalence. The variance analysis, which compares subject-wise and session-wise variance, offers an elegant way for the operator to assess the extent to which a registration workflow may be overfitting. These metrics are relevant to workflow improvements in both preclinical and clinical MRI. They could themselves be further optimized (e.g. for VCF, by developing percentile selection heuristics based on a priori documented data distortions).
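As an illustration of how volume and smoothness conservation factors can be computed, the following sketch makes several assumptions, since this section does not define the article's exact VCF and SCF formulas. The VCF is taken as the ratio of supra-threshold volumes after and before registration, with the threshold set at an intensity percentile of the reference image. The SCF is taken as the ratio of geometric-mean FWHM estimates, using the classic first-difference (lag-1 autocorrelation) estimator of Forman et al. (1995). The function names, the default percentile, and the box blur used in the toy data are all hypothetical.

```python
import numpy as np

def vcf(registered, reference, vox_vol_reg, vox_vol_ref, percentile=66.0):
    """Volume Conservation Factor sketch (1.0 = volume preserved).

    The intensity threshold is a percentile of the reference image
    (hypothetical choice) and is applied to both images.
    """
    threshold = np.percentile(reference, percentile)
    vol_reg = np.count_nonzero(np.asarray(registered) > threshold) * vox_vol_reg
    vol_ref = np.count_nonzero(np.asarray(reference) > threshold) * vox_vol_ref
    return vol_reg / vol_ref

def fwhm_mm(image, voxel_size_mm):
    """Per-axis FWHM from lag-1 autocorrelation (Forman et al., 1995)."""
    image = np.asarray(image, dtype=float)
    total_var = image.var()
    estimates = []
    for axis, size in enumerate(voxel_size_mm):
        diff_var = np.diff(image, axis=axis).var()
        rho = 1.0 - diff_var / (2.0 * total_var)  # must lie in (0, 1) for a valid estimate
        estimates.append(size * np.sqrt(-2.0 * np.log(2.0) / np.log(rho)))
    return np.array(estimates)

def scf(registered, original, vox_reg_mm, vox_orig_mm):
    """Smoothness Conservation Factor sketch: ratio of geometric-mean FWHM."""
    def gmean(x):
        return float(np.exp(np.mean(np.log(x))))
    return gmean(fwhm_mm(registered, vox_reg_mm)) / gmean(fwhm_mm(original, vox_orig_mm))

def box_blur(volume):
    """Apply a 2-tap box filter along every axis (periodic boundaries)."""
    for ax in range(volume.ndim):
        volume = volume + np.roll(volume, 1, axis=ax)
    return volume

# Toy volume data (illustration only).
reference = np.zeros((10, 10, 10))
reference[2:8, 2:8, 2:8] = 1.0                          # 216 "brain" voxels
registered = np.zeros((10, 10, 10))
registered[3:7, 3:7, 3:7] = 1.0                         # contracted brain: 64 voxels
volume_factor = vcf(registered, reference, 1.0, 1.0)    # 64 / 216

# Toy smoothness data: extra blur, e.g. from interpolation, raises the SCF.
rng = np.random.default_rng(0)
original = box_blur(rng.standard_normal((16, 16, 16)))
smoothed = box_blur(original)
smoothness_factor = scf(smoothed, original, (1, 1, 1), (1, 1, 1))
```

Under these assumptions, values below 1 indicate volume or smoothness loss (as with the Legacy template), and values above 1 indicate gain. The FWHM estimator requires positively autocorrelated data; it returns NaN for unsmoothed white noise whose estimated lag-1 correlation is not positive.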
Within the range of workflows examined here, global statistical power does not track registration quality. It is thus not a reliable metric for optimization. Regrettably, it may nonetheless be the most commonly used one when results are only inspected at a higher level, and this could bias the analysis. This is exemplified by the positive effect of the Legacy template level seen in fig. 6a: in this particular case, optimizing for statistical power alone would give a misleading indication. We do not discount this measure entirely, however. It is strongly sensitive to workflow parameter variations that we excluded from this comparison for the sake of brevity, such as the registration interpolation method.
Overall, we suggest that a comparison based on VCF, SCF, and Variance Analysis, coupled with visual inspection of a small number of omnibus statistic maps, is a feasible and sufficient tool for benchmarking workflows. MS can serve as an additional sanity check. We recommend reuse of the presented data for workflow benchmarking, as it includes (a) multiple sources of variation (contrast, session, subject), (b) functional activity with broad coverage but spatially distinct features, and (c) significant distortions due to implant properties, which are appropriate for testing workflow robustness. The RepSeP-compliant executable source code [31] reproduces the statistics and figures in this document. As a result, our processing and data analysis are not only fully transparent, but also reusable with further data and further workflows.
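When comparing workflows with these metrics, fold improvements can be summarized as a ratio of root-mean-square errors. The abstract reports such fold improvements in volume and smoothness conservation RMSE. This minimal sketch assumes that the RMSE is taken relative to an ideal factor of 1 (perfect conservation). The function names and values are hypothetical.

```python
from math import sqrt

def rmse_from_ideal(factors, ideal=1.0):
    """Root-mean-square deviation of per-scan factors (e.g. VCF or SCF) from the ideal."""
    factors = list(factors)
    return sqrt(sum((f - ideal) ** 2 for f in factors) / len(factors))

def fold_improvement(baseline, candidate, ideal=1.0):
    """How many times smaller the candidate workflow's RMSE is than the baseline's."""
    return rmse_from_ideal(baseline, ideal) / rmse_from_ideal(candidate, ideal)

# Hypothetical per-scan VCF values for two workflows (not the study's data).
legacy_vcf = [1.2, 0.8, 1.2, 0.8]
generic_vcf = [1.1, 0.9, 1.1, 0.9]
improvement = fold_improvement(legacy_vcf, generic_vcf)  # approximately 2.0
```

Because both over- and under-estimation count as error, the RMSE from the ideal value penalizes volume gain and volume loss alike. A plain mean would let them cancel out.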
Conclusion
The SAMRI Generic workflow and Generic template presented in this article constitute a notable advance over the prevailing ad hoc paradigms of mouse brain imaging analysis. This is attested by an in-depth multivariate comparison of this novel design with a thoroughly documented Legacy pipeline representing alternative practices. For workflow comparison, we introduced metrics that can be used beyond the scope of this work for registration QC. The optimized registration parameters of our workflow are accessible in the source code and transferable to any other workflow making use of the ANTs package. The software engineering choices in both the workflow and this article’s source code empower users to better verify, understand, remix, and reuse our work. Overall, we believe that the insights summarized and technologies showcased herein can play a significant role in improving computational mouse brain imaging methodology.