Abstract
Platforms and institutions that support MRI data sharing need to ensure that identifiable facial features are not present in shared images. Currently, this assessment requires manual effort, as no automated tools exist that can efficiently and accurately detect whether an image has been “defaced”. The scarcity of publicly available data with preserved facial features, as well as the meager incentives to create such a cohort privately, have hindered the development of face-detection models. Here, we introduce a framework to detect whether an input MRI brain scan has been defaced, with the ultimate goal of integrating it into the submission protocols of MRI data archiving and sharing platforms. We present a binary (“defaced”/“nondefaced”) classifier based on a custom convolutional neural network architecture. We train the model on 980 defaced MRI scans from 36 different studies that are publicly available at OpenNeuro.org. To overcome the unavailability of nondefaced examples, we augment the dataset by inpainting synthetic faces into each training image. We show the adequacy of this data augmentation in a cross-validation evaluation. We demonstrate that the performance estimated with cross-validation matches that of an evaluation on a held-out dataset (N = 581) preserving real faces, and obtain accuracy/sensitivity/specificity scores of 0.978/0.983/0.972, respectively. Data augmentations are key to boosting the performance of models bounded by limited sample sizes and insufficient diversity. Our model contributes towards developing classifiers with ∼100% sensitivity in detecting faces, which is crucial to ensure that no identifiable data are inadvertently made public.
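As a point of reference for the reported scores, the three metrics follow directly from the confusion-matrix counts of a binary classifier. The sketch below shows the standard definitions; the counts used in the usage example are hypothetical and chosen only to sum to a set of N = 581 scans, not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: defaced scans correctly flagged as defaced (true positives)
    fn: defaced scans missed by the classifier (false negatives)
    tn: nondefaced scans correctly flagged (true negatives)
    fp: nondefaced scans wrongly flagged as defaced (false positives)
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical split of a 581-scan held-out set (illustrative only).
acc, sens, spec = binary_metrics(tp=290, fn=5, tn=280, fp=6)
```

Sensitivity is the quantity the abstract singles out: a missed face (a false negative) means identifiable data could be published, which is why the goal is to push sensitivity toward ∼100%.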
Competing Interest Statement
The authors have declared no competing interest.