Abstract
Background Next-generation sequencing is widely used in cancer to profile tumors and detect variants. Most somatic variant callers used in these pipelines report variants at the finest possible granularity: single nucleotide variants (SNVs). As a result, multiple adjacent SNVs are called individually instead of as a single multi-nucleotide variant (MNV). This matters because the amino acid change predicted from each SNV within a codon can differ from the amino acid change produced by the MNV that results from combining those SNVs. Most variant annotation tools do not account for this, leading to incorrect conclusions about the downstream effects of the variants.
Method Here, we used Variant Call Format (VCF) files from the TCGA Mutect2 caller and developed a solution to merge SNVs into MNVs. Our custom script takes the phasing information from the SNV VCFs and, based on a gene model, determines whether SNVs fall in the same codon and need to be merged into an MNV prior to variant annotation.
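The merging step can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the script used in the study: the `SNV` record, the `merge_snvs_by_codon` function, and the use of 1-based coordinates within a forward-strand coding sequence are all hypothetical, and the phase-set identifier is treated as an opaque string. SNVs are merged only when they share both a codon and a phase set.

```python
from collections import defaultdict
from typing import NamedTuple


class SNV(NamedTuple):
    """Hypothetical SNV record; pos is 1-based within the coding sequence."""
    pos: int
    ref: str
    alt: str
    phase_set: str  # opaque phasing identifier taken from the VCF


def merge_snvs_by_codon(snvs, cds_seq):
    """Merge phased SNVs that hit the same codon.

    Returns a list of (codon_number, ref_codon, alt_codon) tuples,
    with codon_number counted from 1.
    """
    # Group SNVs by (codon index, phase set): only SNVs on the same
    # haplotype and in the same codon are combined.
    groups = defaultdict(list)
    for snv in snvs:
        codon_index = (snv.pos - 1) // 3
        groups[(codon_index, snv.phase_set)].append(snv)

    merged = []
    for (codon_index, _phase), members in sorted(groups.items()):
        start = codon_index * 3
        ref_codon = cds_seq[start:start + 3]
        alt_codon = list(ref_codon)
        for snv in members:
            offset = (snv.pos - 1) - start
            if ref_codon[offset] != snv.ref:
                raise ValueError(f"reference mismatch at position {snv.pos}")
            alt_codon[offset] = snv.alt
        merged.append((codon_index + 1, ref_codon, "".join(alt_codon)))
    return merged


# Toy coding sequence: codon 2 is GTG.
cds = "ATGGTGAAA"
phased = [SNV(4, "G", "A", "p1"), SNV(5, "T", "A", "p1")]
unphased = [SNV(4, "G", "A", "p1"), SNV(5, "T", "A", "p2")]
merged_phased = merge_snvs_by_codon(phased, cds)      # one MNV: GTG -> AAG
merged_unphased = merge_snvs_by_codon(unphased, cds)  # two separate codon changes
```

Keying the groups on the phase set keeps SNVs on different haplotypes separate, so they are annotated individually rather than being combined into an MNV that is not actually present on either haplotype.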
Results We analyzed 10,383 VCFs from TCGA and found 12,141 MNVs that were incorrectly annotated. Strikingly, the analysis of seven commonly mutated genes from 178 studies from cBioPortal revealed that MNVs were consistently missed in 20 of these studies, while they were correctly annotated in 15 more recent studies. The clearest and most common example of an MNV was found at the BRAF V600 locus, where several public datasets reported separate BRAF V600E and BRAF V600M variants instead of a single merged V600K variant.
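The BRAF V600 case can be reproduced with the standard genetic code. The sketch below assumes the reference codon for V600 is GTG, with one SNV changing the first base to A and another changing the second base to A; the helper name `apply_snvs` is illustrative only.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def apply_snvs(codon, changes):
    """Apply (offset, alt_base) substitutions to a three-base codon."""
    bases = list(codon)
    for offset, alt in changes:
        bases[offset] = alt
    return "".join(bases)


ref_codon = "GTG"            # assumed reference codon for V600
snv_first = [(0, "A")]       # GTG -> ATG
snv_second = [(1, "A")]      # GTG -> GAG

# Annotating each SNV on its own gives two different amino acid changes.
separate = [CODON_TABLE[apply_snvs(ref_codon, s)] for s in (snv_first, snv_second)]

# Annotating the merged MNV gives a third, distinct amino acid change.
merged = CODON_TABLE[apply_snvs(ref_codon, snv_first + snv_second)]  # GTG -> AAG

print(separate, merged)  # ['M', 'E'] K
```

Neither of the separately annotated changes (V600M, V600E) matches the change produced by the merged codon (V600K), which is why per-SNV annotation misreports this locus.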
Conclusion While some datasets merged MNVs correctly, many public datasets have not been corrected for this problem. As a best practice for variant calling, we recommend that MNVs be accounted for in NGS processing pipelines, thus improving analyses on the impact of somatic variants in cancer genomics.
Competing Interest Statement
The research for this paper was funded by Bristol Myers Squibb. Rafael Aldana and Zhipan Li are employees of Sentieon, Inc. Sjoerd van Hagen and Sander Y.A. Rodenburg are employees of The Hyve. Xiaozhong Qian is an employee of Daiichi Sankyo, Inc. The other authors have no conflicts of interest to declare.