Abstract
Genome-wide CRISPR-Cas9 screens have been widely used to interrogate gene function. However, their analysis remains challenging, and the rules for designing better libraries require further refinement. Here we present MAGeCK-NEST, which integrates protein-protein interaction (PPI) information, improves inference accuracy when fewer single-guide RNAs (sgRNAs) are available, and uses PPI information to assess screen quality. MAGeCK-NEST also adopts a maximum-likelihood approach to remove sgRNA outliers, which are characterized by higher G-nucleotide counts, especially in regions distal to the PAM. Using MAGeCK-NEST, we found that choosing non-targeting sgRNAs as negative controls leads to strong bias, which can be mitigated by using sgRNAs that target “safe harbor” regions instead. Custom-designed screens confirmed our findings and further revealed that 19-nt sgRNAs consistently gave the best signal-to-noise separation. Collectively, our method enables robust calling of CRISPR screen hits and motivates the design of an improved genome-wide CRISPR screen library.