TY - JOUR T1 - Escape Excel: a tool for preventing gene symbol and accession conversion errors JF - bioRxiv DO - 10.1101/103820 SP - 103820 AU - Eric A. Welsh AU - Paul A. Stewart AU - Brent M. Kuenzi Y1 - 2017/01/01 UR - http://biorxiv.org/content/early/2017/01/27/103820.abstract N2 - Background Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text and numbers into dates, scientific notation, and other numerical representations, which may lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary data containing gene lists in Excel format suffer from this issue.Results Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is available in the Galaxy web environment and can be installed through the Galaxy ToolShed. Escape Excel is also available as a stand-alone, command line Perl script on GitHub (http://www.github.com/pstew/escape_excel). A Galaxy test server implementation is accessible at http://apostl.moffitt.org.Conclusions Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alpha-numeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data-corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications. ER -