Modern genomics projects generate millions of variant calls that must be annotated for predicted functional consequences at the level of gene expression and protein function. Many of these variants are of interest owing to their potential clinical significance. Unfortunately, state-of-the-art annotation methods do not always agree on the downstream effects of a given variant. Here we present a readily extensible Python framework (PyVar) for comparing the output of variant annotation methods, to help the research community quickly assess differences between methods and benchmark new methods as they are developed. We also apply our framework to assess the annotation performance of ANNOVAR, VEP, and SnpEff when annotating 81 million variants from the 1000 Genomes Project against both the RefSeq and Ensembl human transcript sets.
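The core comparison the abstract describes — collecting each tool's predicted consequence for the same variant and flagging disagreements — can be illustrated with a minimal sketch. This is not PyVar's actual API; all function and variable names below are hypothetical, and the consequence terms are toy values standing in for real Sequence Ontology annotations.

```python
from collections import defaultdict


def merge_calls(calls_by_tool):
    """Merge per-variant consequence calls from several annotators.

    `calls_by_tool` maps tool name -> {variant_key: consequence}.
    Returns a dict mapping each variant key to the set of distinct
    consequence terms assigned across tools, so a set of size > 1
    marks a discordant variant.
    """
    merged = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for variant, consequence in calls.items():
            merged[variant].add(consequence)
    return dict(merged)


# Toy example: two annotators agree on one variant, disagree on another.
calls = {
    "annovar": {"1:12345:A>G": "missense", "2:555:C>T": "synonymous"},
    "vep":     {"1:12345:A>G": "missense", "2:555:C>T": "stop_gained"},
}
merged = merge_calls(calls)
discordant = {v for v, terms in merged.items() if len(terms) > 1}
```

A real comparison framework would additionally normalize variant representations (e.g. left-alignment of indels) and map tool-specific consequence vocabularies onto a shared ontology before counting agreement.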