VAPr - A Python package for efficient Variant Analysis and Prioritization

One of our points of pride in pooling, standardizing, and sharing gene, variant, and other “BioThings” annotation data as a service is that the service is fast! We built MyGene.info and MyVariant.info with speed in mind because we want them to be useful to bioinformaticians and tool/resource developers alike. How can we tell if we’ve successfully provided a useful service?

One measure we LOVE is when users build something useful or amazing with our service, especially when those users include a former BioThings API developer (Adam Mark) who is using that expertise to build even more amazing tools. VAPr comes from the Center for Computational Biology & Bioinformatics (CCBB) at UCSD. Dr. Kathleen Fisch, the executive director of the center and former Su Lab postdoc-extraordinaire, was kind enough to answer our questions.

In one tweet or less, introduce us to VAPr: VAPr enables population-scale variant annotation, filtering and prioritization by leveraging MyVariant.info and MongoDB.

What was the original intent behind VAPr (how did VAPr come about, how was the collaborative effort started)? We developed VAPr within the UCSD Center for Computational Biology & Bioinformatics to support the population-scale whole genome sequencing projects we analyze routinely. We collaborate on whole genome sequencing projects ranging from tens to thousands of samples, using our in-house cloud-optimized NGS pipeline, Cirrus-NGS (https://github.com/ucsd-ccbb/cirrus-ngs), for variant calling. We needed an efficient, scalable tool to perform the annotation and enable iterative filtering and prioritization for each of these projects, as the biological goals of each project are unique. We were also interested in a solution that lets non-computational collaborators interactively filter their own variants, which VAPr allows them to do by querying variants through a MongoDB GUI.

How has VAPr since improved (key improvements, not just GitHub commits)? We have implemented additional pre-built variant filters that can be applied through a Jupyter Notebook, and we have added export capabilities for MAF files to better integrate with MAFtools and other software requiring a MAF file, such as MutSigCV.

Who is currently the intended audience for VAPr? Computational biologists and bioinformaticians are the intended initial users of VAPr. Once VAPr has been run to annotate and store the variants in the database, the intended end users are biologists or clinicians who want to interactively filter and prioritize variants through a MongoDB GUI.
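To give a feel for the kind of interactive filtering described above, here is a minimal sketch of a MongoDB query document for rare, likely deleterious variants. It is not one of VAPr's own pre-built filters. The function name, the thresholds, and the field names (`gnomad_genome.af.af`, `cadd.phred`, borrowed from MyVariant.info field naming) are assumptions for illustration; the fields actually stored by VAPr may differ.

```python
def rare_deleterious_filter(max_allele_freq=0.01, min_cadd_phred=20.0):
    """Return a MongoDB query document for rare, likely deleterious variants.

    Field names are assumptions for illustration, not VAPr's actual schema.
    """
    return {
        "$and": [
            # Keep variants that are rare, or absent from the frequency source.
            {
                "$or": [
                    {"gnomad_genome.af.af": {"$lt": max_allele_freq}},
                    {"gnomad_genome.af.af": {"$exists": False}},
                ]
            },
            # Keep variants whose CADD PHRED score meets the threshold.
            {"cadd.phred": {"$gte": min_cadd_phred}},
        ]
    }


query = rare_deleterious_filter(max_allele_freq=0.05, min_cadd_phred=15.0)
```

A query document like this could be pasted (as JSON) into the filter box of a MongoDB GUI, or passed to `collection.find(query)` when working from Python with pymongo.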

How does VAPr use MyGene.info or MyVariant.info services? VAPr uses the MyVariant.info API to annotate variants that are then stored in a local MongoDB.
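As a rough illustration of that annotate-then-store flow, the sketch below builds HGVS-style IDs for single-nucleotide variants, posts a batch of them to the public MyVariant.info variant endpoint, and reshapes the hits into documents ready for MongoDB. This is not VAPr's own code: the helper names, the requested fields, and the document layout are assumptions, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

MYVARIANT_URL = "https://myvariant.info/v1/variant"  # batch lookups via POST


def snv_hgvs_id(chrom, pos, ref, alt):
    """Build an HGVS-style ID (e.g. chr1:g.35367G>A) for a single-nucleotide variant."""
    chrom = str(chrom)
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}:g.{pos}{ref}>{alt}"


def fetch_annotations(hgvs_ids, fields="cadd.phred,dbsnp.rsid"):
    """POST a batch of IDs to MyVariant.info and return the parsed JSON list.

    The default fields are an arbitrary choice for illustration.
    """
    payload = urllib.parse.urlencode(
        {"ids": ",".join(hgvs_ids), "fields": fields}
    ).encode()
    with urllib.request.urlopen(MYVARIANT_URL, data=payload, timeout=30) as resp:
        return json.load(resp)


def to_documents(sample, hits):
    """Turn API hits into MongoDB-ready documents, skipping IDs that were not found."""
    skip = {"query", "_id", "_score", "_version"}
    docs = []
    for hit in hits:
        if hit.get("notfound"):
            continue
        doc = {key: value for key, value in hit.items() if key not in skip}
        doc["hgvs_id"] = hit["query"]  # hypothetical field name
        doc["sample"] = sample         # hypothetical field name
        docs.append(doc)
    return docs
```

With a MongoDB server running locally and pymongo installed, the resulting documents could be stored with something like `MongoClient()["vapr_demo"]["variants"].insert_many(docs)`, where the database and collection names are placeholders.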

What are some of VAPr’s successes (news releases, papers published)? We routinely use VAPr within CCBB on collaborative projects, many of which are in various stages of manuscript preparation, review and press.

What improvements are planned for VAPr? We are currently implementing VAPr within our Cirrus-NGS variant calling pipelines to provide automated annotation and database creation for whole genome sequencing projects, as well as adding features to support annotation of RNA variants.

You can read more about VAPr from their publication:
[Efficient population-scale variant analysis and prioritization with VAPr](https://doi.org/10.1093/bioinformatics/bty192)
Amanda Birmingham, Adam M Mark, Carlo Mazzaferro, Guorong Xu, Kathleen M Fisch
Bioinformatics, Volume 34, Issue 16, 15 August 2018, Pages 2843–2845

Additional information on VAPr can be found at:
https://vapr.readthedocs.io/en/latest/
https://github.com/ucsd-ccbb/VAPr