New release: MyGene.info Python client updated to v2.3.0


Before the 2015 year-end holiday season, we released a new version of the MyGene.info Python client (the "mygene" Python module). Here is a summary of the changes in the new release (v2.3.0). We encourage all of our users to upgrade to this version, which takes a single command:

pip install mygene -U  

To verify you have the latest version installed:

In [1]: import mygene

In [2]: mygene.__version__  
Out[2]: '2.3.0'

What's new and changed:

  • New get_fields method to search for matching field names.

    Wondering which field(s) provide the associated KEGG pathways for your favorite gene? Calling mg.get_fields('kegg') will give you the answer. This is essentially the same lookup you can do in the "available fields" table of our documentation, but directly from the Python client.

In [3]: mg = mygene.MyGeneInfo()

In [4]: mg.get_fields("kegg")
Out[4]:
{u'pathway.kegg': {u'indexed': False, u'type': u'object'},
 u'pathway.kegg.id': {u'indexed': True, u'type': u'string'},
 u'pathway.kegg.name': {u'indexed': True, u'type': u'string'}}

# you can then pass it to fields parameter
In [5]: mg.getgene('1017', fields='pathway.kegg')
# or
In [6]: mg.query('symbol:CDK2', fields='pathway.kegg')
  • New fetch_all parameter for the query method to retrieve all hits from a large query.

    This feature lets you stream all matching hits from a large query as a Python generator. Suppose you want to get back all human kinase genes. You can easily do so with the new fetch_all parameter:

# Normally, you can do the query like this:
In [7]: kinases = mg.query('name:kinase', species='human')
# this returns only the top 10 of 1073 total hits
Out[7]: <output omitted here>

# Previously, to get all 1073 hits, you could use
# the "size" and "skip" parameters for paging:
# this is the first 1000
In [8]: kinases = mg.query('name:kinase', species='human', size=1000)
# this is the remaining 73
In [9]: kinases = mg.query('name:kinase', species='human', size=1000, skip=1000)

# Although this works, it becomes harder for
# even larger queries. Using the "fetch_all" parameter,
# you can handle this in an elegant way:
In [10]: kinases = mg.query('name:kinase', species='human', fetch_all=True)
In [11]: kinases
Out[11]: <generator object _fetch_all at 0x7fec027d2eb0>

# kinases is a Python generator; you can now
# loop through it to get all 1073 hits:
In [12]: for gene in kinases:
   ....:     print gene['_id'], gene['symbol']
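
For comparison, the manual "size"/"skip" paging shown above can be wrapped in a small generator. This is only a rough sketch of what fetch_all saves you from writing, not how the client implements it. The helper name iter_all_hits and the FakeClient class are hypothetical and exist only so the example runs offline; the sketch assumes each query result is a dict with "hits" and "total" keys.

```python
def iter_all_hits(client, q, page_size=1000, **kwargs):
    """Yield every hit for a query by paging with "size" and "skip"."""
    skip = 0
    while True:
        res = client.query(q, size=page_size, skip=skip, **kwargs)
        hits = res.get('hits', [])
        for hit in hits:
            yield hit
        skip += len(hits)
        # Stop on an empty page or once every reported hit has been seen.
        if not hits or skip >= res.get('total', 0):
            break


class FakeClient(object):
    """Hypothetical offline stand-in for mygene.MyGeneInfo (illustration only)."""

    def __init__(self, n_hits):
        self._hits = [{'_id': str(i), 'symbol': 'GENE%d' % i}
                      for i in range(n_hits)]

    def query(self, q, size=10, skip=0, **kwargs):
        # Mimic the assumed response shape: total count plus one page of hits.
        return {'total': len(self._hits),
                'hits': self._hits[skip:skip + size]}


hits = list(iter_all_hits(FakeClient(1073), 'name:kinase', species='human'))
print(len(hits))
```

With a real client you would pass mygene.MyGeneInfo() in place of FakeClient, although fetch_all=True is the simpler choice.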

  • The getgene method now returns None if the input geneid does not match a known gene. Previously, it raised an exception.

  • The two batch-query methods, getgenes and querymany, now accept any iterable as input, such as a list, a tuple, or a generator. Previously, they accepted only a list or a tuple.

  • Finally, one major under-the-hood change is that we switched from the httplib2 module to the requests module for making the underlying web service calls. While both are excellent modules, requests has gained popularity recently, so there's a good chance you already have it installed in your Python environment when you install mygene.
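
The getgene and batch-input changes listed above might be used along these lines. This is a minimal sketch, not part of the client itself: symbol_or_default and read_ids are hypothetical helpers, and FakeClient is an offline stand-in that only imitates the behavior described in this post so the example runs without a network connection.

```python
def symbol_or_default(client, gene_id, default=None):
    """Return the gene symbol, or `default` when getgene finds no match."""
    gene = client.getgene(gene_id, fields='symbol')
    if gene is None:  # behavior described for v2.3.0: no exception raised
        return default
    return gene.get('symbol', default)


def read_ids(lines):
    """Lazily yield non-empty, stripped ids from any iterable of text lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


class FakeClient(object):
    """Hypothetical offline stand-in for mygene.MyGeneInfo (illustration only)."""

    _genes = {'1017': {'_id': '1017', 'symbol': 'CDK2'}}

    def getgene(self, geneid, fields=None):
        # Unknown ids give None, as described for getgene above.
        return self._genes.get(geneid)

    def getgenes(self, geneids, fields=None):
        # Accepts any iterable, including a generator.
        return [self.getgene(g) or {'query': g, 'notfound': True}
                for g in geneids]


client = FakeClient()
print(symbol_or_default(client, '1017'))
print(symbol_or_default(client, 'not-a-gene', 'n/a'))

# A generator can be passed straight to the batch method.
results = client.getgenes(read_ids(['1017\n', '\n', 'bogus\n']))
print(len(results))
```

Because read_ids is a generator, the same pattern works for an open file of gene ids without first loading every line into a list.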

As always, you can find more info about our MyGene.info Python client here: