Another new data release was just rolled out. Highlights include support for the GRCh38/hg38 genome assembly, updated and additional data sources, and new data fields. All changes in this data release are backwards-compatible.

Support variants on GRCh38/hg38 Genome Assembly

Previously, all variant annotations were aggregated by their "_id" fields (HGVS names) based on the GRCh37/hg19 reference genome assembly. You could still query for GRCh38/hg38 positions/intervals, but the returned variant hits were always on hg19. Now you can query hg38 positions/intervals and get variants on hg38 directly. We aggregated annotations from the data sources that provide hg38 positions. Currently there are five: dbSNP, dbNSFP, ClinVar, EVS and UniProt. To retrieve or query variants using hg38 coordinates, specify the assembly=hg38 parameter in the URL. Queries still default to the hg19 assembly. For example, these are the same variant on the hg19 and hg38 assemblies, respectively:

    >T (default on hg19)
    >T?assembly=hg38 (on hg38)

and this query returns the variant hits within an hg38 interval:
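The two kinds of request described above (fetching one variant by its ID, and querying an interval) could be assembled along the following lines. This is only a sketch: the base URL is a placeholder, the /variant/ and /query paths and the "chrN:start-end" interval syntax are assumptions, and the variant ID is made up. Only the assembly=hg38 parameter and the hg19 default come from this post.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; substitute the real API root of the service.
BASE_URL = "https://api.example.org/v1"


def variant_url(hgvs_id, assembly="hg19"):
    """Build a URL that retrieves one variant by its HGVS-style "_id"."""
    # The /variant/<id> path layout is an assumption for illustration.
    # ":" and ">" are kept literal so the ID stays readable.
    url = f"{BASE_URL}/variant/{quote(hgvs_id, safe=':>')}"
    # hg19 is the default, so the parameter is only added for other assemblies.
    if assembly != "hg19":
        url += "?" + urlencode({"assembly": assembly})
    return url


def interval_url(chrom, start, end, assembly="hg19"):
    """Build a URL that queries all variants within a genomic interval."""
    # The /query path and the "chrN:start-end" syntax are assumptions.
    params = {"q": f"chr{chrom}:{start}-{end}"}
    if assembly != "hg19":
        params["assembly"] = assembly
    return f"{BASE_URL}/query?" + urlencode(params, safe=":")


# Made-up variant ID and interval, used only to show the URL shapes.
print(variant_url("chr1:g.1000A>T"))
print(variant_url("chr1:g.1000A>T", assembly="hg38"))
print(interval_url("1", 1000, 2000, assembly="hg38"))
```

Leaving the assembly parameter off for hg19 mirrors the default behavior described above, so existing hg19 URLs keep working unchanged.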

Data Sources Updated

Three popular data sources, ClinVar, dbSNP and dbNSFP, were updated to their latest versions (the same version is used for both the hg19 and hg38 assemblies):

          last release   new release   # of variants     # of variants
                                       in new release    in last release
ClinVar   201602         201605        131,383           134,176
dbSNP     144            147           145,132,257       153,037,251
dbNSFP    3.0c           3.1c          82,030,830        82,030,910

ClinVar, dbSNP and dbNSFP annotations are available under the "clinvar", "dbsnp" and "dbnsfp" subfields, respectively, for each annotated variant. Our service aggregates annotations from ClinVar, dbSNP, dbNSFP and 11 other sources for each variant, so you can access them all in one request.
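As a small illustration of that layout, a client could pick the per-source subfields out of a returned variant object as sketched here. The example object and everything inside its subfields are placeholders made up for this sketch. Only the "clinvar", "dbsnp" and "dbnsfp" subfield names come from the text above.

```python
# Subfield names as given in this post.
SOURCES = ("clinvar", "dbsnp", "dbnsfp")


def source_annotations(variant):
    """Return only the per-source subfields that are present on a variant."""
    return {name: variant[name] for name in SOURCES if name in variant}


# Hypothetical variant object; the inner keys and values are placeholders.
example = {
    "_id": "chr1:g.1000A>T",
    "dbsnp": {"rsid": "rs12345"},
    "clinvar": {"note": "placeholder"},
}

found = source_annotations(example)
print(sorted(found))
```

A variant is not necessarily annotated by every source, so the helper skips missing subfields rather than raising an error.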

The total number of unique variants is now over 340M (340,102,225), compared to 334M previously. More details about the variant data we provide are always available in our documentation. The same information is also available programmatically from our metadata endpoint.

New fields for genomic positions:

Previously, the genomic position of a variant was provided as sub-fields under each data source field (e.g. clinvar.hg19, clinvar.hg38 and clinvar.chrom). We now provide these fields ("hg19", "hg38" and "chrom") at the root of each variant annotation object, as the universal fields for genomic positions.

  • hg19 and hg38

    When provided, a variant object on the GRCh37/hg19 genome assembly should contain an "hg19" field. Likewise, a variant object on the GRCh38/hg38 genome assembly should contain an "hg38" field. Both fields include start and end positions:

        >A?fields=hg19 (query for hg19 field)
        >T?assembly=hg38&fields=hg38 (query for hg38 field)

  • chrom

    When available, the chromosome number is now provided as the "chrom" field at the root of each variant object:

        >A?fields=chrom

    Note that this field is always a string (without the "chr" prefix), even for chromosomes "1" to "22".
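Reading the root-level position fields described above might look like the following sketch. The example object is invented, and the "start" and "end" key names inside "hg19"/"hg38" are an assumption based on the statement that both fields include start and end positions.

```python
def genomic_position(variant, assembly="hg19"):
    """Return (chrom, start, end) from the root-level fields, or None."""
    pos = variant.get(assembly)    # root-level "hg19" or "hg38" object
    chrom = variant.get("chrom")   # always a string, without "chr"
    if pos is None or chrom is None:
        # These fields are only present when the data provides them.
        return None
    # "start" and "end" are assumed key names.
    return chrom, pos["start"], pos["end"]


# Hypothetical variant object; all values are made up for illustration.
example = {
    "_id": "chr1:g.1000A>T",
    "chrom": "1",
    "hg19": {"start": 1000, "end": 1000},
}

print(genomic_position(example))
print(genomic_position(example, assembly="hg38"))
```

Because "chrom" is always a string, compare it against "1" rather than the integer 1 when filtering by chromosome.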

Query RCV Accession Number and gene symbol directly:

Previously, we allowed users to query for matching variants directly using an "rsid". We have now added ClinVar "RCV Accession" and "gene symbol" as special fields as well, so you can query them directly without specifying which field to search on:

    (query for rsid directly)
    (query for clinvar RCV accession directly)
    (query for gene symbol directly)
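A direct query of this kind could be built as in the sketch below, where the search term is passed on its own with no field name in front of it. The base URL is a placeholder, the /query path and the "q" parameter name are assumptions, and the three search terms are illustrative values rather than examples taken from this release.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the real API root of the service.
BASE_URL = "https://api.example.org/v1"


def direct_query_url(term):
    """Build a query URL from a bare term, without naming a field."""
    # The /query path and the "q" parameter name are assumptions.
    return f"{BASE_URL}/query?" + urlencode({"q": term})


# Illustrative terms: an rsid, a ClinVar RCV accession and a gene symbol.
for term in ("rs12345", "RCV000000001", "BRCA1"):
    print(direct_query_url(term))
```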