Variant annotation data¶

Data sources¶

We currently obtain variant annotation data from the data resources listed below and keep them up to date, so that you don’t have to:

Total hg19 variants loaded: N/A

Source	version	# of variants	key name*
dbNSFP	-	-	dbnsfp
dbSNP	-	-	dbsnp
ClinVar	-	-	clinvar
EVS	-	-	evs
CADD	-	-	cadd
MutDB	-	-	mutdb
GWAS Catalog	-	-	gwassnps
COSMIC	-	-	cosmic
DOCM	-	-	docm
SNPedia	-	-	snpedia
EMVClass	-	-	emv
Scripps Wellderly	-	-	wellderly
ExAC	-	-	exac
GRASP	-	-	grasp
UniProt	-	-	uniprot
CIViC	-	-	civic
Cancer Genome Interpreter	-	-	cgi
Genome Aggregation Database (gnomAD), genomes	-	-	gnomad_genome
Genome Aggregation Database (gnomAD), exomes	-	-	gnomad_exome
Geno2MP	-	-	geno2mp

Total hg38 variants loaded: N/A

Source	version	# of variants	key name*
dbNSFP	-	-	dbnsfp
dbSNP	-	-	dbsnp
ClinVar	-	-	clinvar
EVS	-	-	evs
UniProt	-	-	uniprot
Genome Aggregation Database (gnomAD), genomes	-	-	gnomad_genome
Genome Aggregation Database (gnomAD), exomes	-	-	gnomad_exome

* key name: this is the key for the specific annotation data in a variant object.

The most updated information can be accessed here for hg19 and here for hg38.

Note

Each data source may have its own usage restrictions (e.g. CADD data are free for non-commercial use only). Please refer to the data source pages above for their specific restrictions.

Variant object¶

Variant annotation data are both stored and returned as a variant object, which is essentially a collection of fields (attributes) and their values:

{
  "_id": "chr1:g.35367G>A",
  "_version": 2,
  "cadd": {
    "alt": "A",
    "annotype": "NonCodingTranscript",
    "chrom": 1,
    "gene": {
      "cds": {
        "cdna_pos": 476,
        "rel_cdna_pos": 0.4
      },
      "feature_id": "ENST00000417324",
      "gene_id": "ENSG00000237613"
    },
    "ref": "G",
    "type": "SNV"
  },
  "dbnsfp": {
    "aa": {
      "aapos_sift": "ENSP00000409362:P44L",
      "alt": "L",
      "codonpos": 2,
      "pos": 44,
      "ref": "P",
      "refcodon": "CCG"
    },
    "alt": "A",
    "ancestral_allele": "G",
    "chrom": "1",
    "ensembl": {
      "geneid": "ENSG00000237613",
      "transcriptid": "ENST00000417324"
    },
    "genename": "FAM138A",
    "hg19": {
      "end": 35367,
      "start": 35367
    }
  }
}

The example above omits many of the available fields. For a full example, check out this example variant, or try the interactive API page.
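
Because a variant object is a nested collection of fields, a dotted name such as dbnsfp.aa.pos identifies one value inside it. The sketch below shows one way to resolve such a name on the client side. The helper get_field is hypothetical (it is not part of any MyVariant.info client), and the object is a trimmed copy of the example above.

```python
from typing import Any


def get_field(variant: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested variant object using a dotted path such as 'dbnsfp.aa.pos'.

    Hypothetical helper for illustration; returns `default` when any
    component of the path is missing.
    """
    current: Any = variant
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the example variant object shown above.
variant = {
    "_id": "chr1:g.35367G>A",
    "cadd": {"alt": "A", "gene": {"feature_id": "ENST00000417324"}},
    "dbnsfp": {"aa": {"pos": 44, "ref": "P", "alt": "L"}, "genename": "FAM138A"},
}

print(get_field(variant, "dbnsfp.genename"))        # FAM138A
print(get_field(variant, "dbnsfp.aa.pos"))          # 44
print(get_field(variant, "clinvar.rcv", "absent"))  # absent
```

Returning a default rather than raising reflects the fact that most variants carry annotations from only some of the sources.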

_id field¶

Each individual variant object contains an “_id” field as the primary key. We use the nomenclature recommended by the Human Genome Variation Society (HGVS) to define the “_id” field in MyVariant.info. Specifically, we use HGVS’s genomic reference sequence notation based on the current reference genome assembly (e.g. hg19 for human). The following are brief representations of the major types of genetic variants. More examples can be found on the HGVS recommendations for the description of DNA sequence variants page.

Note

The default reference genome assembly is always human hg19 in MyVariant.info, so we only use “chr??” to represent the reference genomic sequence in the “_id” field. The valid chromosome representations are chr1, chr2, …, chr22, chrX, chrY and chrMT. Do not use chr23 for chrX, chr24 for chrY, or chrM for chrMT.
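
If your input data use other chromosome labels, you can convert them before building an “_id”. The following is a minimal sketch of such a conversion. The function normalize_chrom and its alias table are hypothetical, written only to match the accepted names listed in the note above.

```python
# Labels that the note above says must not be used, mapped to accepted ones.
_ALIASES = {"23": "X", "24": "Y", "M": "MT"}
# Accepted labels: 1..22, X, Y and MT (without the "chr" prefix).
_VALID = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def normalize_chrom(name: str) -> str:
    """Return a chromosome name in the 'chr1' ... 'chr22', 'chrX', 'chrY', 'chrMT' form.

    Hypothetical client-side helper; raises ValueError for unknown labels.
    """
    label = name.strip()
    if label.lower().startswith("chr"):
        label = label[3:]
    label = label.upper()
    label = _ALIASES.get(label, label)
    if label not in _VALID:
        raise ValueError(f"unrecognized chromosome: {name!r}")
    return "chr" + label


print(normalize_chrom("chr23"))  # chrX
print(normalize_chrom("chrM"))   # chrMT
print(normalize_chrom("7"))      # chr7
```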

SNV example:
```
chr1:g.35366C>T
```
The above _id represents a C to T SNV on chromosome 1, genomic position 35366.
Insertion example:
```
chr2:g.17142_17143insA
```
The above _id represents an A inserted between genomic positions 17142 and 17143 on chromosome 2.
Deletion example:
```
chrMT:g.8271_8279del
```
The above _id represents a nine-nucleotide deletion spanning genomic positions 8271 to 8279 on chromosome MT. Note that we don’t include the deleted sequence in the _id field in this case.
Deletion/Insertion example:
```
chrX:g.14112_14117delinsTG
```
The above _id represents six nucleotides, from genomic position 14112 to 14117 on chromosome X, being replaced by TG.
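
The four patterns above can be assembled with simple string formatting. This is a sketch only: the function names are hypothetical, no validation of alleles or coordinates is attempted, and cases not shown above (such as single-base deletions) are not covered.

```python
def snv_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Single-nucleotide variant, e.g. chr1:g.35366C>T."""
    return f"{chrom}:g.{pos}{ref}>{alt}"


def insertion_id(chrom: str, start: int, end: int, inserted: str) -> str:
    """Insertion between two adjacent positions, e.g. chr2:g.17142_17143insA."""
    return f"{chrom}:g.{start}_{end}ins{inserted}"


def deletion_id(chrom: str, start: int, end: int) -> str:
    """Multi-base deletion; the deleted sequence is not part of the id."""
    return f"{chrom}:g.{start}_{end}del"


def delins_id(chrom: str, start: int, end: int, inserted: str) -> str:
    """Deletion/insertion, e.g. chrX:g.14112_14117delinsTG."""
    return f"{chrom}:g.{start}_{end}delins{inserted}"


print(snv_id("chr1", 35366, "C", "T"))          # chr1:g.35366C>T
print(insertion_id("chr2", 17142, 17143, "A"))  # chr2:g.17142_17143insA
print(deletion_id("chrMT", 8271, 8279))         # chrMT:g.8271_8279del
print(delins_id("chrX", 14112, 14117, "TG"))    # chrX:g.14112_14117delinsTG
```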

_score field¶

You will often see a “_score” field in the returned variant object. It is an internal score representing how well the query matches the returned variant object. In the variant annotation service, where only one variant object is returned, the score carries little meaning. In the variant query service, the returned variant hits are sorted by score in descending order by default.
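
If you merge hits from several queries, or want to confirm the ordering yourself, sorting on “_score” is straightforward. The sketch below uses made-up hits and scores, and the helper rank_hits is hypothetical.

```python
# Hypothetical hits; the _score values are invented for illustration.
hits = [
    {"_id": "chr1:g.35367G>A", "_score": 1.2},
    {"_id": "chr2:g.17142_17143insA", "_score": 7.9},
    {"_id": "chrX:g.14112_14117delinsTG", "_score": 3.4},
]


def rank_hits(hits: list) -> list:
    """Return hits ordered by _score, highest first; hits without a score go last."""
    return sorted(
        hits,
        key=lambda hit: hit.get("_score", float("-inf")),
        reverse=True,
    )


ranked = rank_hits(hits)
print([hit["_id"] for hit in ranked])
```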

_version field¶

Sometimes you will see a “_version” field in the returned variant object, e.g. from the v1/variant endpoint. This field is mainly for our internal bookkeeping and is of little use to end users, so you can safely ignore it.

For those who are curious, here is the explanation. The value of the “_version” field is a small integer such as 1, 2 or 5. The number indicates the version history of this particular variant object (i.e. how many times the object has been updated). Because each variant object is updated independently and incrementally, only when updates to that particular variant are available, the “_version” values differ across variant objects. From time to time, when we need to make a full-data release (with some large updates), every variant object is re-created and its “_version” value is reset to 1.

Please also note that we don’t keep any older versions of a variant object; the one returned from the API request is always the latest one we have. The “_version” field just indicates how many times it has been updated since our last full-data release.

Available fields¶

The table below lists all of the possible fields that could be in a variant object, as well as all of their parents (for nested fields). If the field is indexed, it may also be directly queried, e.g.

q=dbnsfp.polyphen2.hdiv.score:>0.99

All fields can be used with _exists_ or _missing_ filters, e.g.

q=_exists_:dbsnp AND _exists_:cosmic
q=_missing_:wellderly

or as inputs to the fields parameter, e.g.

q=_exists_:dbsnp&fields=dbsnp.rsid,dbsnp.vartype
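
When these query strings are sent over HTTP, characters such as spaces, colons and “>” need to be URL-encoded. The sketch below builds an encoded URL with Python’s standard library and decodes it again to show that the query survives the round trip. The endpoint path in QUERY_ENDPOINT and the helper build_query_url are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the query endpoint; adjust to the service URL you actually use.
QUERY_ENDPOINT = "https://myvariant.info/v1/query"


def build_query_url(q: str, fields=None) -> str:
    """Build a URL carrying the q parameter and, optionally, the fields parameter."""
    params = {"q": q}
    if fields:
        # The fields parameter takes a comma-separated list of field names.
        params["fields"] = ",".join(fields)
    return f"{QUERY_ENDPOINT}?{urlencode(params)}"


url = build_query_url(
    "_exists_:dbsnp AND _exists_:cosmic",
    fields=["dbsnp.rsid", "dbsnp.vartype"],
)
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
print(decoded["fields"][0])
```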

Field	Type	Searched by default	hg19	hg38	Notes
