2. Upload Own Variant Files
Clicking Upload sample in the top menu opens a page where users can upload their own variant call file.
The variant call files need to follow the Variant Call Format (VCF), and only one file can be uploaded at a time.
However, VCF files containing more than one individual or sample are accepted.
First, users need to select the correct version of the human reference genome assembly (Genome build): hg19 or hg38.
If uncertain, it is recommended to check the header lines within the file (lines starting with #).
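The header check can also be scripted. The following is a minimal sketch, not part of CGAR: it scans the `##reference`, `##contig`, and `##assembly` header lines for build names. The function names and the list of hint substrings (for example, treating `GRCh37` and `b37` as hg19) are assumptions made for illustration, and many VCF files carry no build information at all, in which case the sketch returns `None`.

```python
import gzip

# Assumed substrings that commonly identify each build in header lines.
BUILD_HINTS = {
    "hg19": ("hg19", "grch37", "b37", "hs37"),
    "hg38": ("hg38", "grch38"),
}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def header_lines(lines):
    """Collect the leading lines that start with '#'."""
    header = []
    for line in lines:
        if not line.startswith("#"):
            break
        header.append(line.rstrip("\n"))
    return header


def guess_genome_build(header):
    """Return 'hg19', 'hg38', or None when the header gives no clear hint."""
    found = set()
    for line in header:
        lowered = line.lower()
        if lowered.startswith(("##reference", "##contig", "##assembly")):
            for build, hints in BUILD_HINTS.items():
                if any(hint in lowered for hint in hints):
                    found.add(build)
    # Conflicting or missing hints are reported as None rather than guessed.
    return found.pop() if len(found) == 1 else None


# Example use with a hypothetical local file:
# with open_vcf("sample.vcf.gz") as handle:
#     print(guess_genome_build(header_lines(handle)))
```

A result of `None` means the build has to be confirmed some other way, for example from the pipeline that produced the file.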
Next, it is recommended to set the global ancestry (Ancestry) of the individual or sample to be analyzed, if known.
The same five continental groups as in the 1000 Genomes Project are used: African, East Asian, European, South Asian, and American.
If the ancestry is unknown or uncertain, the setting can be left as Unknown/Unspecified, and CGAR will estimate the global ancestry from the variant file.
Under Variant file, select the local variant file to upload to CGAR.
Only files with the extension .vcf or .vcf.gz are accepted.
Note
Due to limited storage space, please contact us before uploading a file containing a large number of individuals (more than 100).
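Both constraints, the accepted extensions and the 100-individual threshold, can be checked locally before uploading. The sketch below is an illustration, not CGAR's own validation. It relies on the VCF convention that the `#CHROM` column header line lists sample identifiers after the nine fixed columns (`CHROM` through `FORMAT`). The function names are hypothetical, and the case-sensitive extension match is an assumption.

```python
ACCEPTED_EXTENSIONS = (".vcf", ".vcf.gz")
FIXED_COLUMNS = 9          # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
LARGE_FILE_SAMPLES = 100   # threshold mentioned in the note above


def has_accepted_extension(file_name):
    """True if the file name ends with .vcf or .vcf.gz (case-sensitive)."""
    return file_name.endswith(ACCEPTED_EXTENSIONS)


def sample_ids(lines):
    """Return the sample identifiers listed on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[FIXED_COLUMNS:]
        if not line.startswith("#"):
            break  # reached the data records without seeing a column header
    raise ValueError("no #CHROM column header line found")


def needs_contact_first(lines):
    """True if the file holds more individuals than the stated threshold."""
    return len(sample_ids(lines)) > LARGE_FILE_SAMPLES
```

`sample_ids` accepts any iterable of text lines, so it works on an open file handle (`open(path, "rt")`, or `gzip.open(path, "rt")` for a `.vcf.gz` file) as well as on a list of strings.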
Finally, when the user clicks the Upload Genome button (at the bottom of the dialog), the variant call file is transferred to the server and placed in a queue for annotation and analysis.
The time required to finish processing varies with the number of variants in the file.
Under normal circumstances, a whole-genome file containing 3 to 4 million variants is processed in one hour.
Users will receive a notification email each time a file has finished processing and is available for analysis.
2.1. Browse and track previous files
At the bottom of the page is a table listing all variant files uploaded so far (including the one you have just uploaded).
In this table, users can see:
- The name of sample(s) in the variant file - Genome label.
- The name of the variant file - Genome file name.
- The version of human reference genome assembly - Genome build.
- The global ancestry specified at the time of uploading - Genome ancestry.
- The version of annotation - Annotation.
- The time and date when the file was first uploaded or done processing - Uploaded.
When a file has just been uploaded but has not yet finished processing, it first appears in the table with none as Annotation.
At this stage, the Genome label simply shows the number of samples in the file, e.g., (3 sample(s)) or (1 sample(s)).
Later, the table is updated to show one row per sample in the file, each carrying its own identifier (as specified in the original VCF file) as Genome label.
At this point, all samples show the current version of annotation in Annotation, and Uploaded reflects the date and time when processing finished.
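The two states of the table described above can be modeled as a small data structure. This is only an illustrative sketch of the documented behavior, not CGAR's internal representation. The class, function, and field names are hypothetical (the fields mirror the table columns), and the annotation version string is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRow:
    genome_label: str
    genome_file_name: str
    genome_build: str
    genome_ancestry: str
    annotation: str
    uploaded: str


def rows_while_queued(file_name, build, ancestry, n_samples, uploaded_at):
    """One placeholder row: sample count as label, 'none' as annotation."""
    label = f"({n_samples} sample(s))"
    return [TableRow(label, file_name, build, ancestry, "none", uploaded_at)]


def rows_after_processing(file_name, build, ancestry, ids, annotation, finished_at):
    """One row per sample, labeled with its identifier from the VCF file."""
    return [
        TableRow(sample_id, file_name, build, ancestry, annotation, finished_at)
        for sample_id in ids
    ]
```

For a three-sample file, `rows_while_queued` yields a single row labeled `(3 sample(s))`, and `rows_after_processing` later yields three rows that share the same file name, build, and ancestry.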