NIAGADS

Step 7: Review quality metrics of processed data


Last updated 6 years ago

After a sample has been processed by VCPA, users can review its sample-level, stage-by-stage quality metrics through the tracking database.

The tracking database can be accessed through the endpoints below, where IP is the public IP address of the tracking database and * is the project_id.

1) To view the list of projects: http://IP/v1/projects

2) To view the sample information for each project: http://IP/v1/sample/details?project_id=*

VCPA is divided into 4 stages at the per-sample level (see the overview figure in the Introduction section for details). The quality metrics for each stage can be accessed as follows:

3) Stage0 - http://IP/v1/stats/stage0/project_id=*

4) Stage1 - http://IP/v1/stats/stage1/project_id=*

5) Stage2a - http://IP/v1/stats/stage2a/project_id=*

6) Stage2b - http://IP/v1/stats/stage2b/project_id=*
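The six endpoints can also be assembled and queried from a script. The following is a minimal sketch, not part of VCPA itself: the helper names `build_endpoints` and `fetch` are hypothetical, the IP address is a placeholder, and the response is returned as raw text because this page does not state the response format.

```python
from urllib.request import urlopen

# Per-sample stages of VCPA, as listed on this page.
STAGES = ("stage0", "stage1", "stage2a", "stage2b")


def build_endpoints(ip, project_id):
    """Return the tracking database URLs for one project (hypothetical helper).

    The URL patterns are copied from this page; `ip` is the public IP
    address of the tracking database.
    """
    base = f"http://{ip}/v1"
    urls = {
        "projects": f"{base}/projects",
        "samples": f"{base}/sample/details?project_id={project_id}",
    }
    for stage in STAGES:
        urls[stage] = f"{base}/stats/{stage}/project_id={project_id}"
    return urls


def fetch(url, timeout=30):
    """Fetch one endpoint and return the body as text.

    Assumption: the body is UTF-8 text; the format is not documented here.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # 203.0.113.10 is a documentation placeholder; use your database's IP.
    for name, url in build_endpoints("203.0.113.10", 1).items():
        print(name, url)
```

Calling `fetch(url)` on any of the printed URLs retrieves the corresponding page, provided the tracking database instance is running and reachable from your machine.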

Notable metrics:

Sequencing coverage – Sequencing coverage describes the average number of reads that ‘align’ or ‘cover’ known reference bases. This coverage level helps determine whether variant discovery can be made with a certain degree of confidence at a specific base location. Coverage metrics are captured at stage1 of VCPA.

Average coverage equation - The WGS depth of coverage calculation is performed using sambamba. We extract the read count and the coverage at several percentages. The average depth of coverage for WGS is calculated by multiplying each chromosome's average read count by the read size, summing these products across chromosomes, and dividing by the number of non-N nucleotides.
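The arithmetic described above can be illustrated with a short sketch. This is one reading of the description, not VCPA's own code: the function name `average_depth`, the input layout, and the example numbers are hypothetical, and no sambamba output is parsed here.

```python
def average_depth(read_counts_by_chrom, read_size, non_n_bases):
    """Average depth of coverage under the assumptions stated above.

    read_counts_by_chrom: mapping of chromosome name to its read count
    read_size: read length in bases
    non_n_bases: number of non-N nucleotides in the reference
    """
    if non_n_bases <= 0:
        raise ValueError("non_n_bases must be positive")
    # Bases sequenced per chromosome = read count * read size, summed over
    # all chromosomes, then divided by the callable reference length.
    total_bases = sum(count * read_size for count in read_counts_by_chrom.values())
    return total_bases / non_n_bases


if __name__ == "__main__":
    # Toy numbers chosen for easy checking, not real sequencing data.
    counts = {"chr1": 1000, "chr2": 500}
    print(average_depth(counts, read_size=150, non_n_bases=7500))
```

With the toy inputs, 1,500 reads of 150 bases give 225,000 sequenced bases over 7,500 non-N positions, an average depth of 30.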

Transition to Transversion (Ti/Tv) Ratio – The ratio of the number of transitions (changes A <-> G and C <-> T) to the number of transversions (changes A <-> C, A <-> T, G <-> C or G <-> T) for a pair of sequences. It is one of the metrics used to evaluate the quality of the callset. Ti/Tv ratios are captured at stage2b of VCPA.

GATK VariantEval is used to calculate the Ti/Tv ratio. The database also captures the number of known and novel Ti and Tv SNPs.
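To make the definition concrete, the sketch below computes a Ti/Tv ratio from a list of (reference, alternate) SNP alleles. It only illustrates the definition given above and is not how GATK VariantEval is implemented; the function names and the example alleles are hypothetical.

```python
BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A <-> G or C <-> T, i.e. both bases are in the same class."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_ratio(snps):
    """Return transitions / transversions for (ref, alt) SNP pairs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    # Three transitions (A>G, C>T, G>A) and one transversion (A>C).
    print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]))
```

The example list contains three transitions and one transversion, so the ratio is 3.0.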

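Endpoint summary (IP is the public IP address of the tracking database; * is the project_id):

Projects - http://IP/v1/projects
Samples - http://IP/v1/sample/details?project_id=*
Stage0 - http://IP/v1/stats/stage0/project_id=*
Stage1 - http://IP/v1/stats/stage1/project_id=*
Stage2a - http://IP/v1/stats/stage2a/project_id=*
Stage2b - http://IP/v1/stats/stage2b/project_id=*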