Phosphorus Research: CtsCNV, A Copy Number Variant Detection Method for Clinical Targeted Sequencing Data

Last month, Phosphorus attended the American Society of Human Genetics (ASHG) 2018 Annual Meeting in San Diego, California. The ASHG Meeting is an important event for Phosphorus: it lets us see what is new in the wider world of genetics and get a better sense of the people we help with our products and services. Phosphorus was proud to present a poster entitled “CtsCNV: A Copy Number Variant Detection Method for Clinical Targeted Sequencing Data.” The poster described our work on an algorithm that uses clinical NGS data to detect copy number variants (CNVs) more accurately. Selected text from the poster follows:

Methods

  1. Data
    We downloaded low-coverage whole-exome sequencing (WES) data for 90 samples from the 1000 Genomes Project. Targeted panels were custom-designed on the Roche/NimbleGen platform. Genes were included based on their relationship to disease phenotypes as described in Online Mendelian Inheritance in Man (OMIM), the Human Phenotype Ontology (HPO), and GeneReviews; variants reported in the NIH ClinVar database were also included. Affymetrix microarrays were used for validation. CNVs were detected with XHMM, CODEX, EXCAVATOR2, CONVADING, and ctsCNV.
  2. ctsCNV algorithm
    Before CNV calling, sequencing data are first processed by our BioQC pipeline to ensure adequate sequencing quality. In general, we require a mean sample coverage of about 100x to call CNV events confidently. ctsCNV accepts either BAM or genomecov.bed input, bins the targeted regions into 100-bp sub-intervals, and uses the mean sequencing depth of each sub-interval as input. A key feature of ctsCNV is a set of normalization procedures that remove depth variation caused by non-biological noise, which makes the method readily applicable to clinical targeted sequencing (CTS) data. These procedures correct for sample variability, batch effects, GC-content bias, and other technical biases. Loci with extreme read depth (high or low) or high variance across samples are removed from the analysis.
    After normalization (spline normalization, z-score, PCA), per-interval copy number estimates are computed by comparing each sample’s normalized depth in an interval with the median normalized depth of that interval within the batch. For PCA, we remove the top principal components, because most of the variation they capture is not biological; how many components are removed depends on the number of samples. A z-score is computed on the difference between each sub-interval’s depth and the median depth of that locus across samples. CNVs are then called by running the widely used Circular Binary Segmentation (CBS) algorithm [2] on the interval estimates, together with a permutation-based significance test. All bioinformatics algorithms were implemented within the Elements™ platform and will be made freely available as an API.
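The depth-processing steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the ctsCNV implementation: it takes a samples × sub-intervals matrix of mean depths, substitutes simple median scaling for the unspecified sample-level normalization, omits the spline (GC) correction, removes the top principal components by SVD, and z-scores each sample against the per-interval batch median. The function names `bin_depth` and `normalize_depth` and all thresholds (`min_depth`, `max_depth`, `max_cv`, `n_pcs`) are hypothetical.

```python
import numpy as np


def bin_depth(per_base_depth, bin_size=100):
    """Split one target region's per-base depth into bin_size-bp sub-intervals
    and return each sub-interval's mean depth (the last one may be shorter)."""
    depth = np.asarray(per_base_depth, dtype=float)
    return np.array([depth[s:s + bin_size].mean()
                     for s in range(0, len(depth), bin_size)])


def normalize_depth(depth, n_pcs=1, min_depth=0.2, max_depth=5.0, max_cv=0.5):
    """Normalize a samples x sub-intervals matrix of mean depths.

    Returns (z, keep): per-interval z-scores for the retained intervals and a
    boolean mask of which intervals were retained. All thresholds here are
    hypothetical placeholders, not ctsCNV's values.
    """
    depth = np.asarray(depth, dtype=float)

    # Sample-level scaling (a stand-in for ctsCNV's normalization):
    # divide each sample by its own median sub-interval depth.
    scaled = depth / np.median(depth, axis=1, keepdims=True)

    # Drop loci with extreme mean depth or high variability across samples.
    mean = scaled.mean(axis=0)
    cv = scaled.std(axis=0) / np.maximum(mean, 1e-9)
    keep = (mean >= min_depth) & (mean <= max_depth) & (cv <= max_cv)
    x = np.log2(scaled[:, keep] + 1e-3)

    # PCA denoising: center each interval across samples, then zero out the
    # top n_pcs singular values, assumed to reflect technical variation.
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s[:n_pcs] = 0.0
    resid = (u * s) @ vt

    # z-score each sample's value against the interval's median across the batch.
    med = np.median(resid, axis=0)
    sd = resid.std(axis=0)
    z = (resid - med) / np.maximum(sd, 1e-9)
    return z, keep
```

A run of strongly negative z-scores in one sample suggests a deletion, and a run of positive values a duplication. In very small batches, removing too many components can also remove real CNV signal, which is why the number removed should be tied to the batch size.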
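The segmentation step can be sketched in the same way. The functions below are a simplified, illustrative version of circular binary segmentation with a permutation test, not the ctsCNV code or the DNAcopy implementation: for each segment they find the arc whose mean differs most from the rest, compare that maximum with permuted copies of the segment, and split recursively while the permutation p-value stays at or below `alpha`. The names `max_arc_stat` and `cbs` and the parameter values are hypothetical; production implementations add speedups and options for pruning spurious splits.

```python
import numpy as np


def max_arc_stat(x):
    """Largest standardized difference between the mean of an arc x[i:j]
    and the mean of the remaining points. Returns (stat, i, j).

    Treating the data as circular, every arc is either a contiguous block
    or the complement of one, so scanning contiguous blocks covers all arcs.
    """
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))
    i = np.arange(n)[:, None]          # arc start
    j = np.arange(1, n + 1)[None, :]   # arc end (exclusive)
    k = j - i                          # arc length
    valid = (k > 0) & (k < n)
    k_in = np.where(valid, k, 1)
    k_out = np.where(valid, n - k, 1)
    arc_sum = s[j] - s[i]
    diff = arc_sum / k_in - (s[-1] - arc_sum) / k_out
    stat = np.where(valid, np.abs(diff) / np.sqrt(1.0 / k_in + 1.0 / k_out), 0.0)
    r, c = np.unravel_index(np.argmax(stat), stat.shape)
    return float(stat[r, c]), int(r), int(c) + 1


def cbs(x, alpha=0.002, n_perm=1000, min_size=2, seed=0):
    """Segment x; return a list of (start, end, mean) with end exclusive."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    breaks = set()

    def split(lo, hi):
        seg = x[lo:hi]
        if len(seg) < 2 * min_size:
            return
        stat, i, j = max_arc_stat(seg)
        # Permutation test: how often does a shuffled segment score as high?
        hits = sum(max_arc_stat(rng.permutation(seg))[0] >= stat
                   for _ in range(n_perm))
        if (hits + 1) / (n_perm + 1) > alpha:
            return
        # Split into two or three pieces at the arc boundaries and recurse.
        cuts = sorted({lo, lo + i, lo + j, hi})
        breaks.update(cuts[1:-1])
        for a, b in zip(cuts, cuts[1:]):
            split(a, b)

    split(0, len(x))
    edges = [0] + sorted(breaks) + [len(x)]
    return [(a, b, float(x[a:b].mean())) for a, b in zip(edges, edges[1:])]
```

Applied to one sample's per-interval estimates, a segment with a clearly negative mean would be a deletion candidate and one with a clearly positive mean a duplication candidate; the cutoffs for making that call are not given in the poster.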

Results

  1. ctsCNV outperforms other callers on targeted sequencing data
    First, using a custom Roche NimbleGen targeted panel, we sequenced 372 previously characterized DNA samples at a range of depths (median ~400x). These samples carried 81 known CNVs. We compared ctsCNV with several commonly used callers: CODEX, XHMM, EXCAVATOR2, and CONVADING. ctsCNV achieved the highest sensitivity while maintaining good precision (Table 1).
  2. ctsCNV shows high accuracy on 1000 Genomes WES data
    To ensure the analysis was not biased toward internal lab conditions, all callers were also tested on 90 lower-depth whole-exome sequencing (WES) samples from the 1000 Genomes Project. We used three high-resolution CNV datasets (from HapMap, McCarroll et al., and Conrad et al.) to measure accuracy. CNVs were grouped by population frequency into three result types: ‘ALL’, ‘COMMON’, and ‘RARE’. ctsCNV again achieved the highest recall and precision (Figure 1).
  3. ctsCNV robustly adapts to clinical CNV testing
    We have optimized the algorithm for real-world clinical data. ctsCNV accurately calls CNVs that are 1) long (up to a 3.6-Mbp Y chromosome microdeletion, median depth ~250x); 2) in low-coverage sequencing data (median depth ~120x); and 3) in small batches (Table 2, Figure 2).
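Sensitivity (recall) and precision, the metrics reported above, depend on how a called CNV is matched to a known one, which the poster does not specify. The sketch below assumes a common convention, a 50% reciprocal overlap with the same CNV type; the `CNV` record, `reciprocal_overlap`, and `evaluate` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type and their overlap covers
    at least min_frac of each event's length (an assumed criterion)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    return (overlap > 0
            and overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def evaluate(calls: List[CNV], truth: List[CNV],
             min_frac: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, precision).

    Sensitivity = known CNVs matched by at least one call / all known CNVs.
    Precision   = calls matching at least one known CNV / all calls.
    """
    found = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls)
                for t in truth)
    correct = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth)
                  for c in calls)
    sensitivity = found / len(truth) if truth else float("nan")
    precision = correct / len(calls) if calls else float("nan")
    return sensitivity, precision


truth = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 5600, "DUP")]
calls = [CNV("chr1", 1100, 2100, "DEL"), CNV("chr3", 0, 500, "DEL")]
print(evaluate(calls, truth))  # (0.5, 0.5)
```

Calls and known CNVs are matched independently here, so two calls overlapping the same known event would both count as correct; a stricter one-to-one matching is another reasonable choice.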

Discussion

Clinical genetic diagnosis of CNVs from NGS data is currently challenging because of uneven coverage, GC-content bias, non-biological biases such as batch effects, and the small batch sizes and low read depths imposed by cost considerations. Our ctsCNV method detects exon-level CNVs with excellent sensitivity and specificity compared with four other CNV callers, on both clinical panel sequencing and WES data. Furthermore, ctsCNV performs well on small-batch and low-coverage clinical data. In addition, our method can be readily adapted to the challenging task of Y chromosome microdeletion detection.

Conclusion

We have developed a robust CNV calling method for clinical NGS data that outperforms many CNV callers in accuracy and is applicable to small-batch and low-coverage data.


Phosphorus Diagnostics is using genomics to improve human health. We offer the most comprehensive, actionable #genetic test for disease prevention.
