This space contains public investigations and discussions from the Genomics team at Google Health. Our intended audience is the genomics community and those in the machine learning community who want to learn about applications in genomics.
- In this post, we summarize the improvements in accuracy and runtime over the years and highlight a few categories of changes that have led to these improvements.
Analyzing genomic data in families with deep learning → [External: Google Open Source Blog] This post dives into DeepTrio, which can jointly analyze a mother-father-child trio of samples. We discuss how we represent the trio data and how DeepTrio is trained. We give accuracy benchmarks for DeepTrio, showing that it has higher accuracy than single-sample calling, especially at low sequence depths.
- We discuss a new channel in DeepVariant which encodes haplotype information in long-read data, and was released with DeepVariant v1.1. We review how haplotypes relate to variant calling, show examples improved by the channel, and quantify the accuracy improvement with PacBio HiFi reads.
Improving the Accuracy of Genomic Analysis with DeepVariant 1.0 → [External: Google AI Blog] This post covers the release of DeepVariant v1.0, which incorporates a large number of improvements for all sequencing types. DeepVariant v1.0 is an improved version of our submission to the PrecisionFDA v2 Truth Challenge, which achieved Best Overall accuracy for 3 of 4 instrument categories.
- DeepVariant turns variant-calling into an image classification task. Here we explore what these pileup images look like and try to do the same classification task ourselves. We show easy and difficult examples, including multiallelics. By the end, we have a better intuition for how DeepVariant works.
- We explore three different training strategies to leverage whole-genome sequencing data to improve model performance for the specialized task of variant calling from whole-exome sequencing data: 1) jointly training with both WGS and WES data, 2) warmstarting from a pre-trained WGS model, and 3) including sequencing type as an input to the model.
- In this blog, we discuss how sequencing coverage involves trade-offs between cost and accuracy. We explore how computational methods that improve accuracy can also be understood as reducing cost. We compare current methods to historical accuracies. Finally, we explore the types of errors present at low and high coverages.
The Power of Building on an Accelerating Platform: How DeepVariant Uses Intel’s AVX-512 Optimizations
The v0.7 release of DeepVariant featured a three-fold improvement in end-to-end speed and a corresponding decrease in cost relative to the previous version (v0.6). Much of this speed improvement comes from enabling DeepVariant to take advantage of the new Intel® Advanced Vector eXtensions (AVX-512) instruction set.
Analyzing 3024 rice genomes characterized by DeepVariant → [External: Google Cloud Blog] This post explores how to identify and analyze different rice genome mutations with DeepVariant. To do this, we performed a re-analysis of the Rice 3K dataset and have made the data publicly available as part of the Google Cloud Public Dataset Program.
- An example of how Nucleus, a library for reading, writing, and processing genomics data, can be used alongside TensorFlow for machine learning. We discuss two approaches for the problem of DNA sequencing error correction and implement the second approach in this Colab tutorial.
- We discuss the newly published use of PacBio Circular Consensus Sequencing (CCS) at human genome scale. DeepVariant trained for this data type achieves accuracy similar to that of available Illumina genomes, and is the only method to achieve competitive accuracy in indel calling. Early access to this model is available now by request, and we expect general availability in our next DeepVariant release (v0.8).
- We investigate variant calling across a pedigree of mosquito genomes. Using rates of Mendelian violation, we assess pipelines developed to call variation in humans when applied to mosquito samples.
DeepVariant Accuracy Improvements for Genetic Datatypes → [External: Google AI Blog] This post covers the release of DeepVariant v0.6, which includes some major accuracy improvements. We describe how we train DeepVariant, and how we were able to improve DeepVariant's accuracy for two common sequencing scenarios, whole exome sequencing and polymerase chain reaction sequencing, simply by adding representative data into DeepVariant's training process.
DeepVariant: Highly Accurate Genomes With Deep Neural Networks → [External: Google AI Blog] We announce the open source release of DeepVariant, a deep learning technology to reconstruct the true genome sequence from HTS sequencer data with significantly greater accuracy than previous classical methods.