This space contains public investigations and discussions from the Genomics team in Google Health. Our intended audience is the genomics community and those in the machine learning community who want to learn about applications in genomics.
DeepNull: an open-source method to improve the discovery power of genetic association studies → [External: Google Open Source Blog] In this post, we describe a new method, DeepNull, which models the complex relationships between covariates and phenotypes to improve genome-wide association study (GWAS) results. We discuss why correcting for these relationships is important and how DeepNull does so more effectively.
Advancing genomics to better understand and treat disease → [External: The Keyword | Google] At Google Health, we’re applying our technology and expertise to the field of genomics. Here are recent research and industry developments we’ve made to help quickly identify genetic disease and foster the equity of genomic tests across ancestries. This includes an exciting new partnership with Pacific Biosciences to further advance genomic technologies in research and the clinic.
Improving Genomic Discovery with Machine Learning → [External: Google AI Blog] In this post, we demonstrate how ML models that classify medical imaging data can be used to improve GWAS. We describe how models can be trained to generate trait predictions for phenotypes, how these predictions are used to identify novel genetic associations, and how the novel associations improve polygenic risk score (PRS) accuracy.
- In this post, we summarize the improvements in accuracy and runtime over the years and highlight a few categories of changes that have led to these improvements.
Analyzing genomic data in families with deep learning → [External: Google Open Source Blog] This post dives into DeepTrio, which can jointly analyze a mother-father-child trio of samples. We discuss how we represent the trio data and how DeepTrio is trained. We give accuracy benchmarks for DeepTrio, showing that it has higher accuracy than single-sample calling, especially at low sequencing depths.
- We discuss a new channel in DeepVariant which encodes haplotype information in long-read data, and was released with DeepVariant v1.1. We review how haplotypes relate to variant calling, show examples improved by the channel, and quantify the accuracy improvement with PacBio HiFi reads.
Improving the Accuracy of Genomic Analysis with DeepVariant 1.0 → [External: Google AI Blog] This post covers the release of DeepVariant v1.0, which incorporates a large number of improvements for all sequencing types. DeepVariant v1.0 is an improved version of our submission to the PrecisionFDA v2 Truth Challenge, which achieved Best Overall accuracy for 3 of 4 instrument categories.
- DeepVariant turns variant-calling into an image classification task. Here we explore what these pileup images look like and try to do the same classification task ourselves. We show easy and difficult examples, including multiallelics. By the end, we have a better intuition for how DeepVariant works.
- We explore three different training strategies to leverage whole-genome sequencing data to improve model performance for the specialized task of variant calling from whole-exome sequencing data: 1) jointly training with both WGS and WES data, 2) warmstarting from a pre-trained WGS model, and 3) including sequencing type as an input to the model.
- In this blog, we discuss how sequencing coverage involves trade-offs between cost and accuracy. We explore how computational methods that improve accuracy can also be understood as reducing cost. We compare current methods to historical accuracies. Finally, we explore the types of errors present at low and high coverages.
The Power of Building on an Accelerating Platform: How DeepVariant Uses Intel’s AVX-512 Optimizations → The v0.7 release of DeepVariant featured a three-fold improvement in end-to-end speed and a corresponding decrease in cost relative to the previous version (v0.6). Much of this speed improvement comes from enabling DeepVariant to take advantage of the new Intel® Advanced Vector eXtensions (AVX-512) instruction set.
Analyzing 3024 rice genomes characterized by DeepVariant → [External: Google Cloud Blog] This post explores how to identify and analyze different rice genome mutations with DeepVariant. To do this, we performed a re-analysis of the Rice 3K dataset and have made the data publicly available as part of the Google Cloud Public Dataset Program.
- An example of how Nucleus, a library for reading, writing, and processing genomics data, can be used alongside TensorFlow for machine learning. We discuss two approaches for the problem of DNA sequencing error correction and implement the second approach in this Colab tutorial.
- We discuss the newly published use of PacBio Circular Consensus Sequencing (CCS) at human genome scale. DeepVariant trained for this data type achieves accuracy similar to that of available Illumina genomes, and is the only method to achieve competitive accuracy in indel calling. Early access to this model is available now by request, and we expect general availability in our next DeepVariant release (v0.8).
- We investigate variant calling across a pedigree of mosquito genomes. Using rates of Mendelian violation, we assess pipelines developed to call variation in humans when applied to mosquito samples.
DeepVariant Accuracy Improvements for Genetic Datatypes → [External: Google AI Blog] This post covers the release of DeepVariant v0.6, which includes some major accuracy improvements. We describe how we train DeepVariant, and how we were able to improve DeepVariant's accuracy for two common sequencing scenarios, whole exome sequencing and polymerase chain reaction sequencing, simply by adding representative data into DeepVariant's training process.
DeepVariant: Highly Accurate Genomes With Deep Neural Networks → [External: Google AI Blog] We announce the open source release of DeepVariant, a deep learning technology to reconstruct the true genome sequence from high-throughput sequencing (HTS) data with significantly greater accuracy than previous classical methods.
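One of the entries above uses rates of Mendelian violation across a pedigree to assess variant-calling pipelines. For readers unfamiliar with the metric, here is a minimal sketch of the idea, not the code used in that post. It assumes diploid genotypes encoded as pairs of allele indices (0 for reference, 1 for alternate), and the function names and example sites are hypothetical, chosen only for illustration.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one allele from the father.

    Genotypes are unordered pairs of allele indices, e.g. (0, 1).
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def mendelian_violation_rate(trios):
    """Fraction of (child, mother, father) sites that violate
    Mendelian inheritance. Returns 0.0 for an empty input."""
    trios = list(trios)
    if not trios:
        return 0.0
    violations = sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trios
    )
    return violations / len(trios)


# Hypothetical genotype calls at four sites: (child, mother, father).
sites = [
    ((0, 1), (0, 0), (0, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # violation: mother has no 1 allele to pass on
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from each parent
    ((0, 1), (1, 1), (1, 1)),  # violation: neither parent has a 0 allele
]

rate = mendelian_violation_rate(sites)
print(f"Mendelian violation rate: {rate:.2f}")
```

Because true de novo mutations are rare, most violations in a trio reflect genotyping errors, which is why a lower violation rate is commonly read as a sign of more accurate calls when no truth set exists for the species.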