This space contains public investigations and discussions from the Genomics team in Google Health. Our intended audience is the genomics community, as well as members of the machine learning community who want to learn about applications in genomics.
- DeepVariant turns variant calling into an image classification task. Here we explore what these pileup images look like and try to perform the same classification task ourselves. We show easy and difficult examples, including multiallelic sites. By the end, we have a better intuition for how DeepVariant works.
- We explore three training strategies that leverage whole-genome sequencing (WGS) data to improve model performance on the specialized task of calling variants from whole-exome sequencing (WES) data: 1) jointly training with both WGS and WES data, 2) warmstarting from a pre-trained WGS model, and 3) including sequencing type as an input to the model.
- In this post, we discuss how sequencing coverage involves trade-offs between cost and accuracy. We explore how computational methods that improve accuracy can also be understood as reducing cost, compare current methods to historical accuracies, and examine the types of errors present at low and high coverage.
- The Power of Building on an Accelerating Platform: How DeepVariant Uses Intel’s AVX-512 Optimizations. The v0.7 release of DeepVariant featured a three-fold improvement in end-to-end speed and a corresponding decrease in cost relative to the previous version (v0.6). Much of this speedup comes from enabling DeepVariant to take advantage of the new Intel® Advanced Vector eXtensions (AVX-512) instruction set.
- An example of how Nucleus, a library for reading, writing, and processing genomics data, can be used alongside TensorFlow for machine learning. We discuss two approaches to the problem of DNA sequencing error correction and implement the second approach in this Colab tutorial.
- We discuss the newly published use of PacBio Circular Consensus Sequencing (CCS) at human-genome scale. DeepVariant trained on this data type achieves accuracy similar to that of available Illumina genomes, and is the only method to achieve competitive accuracy in indel calling. Early access to this model is available now by request, and we expect general availability in our next DeepVariant release (v0.8).
- We investigate variant calling across a pedigree of mosquito genomes. Using rates of Mendelian violation, we assess how pipelines developed to call variants in humans perform when applied to mosquito samples.
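As a rough illustration of the Mendelian-violation metric mentioned in the mosquito post, the sketch below counts a site as a violation when an offspring's diploid genotype cannot be formed from one allele of each parent. It is a minimal sketch, not the pipeline from the post: it assumes diploid genotypes encoded as pairs of allele indices (so multiallelic sites work), ignores no-calls, phasing, and sex chromosomes, and the function names are hypothetical. A pedigree with many offspring would be scored one parent-offspring trio at a time.

```python
from itertools import product
from typing import Iterable, Tuple

# Assumed encoding (hypothetical): a diploid genotype as a pair of allele indices,
# e.g. (0, 0) hom-ref, (0, 1) het, (1, 2) multiallelic het.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the child can inherit one allele from each parent."""
    target = sorted(child)
    # Try every combination of one maternal allele and one paternal allele.
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def mendelian_violation_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) sites whose genotypes are inconsistent."""
    sites = list(trios)
    if not sites:
        return 0.0
    violations = sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return violations / len(sites)


# Example: three sites from one hypothetical trio.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # violation: mother carries no alt allele
    ((0, 0), (0, 1), (0, 1)),  # consistent: each parent passes on a 0
]
print(mendelian_violation_rate(sites))  # 0.3333333333333333
```

Because some genotyping errors still yield Mendelian-consistent genotypes, and rare de novo mutations yield genuine violations, this rate is a proxy for calling error rather than an exact count of errors.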