Magika
Magika is a tool to detect common file content types, using deep learning.

New Magika version available!

This website still runs Magika 1.0, but we have released a newer version of our model that supports 200+ content types. Our Python and Rust libraries and our CLI already support the newer model. This website will follow soon, but if you want the latest and greatest, go check those out.

Magika applies deep learning to file type detection. It covers a broad range of content types and outperforms traditional tools, with 99%+ average precision and recall.

Designed for efficiency, Magika runs quickly even on a single CPU. A similar model currently scans millions of files per second at Google (see blog post).

Demo
You can drop your files below to test out Magika. Processing happens entirely in your browser; your files are not uploaded anywhere.
Get Magika in your command line
You can start using Magika by installing it as a Python package:
  $ pip install magika
Then run the magika command on one or more files:
  $ magika examples/*

  code.asm: Assembly (code)
  code.py: Python source (code)
  doc.docx: Microsoft Word 2007+ document (document)
  doc.ini: INI configuration file (text)
  elf64.elf: ELF executable (executable)
  flac.flac: FLAC audio bitstream data (audio)
  image.bmp: BMP image data (image)
  java.class: Java compiled bytecode (executable)
  jpg.jpg: JPEG image data (image)
  pdf.pdf: PDF document (document)
  pe32.exe: PE executable (executable)
  png.png: PNG image data (image)
  README.md: Markdown document (text)
  tar.tar: POSIX tar archive (archive)
  webm.webm: WebM data (video)
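
Each line of the listing above appears to follow the shape `path: description (group)`. The following is a minimal Python sketch for turning such lines into records, assuming that shape holds for every line; it is inferred from the sample output only, not from a documented format. The names `Detection`, `LINE_RE`, and `parse_line` are hypothetical helpers introduced for this example and are not part of Magika.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    """One parsed line of output (hypothetical helper type)."""
    path: str
    description: str
    group: str


# Assumed line shape: "<path>: <description> (<group>)".
LINE_RE = re.compile(r"^(?P<path>.+?): (?P<description>.+) \((?P<group>[^()]+)\)$")


def parse_line(line: str) -> Optional[Detection]:
    """Parse one output line; return None for lines that do not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Detection(match["path"], match["description"], match["group"])


# A few lines copied from the sample run above.
SAMPLE = """
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
jpg.jpg: JPEG image data (image)
png.png: PNG image data (image)
"""

detections = [d for d in map(parse_line, SAMPLE.splitlines()) if d is not None]
files_per_group = Counter(d.group for d in detections)
print(files_per_group["image"])  # 2
```

The pattern splits on the first `: `, so a path that itself contains a colon followed by a space would be split in the wrong place. If the tool offers a machine-readable output mode, that is likely a more robust choice than parsing text.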
  
Libraries!
You can use Magika from your Python code or from JavaScript (in Node or client-side). In fact, this page is using Magika's JavaScript library!
Paper
You can read our research paper to learn how the Magika model was trained and how it performs on large datasets.
If you use Magika, please cite it like this:
@InProceedings{fratantonio25:magika,
  author = {Yanick Fratantonio and Luca Invernizzi and Loua Farah and Kurt Thomas and Marina Zhang and Ange Albertini and Francois Galilee and Giancarlo Metitieri and Julien Cretin and Alexandre Petit-Bianco and David Tao and Elie Bursztein},
  title = { {Magika: AI-Powered Content-Type Detection} },
  booktitle = {Proceedings of the International Conference on Software Engineering (ICSE)},
  month = {April},
  year = {2025}
}
    
Need more info? See our README on GitHub!
Model card

Model Details

Overview

Magika is a content type detection tool powered by deep learning. It is accurate (99%+ average accuracy on our test dataset across 120+ content types), reasonably fast even on a single CPU (inference time of the underlying model: 5 to 6 ms), and reasonably small (the core model is ~1 MB). It offers a significant accuracy boost over existing tools.

Version

name: v1.0
date: 2024/02/16

Owners

Magika team, magika-dev@google.com

Licenses

  • Apache-2.0

References

Citations

  • https://github.com/google/magika

Considerations

Use Cases

  • This model classifies files into a predefined set of content types.

Limitations

  • This model is trained to output a single content type, so polyglot files will not be mapped to two or more categories.
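
To illustrate this limitation: a classifier that reports only its top-scoring label returns one answer even when two content types are both plausible. The Python sketch below uses made-up scores and a hypothetical helper, `top_label`. It does not reproduce Magika's actual outputs or API.

```python
from typing import Mapping


def top_label(scores: Mapping[str, float]) -> str:
    """Return the single highest-scoring label (ties go to the first one seen)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=lambda label: scores[label])


# Hypothetical scores for a polyglot file that is valid both as a PDF
# and as a ZIP archive. These numbers are invented for illustration.
polyglot_scores = {"pdf": 0.55, "zip": 0.43, "txt": 0.02}

print(top_label(polyglot_scores))  # pdf
```

The second-best candidate (`zip` in this invented example) is lost as soon as only the top label is reported, which is why a polyglot file ends up in a single category.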

Quantitative Analysis

Performance Metrics
Name (metric, content type) Value
precision, ai 100.00%
precision, apk 99.24%
precision, appleplist 99.94%
precision, asm 99.53%
precision, asp 99.45%
precision, batch 98.52%
precision, bmp 99.98%
precision, bzip 100.00%
precision, c 99.28%
precision, cab 99.99%
precision, cat 100.00%
precision, chm 100.00%
precision, coff 99.95%
precision, cpl 100.00%
precision, crx 99.98%
precision, cs 99.69%
precision, css 99.61%
precision, csv 98.94%
precision, deb 99.99%
precision, dex 100.00%
precision, dll 100.00%
precision, dmg 99.98%
precision, doc 99.35%
precision, docx 99.66%
precision, dylib 100.00%
precision, elf 99.99%
precision, emf 100.00%
precision, eml 99.84%
precision, epub 99.92%
precision, exe 100.00%
precision, flac 100.00%
precision, gif 100.00%
precision, go 99.87%
precision, gzip 100.00%
precision, hlp 100.00%
precision, html 96.66%
precision, ico 99.96%
precision, ini 98.81%
precision, internetshortcut 99.98%
precision, iso 99.90%
precision, jar 98.89%
precision, java 99.30%
precision, javabytecode 100.00%
precision, javascript 99.03%
precision, jpeg 100.00%
precision, json 99.44%
precision, ko 99.99%
precision, latex 99.86%
precision, lisp 99.90%
precision, lnk 100.00%
precision, m3u 100.00%
precision, macho 100.00%
precision, makefile 99.83%
precision, markdown 97.12%
precision, mht 99.95%
precision, mp3 99.98%
precision, mp4 100.00%
precision, mscompress 100.00%
precision, msi 99.88%
precision, mui 100.00%
precision, mum 99.99%
precision, ocx 100.00%
precision, odex 99.98%
precision, odp 99.94%
precision, ods 99.87%
precision, odt 99.79%
precision, ogg 99.99%
precision, outlook 99.42%
precision, pcap 99.98%
precision, pdf 100.00%
precision, pem 99.95%
precision, perl 99.45%
precision, php 98.87%
precision, png 99.99%
precision, postscript 99.99%
precision, powershell 99.15%
precision, ppt 98.67%
precision, pptx 99.04%
precision, python 99.16%
precision, pythonbytecode 100.00%
precision, randombytes 99.67%
precision, rar 100.00%
precision, rdf 99.86%
precision, rpm 99.99%
precision, rst 96.70%
precision, rtf 99.91%
precision, ruby 99.59%
precision, rust 99.86%
precision, scala 99.85%
precision, scr 100.00%
precision, sevenzip 100.00%
precision, shell 97.12%
precision, smali 100.00%
precision, so 99.99%
precision, sql 99.52%
precision, squashfs 100.00%
precision, svg 99.92%
precision, swf 100.00%
precision, symlinktext 97.10%
precision, sys 100.00%
precision, tar 99.97%
precision, tga 100.00%
precision, tiff 99.95%
precision, torrent 100.00%
precision, ttf 100.00%
precision, txt 94.21%
precision, vba 99.69%
precision, wav 99.97%
precision, webm 100.00%
precision, webp 100.00%
precision, winregistry 99.97%
precision, wmf 100.00%
precision, xar 100.00%
precision, xls 98.93%
precision, xlsb 99.61%
precision, xlsx 99.61%
precision, xml 98.40%
precision, xpi 99.58%
precision, xz 100.00%
precision, yaml 99.03%
precision, zip 97.00%
precision, zlibstream 99.99%
recall, ai 100.00%
recall, apk 98.75%
recall, appleplist 99.92%
recall, asm 99.42%
recall, asp 98.75%
recall, batch 96.45%
recall, bmp 99.98%
recall, bzip 100.00%
recall, c 99.29%
recall, cab 100.00%
recall, cat 100.00%
recall, chm 100.00%
recall, coff 99.93%
recall, cpl 100.00%
recall, crx 99.79%
recall, cs 99.65%
recall, css 98.85%
recall, csv 98.50%
recall, deb 100.00%
recall, dex 99.99%
recall, dll 100.00%
recall, dmg 100.00%
recall, doc 98.46%
recall, docx 99.40%
recall, dylib 100.00%
recall, elf 99.97%
recall, emf 99.99%
recall, eml 99.90%
recall, epub 99.92%
recall, exe 99.99%
recall, flac 100.00%
recall, gif 99.97%
recall, go 99.94%
recall, gzip 100.00%
recall, hlp 100.00%
recall, html 97.78%
recall, ico 100.00%
recall, ini 98.16%
recall, internetshortcut 99.93%
recall, iso 99.49%
recall, jar 97.57%
recall, java 99.61%
recall, javabytecode 100.00%
recall, javascript 99.13%
recall, jpeg 100.00%
recall, json 99.74%
recall, ko 100.00%
recall, latex 99.21%
recall, lisp 99.79%
recall, lnk 100.00%
recall, m3u 100.00%
recall, macho 100.00%
recall, makefile 99.83%
recall, markdown 93.86%
recall, mht 99.86%
recall, mp3 100.00%
recall, mp4 100.00%
recall, mscompress 100.00%
recall, msi 99.36%
recall, mui 100.00%
recall, mum 100.00%
recall, ocx 100.00%
recall, odex 100.00%
recall, odp 99.62%
recall, ods 99.68%
recall, odt 99.83%
recall, ogg 99.99%
recall, outlook 99.83%
recall, pcap 100.00%
recall, pdf 100.00%
recall, pem 99.71%
recall, perl 99.36%
recall, php 98.65%
recall, png 100.00%
recall, postscript 100.00%
recall, powershell 99.40%
recall, ppt 99.32%
recall, pptx 99.64%
recall, python 99.36%
recall, pythonbytecode 100.00%
recall, randombytes 99.63%
recall, rar 100.00%
recall, rdf 99.91%
recall, rpm 100.00%
recall, rst 97.53%
recall, rtf 99.97%
recall, ruby 99.61%
recall, rust 99.68%
recall, scala 99.72%
recall, scr 99.99%
recall, sevenzip 100.00%
recall, shell 98.20%
recall, smali 99.99%
recall, so 99.99%
recall, sql 99.45%
recall, squashfs 100.00%
recall, svg 99.73%
recall, swf 100.00%
recall, symlinktext 100.00%
recall, sys 100.00%
recall, tar 100.00%
recall, tga 100.00%
recall, tiff 100.00%
recall, torrent 99.99%
recall, ttf 100.00%
recall, txt 92.12%
recall, vba 98.37%
recall, wav 100.00%
recall, webm 100.00%
recall, webp 100.00%
recall, winregistry 99.46%
recall, wmf 100.00%
recall, xar 100.00%
recall, xls 99.30%
recall, xlsb 99.33%
recall, xlsx 99.58%
recall, xml 99.30%
recall, xpi 99.32%
recall, xz 100.00%
recall, yaml 99.24%
recall, zip 99.52%
recall, zlibstream 100.00%
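
The table reports precision and recall separately for each content type. One common way to combine the two is the F1 score, the harmonic mean of precision and recall. F1 is not reported in this model card. The Python sketch below computes it from four rows copied from the table above, and the helper name `f1_score` is introduced only for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) copied from the table above for four content types.
metrics = {
    "txt": (94.21, 92.12),
    "markdown": (97.12, 93.86),
    "html": (96.66, 97.78),
    "zip": (97.00, 99.52),
}

f1_by_type = {name: f1_score(p, r) for name, (p, r) in metrics.items()}

# Print the selected types from lowest to highest F1.
for name, f1 in sorted(f1_by_type.items(), key=lambda item: item[1]):
    print(f"{name}: {f1:.2f}%")
```

Because the harmonic mean always lies between its two inputs and sits closer to the smaller one, a type with a gap between precision and recall (such as markdown) scores lower than its precision alone would suggest.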