Magika
Magika is a tool to detect common file content types, using deep learning.
View project on GitHub
New Magika version available!
While this website is running on Magika 1.0, we have released a newer version of our model that supports 200+ content types. Our Python and Rust libraries, as well as our CLI, support the newer model. This website will follow soon, but if you want the latest and greatest, go check those out.
Magika leverages the power of cutting-edge deep learning to enhance the world of file type detection. It provides increased accuracy and support for a comprehensive range of content types, outperforming traditional tools with 99%+ average precision and recall.
Designed for efficiency, Magika runs quickly even on a single CPU. A similar model currently scans millions of files per second at Google (see blog post).
Demo
You can drop your files below to test out Magika. The processing happens entirely in your browser - the files won't be uploaded anywhere else.
Get Magika in your command line
You can start using Magika by installing it as a Python package:
pip install magika
Then, you can run it by executing magika, like so:
$ magika examples/*
code.asm: Assembly (code)
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
doc.ini: INI configuration file (text)
elf64.elf: ELF executable (executable)
flac.flac: FLAC audio bitstream data (audio)
image.bmp: BMP image data (image)
java.class: Java compiled bytecode (executable)
jpg.jpg: JPEG image data (image)
pdf.pdf: PDF document (document)
pe32.exe: PE executable (executable)
png.png: PNG image data (image)
README.md: Markdown document (text)
tar.tar: POSIX tar archive (archive)
webm.webm: WebM data (video)
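Each line of the output above follows the pattern "path: description (group)". If you want to post-process that text, a minimal Python sketch could look like the following. The `Detection` type and `parse_line` helper are hypothetical names introduced here for illustration and are not part of Magika; the sketch also assumes the plain-text layout shown above, which may differ in other Magika versions.

```python
import re
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    """One parsed line of CLI output (hypothetical helper type)."""
    path: str
    description: str
    group: str


# Assumed layout, taken from the sample output: "<path>: <description> (<group>)"
_LINE = re.compile(r"^(?P<path>.+?): (?P<description>.+) \((?P<group>[^()]+)\)$")


def parse_line(line: str) -> Optional[Detection]:
    """Return a Detection for a result line, or None if the line does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Detection(match["path"], match["description"], match["group"])


# A few lines copied from the sample output.
sample = """\
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
tar.tar: POSIX tar archive (archive)
"""

detections = [d for d in map(parse_line, sample.splitlines()) if d is not None]
for d in detections:
    print(f"{d.group:>10}  {d.path}  ->  {d.description}")
```

Parsing human-readable text is fragile, so if your version of the CLI offers a structured output mode, that is likely the better choice for scripting.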
Libraries!
You can use Magika in your Python code, or your JavaScript (in Node or client side). In fact, this page is using Magika's JavaScript library!
Paper
You can read our research paper on how the Magika model was trained and its performance on large datasets.
If you use Magika, please cite it like this:
@InProceedings{fratantonio25:magika,
  author = {Yanick Fratantonio and Luca Invernizzi and Loua Farah and Kurt Thomas and Marina Zhang and Ange Albertini and Francois Galilee and Giancarlo Metitieri and Julien Cretin and Alexandre Petit-Bianco and David Tao and Elie Bursztein},
  title = {{Magika: AI-Powered Content-Type Detection}},
  booktitle = {Proceedings of the International Conference on Software Engineering (ICSE)},
  month = {April},
  year = {2025}
}
Need more info? See our README on GitHub!
Model card
Model Details
Overview
Magika is a content type detection tool powered by deep learning. It is accurate (99%+ average accuracy on our test dataset across 120+ content types), reasonably fast even on a single CPU (inference time of the underlying model: 5/6ms), and reasonably small in size (the core model is ~1MB). It offers a significant accuracy boost with respect to existing tools.
Version
name: v1.0
date: 2024/02/16
Owners
Magika team, magika-dev@google.com
Licenses
- Apache-2.0
References
Citations
- https://github.com/google/magika
Considerations
Use Cases
- This model classifies files into a predefined set of content types.
Limitations
- This model is trained to output a single content type, so polyglot files will not be mapped to two or more categories.
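To illustrate the limitation above, here is a small Python sketch of why a single-label classifier cannot report a polyglot as two types. It assumes the classifier produces one score per content type and reports the highest-scoring one; the scores and the `top_label` helper are hypothetical and do not come from Magika.

```python
from typing import Dict


def top_label(scores: Dict[str, float]) -> str:
    """Return the single content type with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for a file that is valid both as a PDF and as a ZIP.
polyglot_scores = {"pdf": 0.55, "zip": 0.43, "txt": 0.02}

# Only one label survives, even though two types score highly.
print(top_label(polyglot_scores))
```

Under this assumption, the second plausible type is simply dropped, which is why polyglot files end up in one category only.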
Quantitative Analysis
Name | Value |
---|---|
precision, ai | 100.00% |
precision, apk | 99.24% |
precision, appleplist | 99.94% |
precision, asm | 99.53% |
precision, asp | 99.45% |
precision, batch | 98.52% |
precision, bmp | 99.98% |
precision, bzip | 100.00% |
precision, c | 99.28% |
precision, cab | 99.99% |
precision, cat | 100.00% |
precision, chm | 100.00% |
precision, coff | 99.95% |
precision, cpl | 100.00% |
precision, crx | 99.98% |
precision, cs | 99.69% |
precision, css | 99.61% |
precision, csv | 98.94% |
precision, deb | 99.99% |
precision, dex | 100.00% |
precision, dll | 100.00% |
precision, dmg | 99.98% |
precision, doc | 99.35% |
precision, docx | 99.66% |
precision, dylib | 100.00% |
precision, elf | 99.99% |
precision, emf | 100.00% |
precision, eml | 99.84% |
precision, epub | 99.92% |
precision, exe | 100.00% |
precision, flac | 100.00% |
precision, gif | 100.00% |
precision, go | 99.87% |
precision, gzip | 100.00% |
precision, hlp | 100.00% |
precision, html | 96.66% |
precision, ico | 99.96% |
precision, ini | 98.81% |
precision, internetshortcut | 99.98% |
precision, iso | 99.90% |
precision, jar | 98.89% |
precision, java | 99.30% |
precision, javabytecode | 100.00% |
precision, javascript | 99.03% |
precision, jpeg | 100.00% |
precision, json | 99.44% |
precision, ko | 99.99% |
precision, latex | 99.86% |
precision, lisp | 99.90% |
precision, lnk | 100.00% |
precision, m3u | 100.00% |
precision, macho | 100.00% |
precision, makefile | 99.83% |
precision, markdown | 97.12% |
precision, mht | 99.95% |
precision, mp3 | 99.98% |
precision, mp4 | 100.00% |
precision, mscompress | 100.00% |
precision, msi | 99.88% |
precision, mui | 100.00% |
precision, mum | 99.99% |
precision, ocx | 100.00% |
precision, odex | 99.98% |
precision, odp | 99.94% |
precision, ods | 99.87% |
precision, odt | 99.79% |
precision, ogg | 99.99% |
precision, outlook | 99.42% |
precision, pcap | 99.98% |
precision, pdf | 100.00% |
precision, pem | 99.95% |
precision, perl | 99.45% |
precision, php | 98.87% |
precision, png | 99.99% |
precision, postscript | 99.99% |
precision, powershell | 99.15% |
precision, ppt | 98.67% |
precision, pptx | 99.04% |
precision, python | 99.16% |
precision, pythonbytecode | 100.00% |
precision, randombytes | 99.67% |
precision, rar | 100.00% |
precision, rdf | 99.86% |
precision, rpm | 99.99% |
precision, rst | 96.70% |
precision, rtf | 99.91% |
precision, ruby | 99.59% |
precision, rust | 99.86% |
precision, scala | 99.85% |
precision, scr | 100.00% |
precision, sevenzip | 100.00% |
precision, shell | 97.12% |
precision, smali | 100.00% |
precision, so | 99.99% |
precision, sql | 99.52% |
precision, squashfs | 100.00% |
precision, svg | 99.92% |
precision, swf | 100.00% |
precision, symlinktext | 97.10% |
precision, sys | 100.00% |
precision, tar | 99.97% |
precision, tga | 100.00% |
precision, tiff | 99.95% |
precision, torrent | 100.00% |
precision, ttf | 100.00% |
precision, txt | 94.21% |
precision, vba | 99.69% |
precision, wav | 99.97% |
precision, webm | 100.00% |
precision, webp | 100.00% |
precision, winregistry | 99.97% |
precision, wmf | 100.00% |
precision, xar | 100.00% |
precision, xls | 98.93% |
precision, xlsb | 99.61% |
precision, xlsx | 99.61% |
precision, xml | 98.40% |
precision, xpi | 99.58% |
precision, xz | 100.00% |
precision, yaml | 99.03% |
precision, zip | 97.00% |
precision, zlibstream | 99.99% |
recall, ai | 100.00% |
recall, apk | 98.75% |
recall, appleplist | 99.92% |
recall, asm | 99.42% |
recall, asp | 98.75% |
recall, batch | 96.45% |
recall, bmp | 99.98% |
recall, bzip | 100.00% |
recall, c | 99.29% |
recall, cab | 100.00% |
recall, cat | 100.00% |
recall, chm | 100.00% |
recall, coff | 99.93% |
recall, cpl | 100.00% |
recall, crx | 99.79% |
recall, cs | 99.65% |
recall, css | 98.85% |
recall, csv | 98.50% |
recall, deb | 100.00% |
recall, dex | 99.99% |
recall, dll | 100.00% |
recall, dmg | 100.00% |
recall, doc | 98.46% |
recall, docx | 99.40% |
recall, dylib | 100.00% |
recall, elf | 99.97% |
recall, emf | 99.99% |
recall, eml | 99.90% |
recall, epub | 99.92% |
recall, exe | 99.99% |
recall, flac | 100.00% |
recall, gif | 99.97% |
recall, go | 99.94% |
recall, gzip | 100.00% |
recall, hlp | 100.00% |
recall, html | 97.78% |
recall, ico | 100.00% |
recall, ini | 98.16% |
recall, internetshortcut | 99.93% |
recall, iso | 99.49% |
recall, jar | 97.57% |
recall, java | 99.61% |
recall, javabytecode | 100.00% |
recall, javascript | 99.13% |
recall, jpeg | 100.00% |
recall, json | 99.74% |
recall, ko | 100.00% |
recall, latex | 99.21% |
recall, lisp | 99.79% |
recall, lnk | 100.00% |
recall, m3u | 100.00% |
recall, macho | 100.00% |
recall, makefile | 99.83% |
recall, markdown | 93.86% |
recall, mht | 99.86% |
recall, mp3 | 100.00% |
recall, mp4 | 100.00% |
recall, mscompress | 100.00% |
recall, msi | 99.36% |
recall, mui | 100.00% |
recall, mum | 100.00% |
recall, ocx | 100.00% |
recall, odex | 100.00% |
recall, odp | 99.62% |
recall, ods | 99.68% |
recall, odt | 99.83% |
recall, ogg | 99.99% |
recall, outlook | 99.83% |
recall, pcap | 100.00% |
recall, pdf | 100.00% |
recall, pem | 99.71% |
recall, perl | 99.36% |
recall, php | 98.65% |
recall, png | 100.00% |
recall, postscript | 100.00% |
recall, powershell | 99.40% |
recall, ppt | 99.32% |
recall, pptx | 99.64% |
recall, python | 99.36% |
recall, pythonbytecode | 100.00% |
recall, randombytes | 99.63% |
recall, rar | 100.00% |
recall, rdf | 99.91% |
recall, rpm | 100.00% |
recall, rst | 97.53% |
recall, rtf | 99.97% |
recall, ruby | 99.61% |
recall, rust | 99.68% |
recall, scala | 99.72% |
recall, scr | 99.99% |
recall, sevenzip | 100.00% |
recall, shell | 98.20% |
recall, smali | 99.99% |
recall, so | 99.99% |
recall, sql | 99.45% |
recall, squashfs | 100.00% |
recall, svg | 99.73% |
recall, swf | 100.00% |
recall, symlinktext | 100.00% |
recall, sys | 100.00% |
recall, tar | 100.00% |
recall, tga | 100.00% |
recall, tiff | 100.00% |
recall, torrent | 99.99% |
recall, ttf | 100.00% |
recall, txt | 92.12% |
recall, vba | 98.37% |
recall, wav | 100.00% |
recall, webm | 100.00% |
recall, webp | 100.00% |
recall, winregistry | 99.46% |
recall, wmf | 100.00% |
recall, xar | 100.00% |
recall, xls | 99.30% |
recall, xlsb | 99.33% |
recall, xlsx | 99.58% |
recall, xml | 99.30% |
recall, xpi | 99.32% |
recall, xz | 100.00% |
recall, yaml | 99.24% |
recall, zip | 99.52% |
recall, zlibstream | 100.00% |
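
The table reports precision and recall separately. If a single number per content type is more convenient, one common option is the F1 score, the harmonic mean of the two. The sketch below computes it in Python for a few content types, with values copied from the table above; the `f1` helper is a name introduced here, and F1 is not a metric reported by the model card itself.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs copied from the table above.
scores = {
    "txt": (94.21, 92.12),
    "markdown": (97.12, 93.86),
    "html": (96.66, 97.78),
    "zip": (97.00, 99.52),
}

f1_by_type = {name: f1(p, r) for name, (p, r) in scores.items()}

# Print from lowest to highest F1.
for name, value in sorted(f1_by_type.items(), key=lambda kv: kv[1]):
    print(f"{name:<10} F1 = {value:.2f}%")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a type such as markdown, whose recall is noticeably below its precision, ends up with an F1 closer to its recall.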