Max L

Dangerzone: Open-Source Document Sanitization That Actually Works


Written by Max L in Deep Dives

Last month I received a PDF from a vendor that triggered three different AV signatures. The file was “clean” — just a contract — but it had embedded JavaScript, an auto-open action, and metadata pointing to an internal network share. The vendor had no idea. This is the reality of document security in 2026: every PDF, DOCX, and image file is a potential attack vector, and most people have zero tooling to deal with it.

That is when I started using Dangerzone seriously. It is an open-source tool from the Freedom of the Press Foundation that converts potentially dangerous documents into safe PDFs by rendering them inside disposable containers. No network access, no persistent state, no trust required in the source file.

How Dangerzone Actually Works Under the Hood

The architecture is surprisingly elegant. Dangerzone uses a two-container pipeline:

Container 1 (pixels): Takes your input document (PDF, DOCX, XLSX, ODP, images — about 20 formats), converts it to raw RGB pixel data using LibreOffice or Poppler, then outputs flat pixel streams. No parsing of the output format happens here. The container has no network access.
Container 2 (safe PDF): Takes those raw pixels and reassembles them into a clean PDF with an OCR text layer (via Tesseract). The output PDF contains only page images and the OCR text: no JavaScript, no macros, no embedded objects, and no metadata from the original.

The key insight: by reducing everything to pixels between containers, you eliminate entire classes of attacks. Malicious macros? Gone after pixel conversion. Embedded executables? Cannot survive rasterization. Tracking URLs in metadata? Stripped completely.
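A rough way to spot-check that claim is to look for the PDF name tokens that usually signal active content, before and after conversion. The helper below is a heuristic sketch, not part of Dangerzone: `pdf_active_markers` is a hypothetical name, and it only sees tokens stored uncompressed in the file, so an empty result on an untrusted original proves nothing. It is more useful as a sanity check on the sanitized output.

```shell
#!/usr/bin/env bash
# Heuristic: list PDF names that commonly indicate scripts, auto-run actions,
# or attachments. Assumption: the names are not hidden inside compressed
# object streams, which plain text matching cannot see.
pdf_active_markers() {
    local file="$1"
    # Split at "/" so every PDF name starts its own line, keep the names of
    # interest when followed by a non-alphanumeric character or end of line,
    # then strip that trailing character and restore the leading slash.
    LC_ALL=C tr '/' '\n' < "$file" \
        | LC_ALL=C grep -a -o -E '^(JavaScript|JS|OpenAction|AA|Launch|EmbeddedFile)([^A-Za-z0-9]|$)' \
        | LC_ALL=C sed -e 's/[^A-Za-z0-9]$//' -e 's|^|/|' \
        | LC_ALL=C sort -u
}
```

Running `pdf_active_markers suspicious-contract.pdf` and then the same command on the sanitized copy gives a quick before-and-after comparison. A dedicated PDF analysis tool will be far more thorough than this text match.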

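# Install on Ubuntu/Debian
# (add the Freedom of the Press Foundation apt repository first, as described
# in the official install instructions; the package is not in the default archives)
sudo apt install dangerzone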

# Or on macOS
brew install --cask dangerzone

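# Convert a single file (writes suspicious-contract-safe.pdf next to the input)
dangerzone-cli suspicious-contract.pdf

# Batch convert a directory, skipping files that are already sanitized output
find ./inbox -type f -name "*.pdf" ! -name "*-safe.pdf" -exec dangerzone-cli {} \;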
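The one-liner above only covers PDFs. A slightly fuller sketch for a mixed inbox might look like the following. It assumes the default output name is the input name with `-safe.pdf` appended in place of the extension; `safe_pdf_name`, `sanitize_inbox`, and the `DZ_CMD` override are hypothetical names introduced here, and the extension list is only an illustrative subset.

```shell
#!/usr/bin/env bash
# Assumption: with no explicit output name, dangerzone-cli writes
# "<name>-safe.pdf" next to the input. DZ_CMD is a hypothetical override that
# lets you swap in a stub for dry runs.

# Compute the expected sanitized filename for a given input path.
safe_pdf_name() {
    local input="$1" dir base
    dir=$(dirname -- "$input")
    base=$(basename -- "$input")
    printf '%s/%s-safe.pdf\n' "$dir" "${base%.*}"
}

# Sanitize every matching document under a directory, once.
sanitize_inbox() {
    local dir="$1" file
    # Note: reading line by line breaks on filenames that contain newlines.
    find "$dir" -type f \
        \( -iname '*.pdf' -o -iname '*.docx' -o -iname '*.xlsx' -o -iname '*.odp' \) \
        ! -name '*-safe.pdf' -print |
    while IFS= read -r file; do
        # Skip inputs that already have a sanitized sibling.
        [ -e "$(safe_pdf_name "$file")" ] && continue
        "${DZ_CMD:-dangerzone-cli}" "$file" < /dev/null || echo "FAILED: $file" >&2
    done
}
```

Calling `sanitize_inbox ./inbox` repeatedly is then safe: already converted files and the `-safe.pdf` outputs themselves are skipped.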

Performance: Real Numbers on Real Hardware

I tested Dangerzone 0.8.1 on my homelab (Xeon E-2278G, 64GB RAM, NVMe storage) processing a batch of 50 documents:

Simple 2-page PDF: 8-12 seconds per file
20-page DOCX with images: 25-35 seconds
100-page scanned PDF: 90-120 seconds (OCR is the bottleneck)
Memory usage: peaks at ~800MB per conversion (container overhead)

It is not fast. If you are processing hundreds of files daily, you will want to run it on dedicated hardware. But for the security-conscious workflow of “I just received something from an unknown sender,” 10 seconds is nothing.
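To put "hundreds of files daily" in perspective, here is a back-of-the-envelope estimate using the per-file timings above. `estimate_batch` is a hypothetical helper, and the optional worker count assumes parallel conversions scale linearly, which I have not measured; each parallel conversion would also need its own ~800MB of memory.

```shell
#!/usr/bin/env bash
# Rough batch-time estimate from integer per-file timings.
# Usage: estimate_batch <files> <seconds_per_file> [parallel_workers]
estimate_batch() {
    local files="$1" secs_per_file="$2" workers="${3:-1}"
    local total
    # Assumption: work divides evenly across workers (optimistic).
    total=$(( files * secs_per_file / workers ))
    printf '%dh %02dm\n' $(( total / 3600 )) $(( total % 3600 / 60 ))
}

estimate_batch 300 30      # 300 mid-sized DOCX files, one at a time: prints "2h 30m"
estimate_batch 300 30 4    # same batch with 4 parallel workers: prints "0h 37m"
```

At 30 seconds per file, 300 documents already occupy a machine for two and a half hours when run serially, which is why dedicated hardware starts to make sense at that volume.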

Dangerzone vs. The Alternatives

I tested three approaches head-to-head:

Dangerzone (free, open source, local):

Pros: fully offline, open source, handles 20+ formats, OCR output is searchable
Cons: slow on large files, requires Docker/Podman, no batch GUI

Qubes OS TrustedPDF (free, requires Qubes):

Pros: VM-level isolation (stronger than containers), integrated into the OS
Cons: requires running Qubes as your daily OS, PDF-only, no OCR layer

Online sanitization services (various, cloud-based):

Pros: nothing to install, usually faster
Cons: you are uploading potentially sensitive documents to a third party — defeats the purpose

For most people who are not running Qubes, Dangerzone is the best option. It works on Windows, macOS, and Linux, and it never phones home.

My Actual Workflow

I have integrated Dangerzone into my document pipeline with a simple bash script:

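#!/bin/bash
# ~/bin/safe-open.sh - sanitize before opening
INPUT="$1"

if [ -z "$INPUT" ] || [ ! -f "$INPUT" ]; then
    echo "Usage: $(basename "$0") <document>" >&2
    exit 2
fi

# The output is always a PDF, whatever the input format was.
NAME="$(basename "$INPUT")"
# Use a private directory instead of a guessable path directly under /tmp.
WORKDIR="$(mktemp -d)" || exit 1
OUTPUT="$WORKDIR/safe-${NAME%.*}.pdf"

echo "Sanitizing: $INPUT"
# The CLI option is --output-filename; stderr stays visible so failures can be diagnosed.
if dangerzone-cli "$INPUT" --output-filename "$OUTPUT"; then
    xdg-open "$OUTPUT"
    echo "Opened sanitized version"
else
    echo "FAILED: Document could not be sanitized" >&2
    echo "This might indicate something malicious, or just an unsupported or corrupt file." >&2
    exit 1
fi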

I set this as my default PDF handler for files downloaded from email. Every attachment gets sanitized before I see it. The 10-second delay is barely noticeable.

When You Need This (And When You Do Not)

Use Dangerzone when:

You are opening documents from unknown or untrusted sources
You are a journalist receiving leaked documents
You are processing vendor contracts or RFPs from new companies
You work in finance and receive documents from clients (the same trust-nothing approach as verifying JWTs locally)

Skip it when:

The documents come from trusted internal sources you have worked with for years
You need to preserve exact formatting (rasterization discards vector graphics, hyperlinks, and form fields)
Speed matters more than security for your use case

The Privacy Angle Most People Miss

Beyond malware protection, Dangerzone strips all metadata. When you sanitize a document before sharing it, you remove:

Author names and organization info
Edit history and tracked changes
GPS coordinates in embedded images (same problem I wrote about in my EXIF metadata article)
Internal file paths revealed in error messages
Hidden comments and revision marks

I have seen NDAs with tracked changes showing the entire negotiation history. I have seen “final” PDFs with the original author's home directory in the metadata. Dangerzone fixes all of this in one pass.
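If you want to see what a PDF gives away before you sanitize it, a crude text match on the document information fields is often enough. This is a sketch under assumptions: `pdf_info_fields` is a hypothetical helper, and it only finds fields stored as plain literal strings. Hex-encoded strings, XMP metadata, and compressed objects are missed, so tools such as `pdfinfo` or `exiftool` are the better choice for a thorough check.

```shell
#!/usr/bin/env bash
# Heuristic: print common document-information entries that appear as
# uncompressed literal strings, e.g. "/Author (Jane Doe)".
pdf_info_fields() {
    local file="$1"
    LC_ALL=C grep -a -o -E '/(Author|Creator|Producer|Title|Subject|Keywords) *\([^)]*\)' "$file" \
        | LC_ALL=C sort -u
}
```

Run it on the original and on the sanitized copy, then compare the two listings to see which entries from the source document are gone.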

Setting It Up Right

One gotcha: Dangerzone needs either Docker or Podman. On Linux, I recommend Podman — it runs rootless by default, which means even if someone exploits the container runtime, they do not get root on your host.

# Install Podman (preferred for security)
sudo apt install podman

# Verify Podman is installed and running rootless
podman --version
podman info --format '{{.Host.Security.Rootless}}'
# Should print "true" on a rootless setup

On macOS, you will need Docker Desktop or Podman Desktop installed. The GUI version works fine — just drag and drop files onto it.
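Before blaming Dangerzone for a failed conversion, it is worth confirming that a container runtime is on your PATH at all. The check below is a minimal sketch: `detect_runtime` is a hypothetical helper, and it only reports what is installed on the host. Which runtime Dangerzone actually selects depends on the platform and release, so treat the result as a hint.

```shell
#!/usr/bin/env bash
# Report the first container runtime found on PATH, preferring Podman.
detect_runtime() {
    local rt
    for rt in podman docker; do
        if command -v "$rt" >/dev/null 2>&1; then
            printf '%s\n' "$rt"
            return 0
        fi
    done
    echo "no container runtime found" >&2
    return 1
}
```

In a setup script you could guard the conversion with `detect_runtime >/dev/null || exit 1`.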

If you are already running a homelab with Docker (network segmentation helps here), adding Dangerzone is trivial. I run it on my TrueNAS box and access it over SSH for batch jobs.

What I Would Improve

Dangerzone is not perfect. My complaints after 6 months of daily use:

No watch-folder mode for automated processing
OCR quality degrades on handwritten documents
The container pull on first run is ~2GB — not great for limited storage
No API or daemon mode for integration with other tools

I have been considering wrapping it with inotifywait for a watch-folder setup. If you are security-conscious enough to sanitize documents, you probably want to automate it.
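Here is roughly what that wrapper could look like. It is an untested-in-production sketch, not a finished tool: it assumes `inotifywait` from inotify-tools on Linux, assumes sanitized output lands in the same folder with a `-safe.pdf` suffix, and uses the hypothetical names `handle_event`, `watch_inbox`, and `DZ_CMD` (an override for swapping in a stub).

```shell
#!/usr/bin/env bash
# Decide what to do with one file that appeared in the watched folder.
handle_event() {
    local path="$1"
    case "$path" in
        *-safe.pdf) return 0 ;;                    # our own output, ignore it
        *.pdf|*.PDF|*.docx|*.DOCX) ;;              # illustrative subset of formats
        *) return 0 ;;                             # anything else is left alone
    esac
    # stdin is redirected so the converter cannot swallow queued event lines.
    "${DZ_CMD:-dangerzone-cli}" "$path" < /dev/null || echo "FAILED: $path" >&2
}

# Watch a directory and sanitize files once they are fully written or moved in.
watch_inbox() {
    local dir="$1" path
    inotifywait -m -q -e close_write -e moved_to --format '%w%f' "$dir" |
    while IFS= read -r path; do
        handle_event "$path"
    done
}
```

Start it with `watch_inbox ~/inbox`. Ignoring `-safe.pdf` files matters here: without that check, every sanitized output written into the watched folder would trigger another conversion.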

For the paranoid: pair Dangerzone with a solid encrypted storage setup. I keep sanitized documents on an encrypted ZFS dataset. A Samsung T9 portable SSD with hardware encryption works great for this if you need portability — I use one for sensitive client docs that need to travel. Full disclosure: affiliate link.

Dangerzone is at github.com/freedomofpress/dangerzone. It is free, it is open source, and it solves a real problem that most people ignore until they get hit. Install it, set it as your default document handler for untrusted files, and forget about it.

