Category: Security

Security is the dedicated cybersecurity category on orthogonal.info, covering everything from application-level secure coding practices to network-layer defenses and zero-trust architecture. In an era where a single misconfigured cloud bucket or unpatched dependency can lead to a headline-making breach, this category provides the practical, hands-on guidance that engineers need to build and maintain secure systems. Each article blends defensive theory with real commands, configurations, and code you can apply immediately.

With 21 posts spanning offensive and defensive security topics, this collection reflects a practitioner’s perspective — not checkbox compliance, but genuine risk reduction.

Key Topics Covered

Application security (AppSec) — Secure coding patterns, input validation, OWASP Top 10 mitigations, and static analysis with tools like Semgrep, Bandit, and CodeQL.
Network security and firewalls — Configuring OPNsense, pfSense, VLANs, WireGuard tunnels, and network segmentation strategies for home and production environments.
CVE analysis and vulnerability management — Dissecting real-world CVEs, understanding CVSS scoring, and building patch management workflows with Trivy, Grype, and OSV-Scanner.
Penetration testing and red teaming — Practical walkthroughs using Nmap, Burp Suite, Nuclei, and Metasploit to identify weaknesses before attackers do.
Zero-trust architecture — Implementing identity-aware proxies, mutual TLS, and least-privilege access using Cloudflare Access, Tailscale, and SPIFFE/SPIRE.
Container and Kubernetes security — Pod security standards, image scanning, runtime protection with Falco, and supply-chain security with Sigstore and cosign.
Secrets management — Storing and rotating secrets with HashiCorp Vault, SOPS, Sealed Secrets, and cloud-native key management services.
Compliance and hardening — CIS Benchmarks, STIGs, and automated compliance scanning for Linux hosts, containers, and cloud accounts.

Who This Content Is For
This category serves security engineers, DevSecOps practitioners, penetration testers, platform engineers, and system administrators who take security seriously without wanting to drown in vendor marketing. Whether you are hardening a homelab, preparing for a SOC 2 audit, or building a secure CI/CD pipeline, the guides here are written by and for people who ship code and defend infrastructure daily.

What You Will Learn
Readers of the Security category will gain the skills to identify and remediate vulnerabilities across the full stack — from source code to running containers to network perimeters. You will learn how to integrate security scanning into CI/CD pipelines, configure firewalls with defense-in-depth principles, analyze CVE disclosures to assess real-world impact, and implement zero-trust networking without crippling developer velocity. Every article prioritizes actionable steps over abstract theory.

Explore the posts below to strengthen your security posture today.

  • Why the Web Crypto API Won’t Compute MD5 (and How HashForge Does It in Your Browser)

    Last week I needed an MD5 checksum to verify a file against a vendor’s published manifest. Old habit kicked in: open devtools, reach for the Web Crypto API, type one line. It failed on the spot:

    await crypto.subtle.digest('MD5', new TextEncoder().encode('abc'))
    // DOMException: Algorithm: Unrecognized name MD5

    No MD5. Not deprecated-with-a-warning — just absent, like it was never on the menu. That single rejection is the whole reason HashForge, the in-browser hash generator I keep bookmarked, ships its own MD5 routine instead of asking the browser. Here’s why the browser says no, and how HashForge works around it without uploading your file anywhere.

    The Web Crypto API blocks MD5 on purpose

    The digest side of the Web Crypto API supports exactly four algorithms: SHA-1, SHA-256, SHA-384, and SHA-512. That list is fixed in the W3C spec. MD5 isn’t missing because nobody filed a ticket — the working group left it out, along with MD4, because shipping a broken hash through an API named “crypto” invites people to misuse it.

    MD5 has had practical collision attacks since 2004, when Wang and Yu produced two different inputs with the same digest using differential cryptanalysis. By 2008 researchers had used MD5 collisions to forge a rogue CA certificate. The hash is finished for anything where an attacker controls the input.

    Here’s the part I find funny: the browser still lets you compute SHA-1, which Google and CWI fully collided in 2017 with the SHAttered attack. SHA-1 stayed in the spec for backward compatibility with existing protocols. MD5 never made the cut at all. The vendors drew a line, and MD5 landed on the wrong side of it.

    I agree with that call for new code. The catch is that the rest of us still bump into MD5 constantly, and almost never for security:

    • Vendor downloads still publish an MD5 next to the file
    • S3 ETags are the MD5 of the object for single-part uploads (unless it's encrypted with SSE-KMS or SSE-C)
    • Legacy rows store md5(email) for Gravatar-style lookups
    • Plenty of internal tools fingerprint content with MD5 because it’s fast and short

    So you hit a wall. The data is MD5, the browser refuses to compute MD5, and you would rather not paste a confidential file into some random “free MD5 online” site that ships it off to a server you’ve never audited.

    How HashForge fills the gap

    HashForge splits the work in two. For the SHA family it calls the native API — fast, audited, hardware-accelerated on most machines:

    const ALGOS = ['MD5','SHA-1','SHA-256','SHA-384','SHA-512'];
    
    // md5() (pure-JS MD5 over an ArrayBuffer) and formatHash() (hex or Base64
    // output) are helpers defined elsewhere on the page; not shown here.
    async function hashText(text, algos, enc='hex'){
      const encoded = new TextEncoder().encode(text);   // UTF-8 bytes
      const out = {};
      for (const algo of algos){
        if (algo === 'MD5'){
          out[algo] = formatHash(md5(encoded.buffer), enc);     // pure JS
        } else {
          const hash = await crypto.subtle.digest(algo, encoded); // native
          out[algo] = formatHash(hash, enc);
        }
      }
      return out;
    }

    For MD5 it falls back to a self-contained JavaScript implementation — the classic safeAdd / bitRotateLeft / md5cmn routine you’ve seen in a dozen libraries, working directly on an ArrayBuffer. No dependency, no network call, a couple hundred lines of code.

    Why MD5 is small enough to ship inline

    MD5 is a Merkle–Damgård construction. It pads the message to a multiple of 512 bits, then chews through it one 512-bit block at a time, updating four 32-bit state words across 64 operations grouped into 4 rounds. The whole thing is integer addition, bit rotation, and a handful of boolean mixing functions. That’s it — no S-boxes, no lookup tables, no big constants beyond a sine-derived table you can generate in one line.

    Because the algorithm is so plain, a correct MD5 fits in a couple of kilobytes of minified JavaScript. SHA-512 by hand would be heavier and slower in JS, because it works on 64-bit words and JavaScript's bitwise operators only handle 32 bits. That's exactly why HashForge doesn't reimplement the SHA family: the native crypto.subtle path is both faster and already vetted. You only drop to hand-rolled code for the one algorithm the platform won't give you.
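    To make "integer addition, rotation, and a few boolean functions" concrete, here is a minimal Python sketch of the same structure: padding, 512-bit blocks, four rounds of 16 steps, and the sine-derived constant table. It follows the textbook MD5 description (RFC 1321), not HashForge's JavaScript, and cross-checks itself against Python's hashlib across the padding boundaries.

```python
import hashlib
import math
import struct

# Per-step left-rotation amounts for the four rounds (16 steps each)
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# The sine-derived constant table: K[i] = floor(|sin(i + 1)| * 2**32)
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    x &= 0xFFFFFFFF
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md5(message: bytes) -> str:
    # Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length as 64-bit little-endian
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", bit_len)

    # Initial state: four 32-bit words
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    for offset in range(0, len(padded), 64):
        m = struct.unpack("<16I", padded[offset:offset + 64])  # 16 little-endian words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + m[g]) & 0xFFFFFFFF
            a, d, c = d, c, b  # rotate the state words
            b = (b + rotl(f, SHIFTS[i])) & 0xFFFFFFFF
        a0 = (a0 + a) & 0xFFFFFFFF
        b0 = (b0 + b) & 0xFFFFFFFF
        c0 = (c0 + c) & 0xFFFFFFFF
        d0 = (d0 + d) & 0xFFFFFFFF

    # The digest is the four state words serialized little-endian
    return struct.pack("<4I", a0, b0, c0, d0).hex()


if __name__ == "__main__":
    # Cross-check against the built-in MD5, including lengths around the 56/64-byte padding edge
    data = bytes(range(256)) * 4
    for n in (0, 3, 55, 56, 63, 64, 65, 1000):
        assert md5(data[:n]) == hashlib.md5(data[:n]).hexdigest()
    print(md5(b"abc"))
```

    The only non-obvious part is the padding; the rest is the same add-rotate-mix step repeated 64 times per block.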

    The privacy detail that actually matters

    Files go through the same split. The page reads the file with file.arrayBuffer() and hands the raw bytes straight to either the native digest or the JS MD5:

    // file: a File object from an <input type="file"> or a drop event
    const buf  = await file.arrayBuffer();
    const hash = await crypto.subtle.digest('SHA-256', buf);

    That arrayBuffer() call is the whole privacy story. The bytes are read into memory inside your tab and never touch a network socket. Open the Network panel while you hash a 200 MB ISO and you'll see zero requests. Pull your Wi-Fi and it keeps working, because there was never a server in the loop. Compare that to the typical "online hash calculator," which POSTs your file to a backend and asks you to trust their retention policy.

    Verify the output yourself in ten seconds

    Don’t take my word that the MD5 path is correct — a hash tool that quietly mis-pads is worse than no tool. Hash the empty string and abc, then check against the canonical test vectors:

    MD5("")        = d41d8cd98f00b204e9800998ecf8427e
    MD5("abc")     = 900150983cd24fb0d6963f7d28e17f72
    SHA-256("abc") = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

    Type abc into HashForge and you’ll get those exact bytes. I cross-checked them against md5sum and sha256sum on a Linux box before trusting the tool with anything real. Two-minute habit, and it catches a surprising number of broken implementations.
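    The same habit in script form: a small sketch that runs any hash function against the vectors above and returns whatever it gets wrong. Python's hashlib stands in as the reference here, the way md5sum and sha256sum do on the command line; the function names are illustrative.

```python
import hashlib

# The canonical test vectors quoted above (RFC 1321 for MD5, the NIST "abc" example for SHA-256)
VECTORS = [
    ("md5", b"", "d41d8cd98f00b204e9800998ecf8427e"),
    ("md5", b"abc", "900150983cd24fb0d6963f7d28e17f72"),
    ("sha256", b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]


def check_vectors(hash_fn):
    """Return the vectors an implementation fails; hash_fn(name, data) -> hex digest."""
    failures = []
    for name, data, expected in VECTORS:
        got = hash_fn(name, data)
        if got != expected:
            failures.append((name, data, expected, got))
    return failures


def reference(name, data):
    # hashlib acts as the trusted reference implementation
    return hashlib.new(name, data).hexdigest()


if __name__ == "__main__":
    print(check_vectors(reference) or "all vectors match")
```

    Point hash_fn at any implementation you're unsure of (a wrapper around a tool's output, a hand-rolled MD5) and an empty list means it agrees on all three.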

    HMAC is native-only, and that’s the right limit

    One place HashForge refuses to fill a gap: HMAC. It offers HMAC-SHA1/256/384/512 and stops there, because Web Crypto’s importKey plus sign('HMAC', ...) only accepts the SHA family. There’s no HMAC-MD5 button.

    That’s correct, not lazy. If you’re computing an HMAC you’re authenticating something, and HMAC-MD5 has no place in new code. The tool steers you to SHA-256 by simply not offering the broken option — the same stance the browser takes on raw MD5, applied one layer up.
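    For reference, HMAC itself is two nested hashes over padded keys (RFC 2104), so the underlying hash's weaknesses carry straight through to the MAC. This is a minimal Python sketch of that construction over SHA-256, checked against the standard library's hmac module. It's there to show the mechanics, not to replace hmac.new in real code.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    # Keys longer than one block are hashed first; shorter keys are zero-padded
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()


if __name__ == "__main__":
    key, msg = b"example-key", b"order_id=42&amount=100"
    tag = hmac_sha256(key, msg)
    # Same result as the library; when verifying a received tag, compare in constant time
    assert hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest())
    print(tag.hex())
```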

    Which hash for which job

    A quick field guide, because this question comes up every week:

    • Matching a published checksum: use whatever the publisher used, MD5 or SHA-256. You’re catching accidental corruption, not an attacker, so a broken hash is fine here.
    • Content fingerprint, cache key, dedup: SHA-256 if you have a free choice; MD5 only to match an existing system.
    • Passwords: none of these. Use Argon2 or bcrypt. A raw SHA-256 of a password is still a leak waiting to happen.
    • Tokens and signatures: HMAC-SHA256 at minimum.
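    For the first case, matching a published checksum, here is a minimal Python sketch that streams a file through hashlib in 1 MiB chunks (so a multi-gigabyte ISO never has to fit in memory) and compares the result with the vendor's published hex digest. The function names and chunk size are illustrative choices.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally; algorithm is any hashlib name, e.g. 'md5' or 'sha256'."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    # Vendors publish digests in either case, sometimes with trailing whitespace
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

    Usage looks like matches_published("installer.iso", "<digest from the vendor page>", "md5"), with the file name and digest as placeholders.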

    If you want the actual math behind why MD5 fell and SHA-256 holds, Serious Cryptography by Jean-Philippe Aumasson is the clearest book I’ve found on collision attacks without drowning you in proofs. For the engineering side — where each primitive shows up in TLS, signatures, and storage — Real-World Cryptography by David Wong is the one I lend out most. Full disclosure: both are Amazon affiliate links.

    Why I keep it bookmarked

    The pitch is narrow and that’s the point. I need a hash, I can’t install a CLI on a locked-down work laptop, and I really don’t want to upload a file to a stranger’s server. HashForge does that one job: it computes all five digests at once, outputs hex or Base64, and runs on a text string or a dropped file. It pairs with the other browser-only tools I reach for — Base64Lab when I need to decode a token and PassForge when I need a random key — none of which phone home.

    Try it: HashForge. Hash something, open your Network tab, and watch nothing happen.

    Related reading: How a secure password generator actually works, catching leaked secrets in your git history, and why your online SQL formatter might be logging your data.


    Join https://t.me/alphasignal822 for free market intelligence.

  • How a Secure Password Generator Actually Works (and Why Math.random() Fails)

    Last week I was reviewing a small auth service and found this one-liner generating reset tokens:

    const token = Array.from({length: 16}, () =>
      CHARS[Math.floor(Math.random() * CHARS.length)]
    ).join('');

    It runs. It produces things like xK9$mLp2@nQ7vR4w. It also happens to be a real security bug. That exact pattern is the one I deliberately avoided when I built our free password generator — and the reason is worth 1,200 words, because almost every “roll your own” password snippet on the web gets it wrong in the same way.

    Here’s what’s broken about Math.random() for passwords, the fix, and the two gotchas that bite people who try to fix it themselves.

    Math.random() is predictable by design

    In V8 — the engine behind Chrome and Node — Math.random() has used an algorithm called xorshift128+ since version 4.9.40, shipped in late 2015. (Before that it was MWC1616, which was worse: only about 2^32 possible outputs.) xorshift128+ has 128 bits of internal state, a period of 2^128 − 1, and it passes the TestU01 statistical suite. Statistically, the numbers look random.

    But “looks random” and “unpredictable” are different properties. xorshift128+ is a pseudo-random generator: every output is a deterministic function of that 128-bit state. And the state is recoverable. Feed enough consecutive outputs into a system of linear equations and you can solve for the internal state — there are public tools on GitHub that recover it from as few as 64 to 128 consecutive Math.random() calls. Once an attacker has the state, every future output is known. Every “random” password you generate after that point is predictable.
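    The "deterministic function of the state" point is easy to see in code. This is a simplified Python sketch of the xorshift128+ step using the shift constants from V8's implementation (23, 17, 26). The conversion to a double (52 state bits placed in a [1, 2) mantissa, minus 1) is a simplification of what V8 does, and V8's output caching and seeding are ignored. It doesn't solve for the state; it shows why solving for it is game over.

```python
import struct

MASK64 = (1 << 64) - 1


class XorShift128Plus:
    """Simplified xorshift128+ with V8's shift constants; for illustrating determinism only."""

    def __init__(self, state0: int, state1: int):
        self.state0, self.state1 = state0 & MASK64, state1 & MASK64

    def next_double(self) -> float:
        s1, s0 = self.state0, self.state1
        self.state0 = s0
        s1 ^= (s1 << 23) & MASK64
        s1 ^= s1 >> 17
        s1 ^= s0
        s1 ^= s0 >> 26
        self.state1 = s1
        # Put 52 bits of state into the mantissa of a double in [1, 2), then subtract 1
        bits = (self.state0 >> 12) | 0x3FF0000000000000
        return struct.unpack("<d", struct.pack("<Q", bits))[0] - 1.0


if __name__ == "__main__":
    victim = XorShift128Plus(0x0123456789ABCDEF, 0xFEDCBA9876543210)
    # An attacker who has recovered the 128-bit state simply clones the generator...
    attacker = XorShift128Plus(victim.state0, victim.state1)
    # ...and predicts every future output exactly
    assert [victim.next_double() for _ in range(5)] == [attacker.next_double() for _ in range(5)]
```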

    For a UI animation or a Monte Carlo sim, who cares. For a password, an API key, or a session token, that’s the whole ballgame.

    crypto.getRandomValues() is the actual fix

    Browsers ship a cryptographically secure RNG (CSPRNG) through the Web Crypto API: crypto.getRandomValues(). It pulls from the operating system’s entropy pool (/dev/urandom on Linux, BCryptGenRandom on Windows) and is built so that observing past output tells you nothing about future output. There’s no recoverable 128-bit state to solve for.

    The function our generator uses is four lines:

    function secureRandom(max) {
      const arr = new Uint32Array(1);
      crypto.getRandomValues(arr);
      return arr[0] % max;
    }

    Read a fresh 32-bit unsigned integer from the CSPRNG, reduce it into the range you need, done. Swap Math.random() for this and the prediction attack above is gone. But notice that % max — that’s gotcha number one.
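    The same approach in Python, as a sketch for anyone generating passwords in a script: the secrets module reads from the OS CSPRNG. The 94-character alphabet here (string.ascii_letters + digits + punctuation) is an assumption standing in for the generator's full symbol set, and the function names are illustrative.

```python
import secrets
import string

# Assumed alphabet: 52 letters + 10 digits + 32 punctuation characters = 94
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def secure_random(max_value: int) -> int:
    """Mirror of the JS secureRandom(): one 32-bit CSPRNG draw reduced with %."""
    return secrets.randbits(32) % max_value


def generate_password(length: int = 16) -> str:
    return "".join(ALPHABET[secure_random(len(ALPHABET))] for _ in range(length))
```

    In Python you can skip the modulo entirely: secrets.choice(ALPHABET) and secrets.randbelow(n) already return unbiased results.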

    Gotcha 1: modulo bias is real (but size matters)

    When you take a random integer modulo your alphabet size, the ranges usually don’t divide evenly, so some characters come up more often than others. I wanted to see how bad it actually is, so I generated 6.2 million random bytes and bucketed byte % 62 (a typical alphanumeric set):

    expected per character:  100,000
    lowest-frequency char:   ~96,900 hits
    highest-frequency char: ~121,400 hits
    ratio: 1.25

    That’s a 25% skew. It happens because 256 % 62 = 8: the eight highest byte values (248–255) wrap around to the first eight characters, giving each of them one extra shot. With a single byte feeding a 62- or 94-character set, the bias is large and easy to measure.
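    You don't need millions of samples to see where the skew comes from. Counting all 256 byte values directly gives the exact distribution. A small Python sketch (the function name is illustrative):

```python
from collections import Counter


def residue_counts(source_range: int, alphabet_size: int) -> Counter:
    """How many raw values map to each character index under value % alphabet_size."""
    return Counter(v % alphabet_size for v in range(source_range))


counts = residue_counts(256, 62)
favored = [c for c, n in counts.items() if n == max(counts.values())]
print(max(counts.values()), min(counts.values()))  # 5 4
print(favored)                                      # [0, 1, 2, 3, 4, 5, 6, 7]
# Expected hits for 6.2 million bytes: 6.2e6 * 5/256 ≈ 121,094 vs 6.2e6 * 4/256 = 96,875
```

    Five byte values versus four is exactly the 1.25 ratio, and the expected counts line up with the measured ~121,400 and ~96,900 once you allow for sampling noise.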

    The textbook fix is rejection sampling: throw away any byte in the biased tail and draw again. Rejecting values ≥ 248 dropped the skew to a 1.02 ratio in my test, at the cost of discarding about 3.1% of draws.
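    Here is a minimal Python sketch of that rejection step over single random bytes (os.urandom as the byte source; the function name is illustrative). The cutoff is the largest multiple of the alphabet size that fits in a byte, which is 248 for a 62-character set.

```python
import os


def unbiased_index(alphabet_size: int) -> int:
    """Draw a uniform index in [0, alphabet_size) from single random bytes.

    Bytes at or above the largest multiple of alphabet_size (248 for 62) are
    rejected, so every index is backed by the same number of accepted byte values.
    """
    if not 0 < alphabet_size <= 256:
        raise ValueError("alphabet_size must be in 1..256")
    limit = 256 - (256 % alphabet_size)  # 248 when alphabet_size == 62
    while True:
        byte = os.urandom(1)[0]
        if byte < limit:
            return byte % alphabet_size
```

    With 62 characters, 8 of 256 byte values are rejected, which is the 3.1% discard rate above.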

    But here’s the part the “always use rejection sampling” advice skips: the bias depends entirely on how big your random integer is relative to the alphabet. Our generator doesn’t read a single byte — it reads a full Uint32 (range 0 to about 4.29 billion). For a 94-character symbol set, Uint32 % 94 makes the favored characters more likely by roughly 1 part in 45 million — a bias of 0.0000022%. For a password, that’s noise far below anything that matters. So I skipped rejection sampling on purpose and kept the code simple, because a 32-bit draw already makes the bias irrelevant. If I were minting cryptographic keys I’d add the rejection step; for human passwords, a wide draw is enough.
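    The arithmetic behind that "1 part in 45 million" figure, worked out exactly for a 32-bit draw and a 94-character set:

```latex
\begin{aligned}
2^{32} &= 4\,294\,967\,296 = 94 \times 45\,691\,141 + 42 \\
\text{favored characters (42 of them)} &: 45\,691\,142 \text{ draws each} \\
\text{other characters (52 of them)} &: 45\,691\,141 \text{ draws each} \\
\frac{45\,691\,142}{45\,691\,141} - 1 &= \frac{1}{45\,691\,141} \approx 2.19 \times 10^{-8} \approx 0.0000022\%
\end{aligned}
```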

    Gotcha 2: the 64KB quota wall

    The second surprise showed up while I was running that bias test. My first attempt asked getRandomValues() to fill one big buffer:

    crypto.getRandomValues(new Uint8Array(6200000));
    // QuotaExceededError: The requested length exceeds 65,536 bytes

    getRandomValues() refuses any request over 65,536 bytes (64 KB) in a single call. It’s in the spec and every browser enforces it. If you’re generating one 16-character password you’ll never hit it, but the moment you batch-generate or fill a large buffer, you have to chunk:

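    function fillSecure(buf) {
      // The cap is 65,536 bytes per call, so step by bytes, not elements:
      // a Uint32Array element is 4 bytes, a BigUint64Array element is 8.
      const step = 65536 / buf.BYTES_PER_ELEMENT;
      for (let i = 0; i < buf.length; i += step) {
        crypto.getRandomValues(buf.subarray(i, i + step));
      }
      return buf;
    }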

    Undocumented in most tutorials, and a hard failure rather than a silent one — which is at least honest of it.

    Why browser-only matters here

    Our generator runs entirely in your browser. The password is built on your machine from your OS entropy and never touches a network. That’s not a tagline — it’s the only design that makes sense for a secret. A “password generator” that does the work server-side is a service that has seen your password in plaintext, which is the same trust problem I wrote about with online SQL formatters quietly logging queries. Open the dev tools, watch the Network tab while you click generate, and you’ll see exactly zero requests.

    You can try it here: the orthogonal.info password generator. Slide to 16+ characters, toggle the symbol set, copy, done.

    One layer is never enough

    A strong, truly-random password fixes the “guessable” problem. It does nothing about phishing, reused credentials, or a leaked database. After the LastPass mess I moved my own vault into KeePassXC and put a hardware key on every account that supports one. A YubiKey 5 NFC turns a stolen password into a useless string, because login also needs the physical key in my pocket. Full disclosure: that’s an affiliate link — but it’s also literally what’s on my keyring. Generate unique passwords, store them in a real manager, and gate the important accounts with hardware 2FA. Three cheap layers beat one strong one.

    The lesson I keep relearning: in security, the code that “works” and the code that’s correct are often the same length and completely different. Math.random() works. crypto.getRandomValues() is correct.



  • Your Online SQL Formatter Might Be Logging Your Database Password

    Last month I watched a contractor paste a full Kubernetes secret manifest — base64 blobs and all — into the first “free YAML validator” that came up on Google. He just wanted to check indentation. What he actually did was POST a production database password to a server he’d never heard of, run by people he’ll never meet, with a privacy policy he didn’t read.

    That’s the part of online dev tools nobody talks about. A SQL formatter, a YAML validator, a JSON beautifier — they feel disposable, like a calculator. But a huge number of them send whatever you paste to a backend for processing. If that paste contains a connection string, an API key, or a customer record, you just leaked it. No breach required. You handed it over.

    Why “format my SQL” is a data exfiltration path

    Here’s the mechanic. Server-side tools work like this: your text goes into a textarea, JavaScript fires an HTTP request to /api/format, the server runs the actual formatting, and the result comes back. Simple to build, which is exactly why so many sites do it that way.

    The problem is what travels in that request body. I tested a handful of popular online formatters with my browser’s Network tab open. Several of them sent the entire input payload to their own domain. One sent it to a third-party API. The query I pasted was harmless test data, but the request was real — my text left my machine.

    Now picture the realistic version. You’re debugging a failing migration at 11pm. You copy the offending query straight out of your ORM logs to “just clean it up.” That query has a hardcoded credential a teammate left in six months ago. You paste, you format, you move on. The credential is now in someone’s request logs, maybe their analytics, maybe an LLM training pipeline if the tool resells data. You will never know.

    This isn’t paranoia. It’s the same threat model that makes pasting code into random pastebins a fireable offense at most security-conscious shops. We just don’t apply it to “format” tools because they feel too small to matter.

    The browser-only alternative

    The fix is structural, not procedural. Don’t rely on remembering to scrub secrets first — use tools that physically can’t send your data anywhere, because all the work happens in your tab.

    That’s the whole reason I built our formatters as single-file, client-side apps. When you use the SQL Formatter, the YAML Validator, or the Diff Checker, the parsing and formatting runs in JavaScript on your device. There is no /api/format endpoint. There’s no backend at all. The text in your textarea never crosses the network, because there’s nowhere for it to go.

    For a diff tool this matters even more. People routinely paste two versions of a config file — say, a working .env and a broken one — to spot what changed. Those files are nothing but secrets. A browser-only diff means you can compare two API keys character by character without either one leaving your laptop.

    How to actually verify a tool is client-side

    Don’t take any tool’s word for it, including mine. Verifying is a two-minute job and every developer should know how.

    1. Watch the Network tab. Open DevTools (F12), go to the Network panel, clear it, then paste your text and hit format. If you see a new XHR or fetch request fire with your input in the payload, the tool is server-side. If nothing happens on the network, the work is local.

    // What a server-side formatter looks like in Network tab:
    POST /api/format-sql
    Request Payload: { "query": "SELECT * FROM users WHERE token='sk_live_...'" }
    
    // What a client-side tool looks like:
    // (nothing — no request fires when you click format)

    2. Kill your connection. The bluntest test there is. Load the page, then turn off Wi-Fi or drop into airplane mode. If the tool still formats your text, it’s running entirely in the browser. If it spins or errors, it needed a server. I do this with any tool before I trust it with anything sensitive.

    3. Check for a service worker. Truly offline-capable tools register a service worker so they work with no connection at all. In DevTools, look under Application → Service Workers. Its presence is a strong signal the developer designed for offline-first, which usually means client-side processing too.

    Where this fits in a real workflow

    A few concrete cases where I reach for browser-only tools specifically because of the data:

    • Reviewing a teammate’s config PR. Diffing two Helm values files that contain registry credentials — done locally, nothing logged anywhere.
    • Cleaning up a query from prod logs. Format it to read it, without shipping whatever sensitive WHERE clause it carries to a stranger’s server.
    • Validating a CI secrets file. Checking that a GitHub Actions YAML parses before you commit, without exposing the encrypted values to a validation API.
    • On a locked-down network. Some client environments block external dev-tool domains entirely. Offline-capable tools just keep working.

    The broader point: treat every “paste your text here” box as a potential outbound network call until you’ve proven otherwise. Most of the time it’s fine. The one time it isn’t, it’s a leaked credential you can’t un-leak.

    Defense in depth still applies

    Browser-only tools remove one exfiltration path, but they don’t make you immune to the dumber failure modes — like a secret sitting in your shell history or git log in the first place. If you handle credentials daily, a hardware key cuts a whole class of phishing and credential-theft risk off at the knees. I use a YubiKey 5 Series for exactly this (full disclosure: affiliate link, but it’s the same key I carry on my own keyring). Pair that with the pre-commit secret scanning setup I wrote about earlier, and you’ve closed the two most common ways credentials walk out the door.

    Start with the small habit, though. Next time you reach for an online formatter or diff tool, open the Network tab first. If your text leaves the browser, find one that keeps it home.



  • I Switched to KeePassXC After LastPass Got Breached — Here’s My Setup

    Last December I got the email every LastPass user dreaded: my vault backup was part of the breach. The master password was strong, but knowing encrypted blobs of my entire digital life were sitting on some attacker’s disk made me physically uncomfortable. I spent a weekend migrating everything to KeePassXC, and six months later I’m not going back.

    Why Local-First Matters for Passwords

    The LastPass breach exposed a fundamental problem with cloud password managers: your encrypted vault is only as safe as the infrastructure storing it. LastPass used 100,100 PBKDF2 iterations for newer accounts, but older accounts could have as few as 5,000, and some very old ones far fewer. At those counts, a weak master password is crackable with a decent GPU rig.
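    To see what the iteration count buys, here is a simplified Python sketch of PBKDF2 key derivation with hashlib. It assumes PBKDF2-HMAC-SHA256 salted with the account email, which is how LastPass's scheme has been publicly described; the password, email, and function name are made up for illustration. The key point is that an attacker pays the full iteration count on every guess.

```python
import hashlib


def derive_vault_key(master_password: str, email: str, iterations: int) -> bytes:
    # Assumed scheme: PBKDF2-HMAC-SHA256, salted with the lowercased account email
    return hashlib.pbkdf2_hmac("sha256", master_password.encode(), email.lower().encode(), iterations)


old = derive_vault_key("correct horse battery staple", "user@example.com", 5_000)
new = derive_vault_key("correct horse battery staple", "user@example.com", 100_100)
print(len(old), old != new)                       # 32 True
print(f"work per guess: {100_100 / 5_000:.0f}x")  # work per guess: 20x
```

    A 20x cost factor is real, but it's linear: it buys time against a strong password and very little against a weak one.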

    KeePassXC stores everything in a single .kdbx file on your machine. No servers, no breach notifications, no third-party trust. The file uses AES-256 or ChaCha20 encryption with Argon2d key derivation — you control the iteration count, memory usage, and parallelism. I run mine at 64MB memory / 10 iterations / 4 threads, which takes about 1 second to unlock on my laptop but would cost serious money to brute-force.

    The Setup That Actually Works Day-to-Day

    The knock against local password managers has always been “but what about sync?” Fair point. Here’s how I solved it without trusting anyone else with my vault:

    # My .kdbx lives in a Syncthing folder shared between:
    # - Work laptop (Linux)
    # - Personal desktop (Windows)
    # - Phone (via Syncthing + KeePassDX on Android)
    
    ~/.local/share/syncthing/vault/
    ├── passwords.kdbx
    └── passwords.kdbx.key   # key file (separate from master password)

    Syncthing handles peer-to-peer sync over my local network and WireGuard tunnel when I’m away. The vault never touches anyone else’s servers. Conflict resolution? KeePassXC handles .kdbx merge conflicts natively since version 2.7 — it’ll prompt you to merge changes if two devices edited simultaneously.

    Hardware Key as Second Factor

    This is where it gets good. KeePassXC supports YubiKey challenge-response as an additional key factor. My unlock requires:

    1. Master password (memorized, 6 random words)
    2. Key file (stored only on my devices, never synced to cloud)
    3. YubiKey HMAC-SHA1 challenge-response (slot 2)

    Setting this up:

    # Program YubiKey slot 2 for HMAC-SHA1 challenge-response
    ykman otp chalresp --generate 2
    
    # In KeePassXC: Database → Database Security → Add Additional Protection
    # Select "Challenge-Response" → pick your YubiKey
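    What slot 2 computes is ordinary HMAC-SHA1: the key holds a secret that never leaves the device and answers each challenge with HMAC-SHA1(secret, challenge). KeePassXC sends a database-specific challenge and folds the 20-byte response into the key it derives. The Python sketch below only simulates the YubiKey's side of that exchange; the class is hypothetical, not a real device API, and the challenge is a random stand-in.

```python
import hashlib
import hmac
import secrets


class SimulatedYubiKeySlot:
    """Hypothetical stand-in for an HMAC-SHA1 challenge-response slot."""

    def __init__(self, secret: bytes):
        self._secret = secret  # On real hardware this never leaves the key

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._secret, challenge, hashlib.sha1).digest()


secret = secrets.token_bytes(20)       # 20-byte HMAC-SHA1 secret
primary = SimulatedYubiKeySlot(secret)
backup = SimulatedYubiKeySlot(secret)  # A backup only works if it holds the same secret
challenge = secrets.token_bytes(32)    # Hypothetical per-database challenge

assert primary.respond(challenge) == backup.respond(challenge)
assert len(primary.respond(challenge)) == 20
```

    The same model explains backups: a second YubiKey programmed with a different secret returns a different response and will not unlock the database.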

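    An attacker who steals my .kdbx file needs all three factors. Even if they get my laptop with the key file, they still need the physical YubiKey and the password. I keep a backup YubiKey 5 NFC in my safe, programmed with the same challenge-response secret as the primary, because a second key only opens the vault if its slot 2 holds the identical HMAC secret. That’s $50 for peace of mind that I won’t lock myself out.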

    Browser Integration Without the Extension Tax

    KeePassXC’s browser integration works through a native messaging host — no network calls, no cloud sync of browser state. I tested fill speed across three setups:

    Setup                          Fill latency   Memory overhead
    1Password (extension)          180-400 ms     ~85 MB
    Bitwarden (extension)          120-300 ms     ~60 MB
    KeePassXC (native messaging)   30-80 ms       ~12 MB

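    KeePassXC fills faster because the browser add-on talks to the running desktop app through a native messaging host (keepassxc-proxy), which relays requests over a local socket, so there are no HTTP round-trips. The add-on still finds login fields on the page, but it’s a thin UI layer; the vault and the crypto stay in the desktop app.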

    # Enable browser integration (Linux)
    # KeePassXC → Tools → Settings → Browser Integration
    # Check "Enable browser integration"
    # Check "Firefox" and/or "Chromium"
    # It writes the native messaging manifest automatically to:
    # ~/.mozilla/native-messaging-hosts/org.keepassxc.keepassxc_browser.json

    Honest Comparison: KeePassXC vs The Cloud Options

    vs Bitwarden — Bitwarden is the closest competitor and genuinely good. It’s open source, self-hostable (Vaultwarden), and the free tier is generous. I’d recommend it to anyone who doesn’t want to manage sync themselves. The tradeoff: you’re trusting their cloud to store your encrypted vault and their clients to encrypt it correctly, or running your own server (which means patching, backups, certificates). KeePassXC has no server component to maintain or secure.

    vs 1Password — Polished UI, great team features, expensive ($36/year individual, $60/year family). The “Secret Key” system is clever: it means 1Password can’t decrypt your vault even with a breach. But it’s closed source, so you’re trusting their claims. As a solo developer who reads source code, I find that a non-starter.

    vs LastPass — Just don’t. After the 2022 breach, the 2023 follow-up showing employee vaults were compromised, and the consistently slow response times… there’s no reason to trust them with anything sensitive.

    The One Thing That Annoys Me

    Mobile is worse than cloud managers. Full stop. KeePassDX on Android works, but auto-fill is flaky on some apps, and you need to manually trigger sync if you added a password on desktop 30 seconds ago. I’ve accepted this tradeoff — I add most passwords on desktop anyway, and the security model is worth the occasional inconvenience on mobile.

    Migration Script

    If you’re coming from LastPass, Bitwarden, or 1Password, KeePassXC imports CSV exports directly. Here’s the cleanup script I run on the CSV export before importing it, to drop duplicate entries:

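    #!/usr/bin/env python3
    """Pre-import cleanup for a password-manager CSV export.
    Removes duplicate entries and normalizes URLs before importing into KeePassXC.
    Usage: python3 dedupe.py export.csv cleaned.csv
    Assumes 'Username', 'URL', and 'Password' columns; rename to match your exporter."""
    import csv
    import sys
    from urllib.parse import urlparse
    
    
    def normalize_url(url):
        # Reduce a URL to scheme://host so /login and / on the same site collide
        parsed = urlparse(url)
        return f"{parsed.scheme}://{parsed.netloc}".lower()
    
    
    seen = {}
    with open(sys.argv[1], newline='') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            key = (row.get('Username', ''), normalize_url(row.get('URL', '')))
            # On duplicates, keep the entry with the longer password
            if key not in seen or len(row.get('Password', '')) > len(seen[key].get('Password', '')):
                seen[key] = row
    
    with open(sys.argv[2], 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(seen.values())
    
    print(f"Deduplicated: {len(seen)} unique entries")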

    My Recommendation

    If you’re a developer comfortable with file management and want zero cloud trust for your passwords: KeePassXC + Syncthing + YubiKey is the strongest setup I’ve found. Total cost: $50 for the YubiKey (plus a backup), everything else is free and open source.

    If you want something that “just works” across devices without any setup: Bitwarden free tier. No shame in that — it’s genuinely good software.

    For more tools and privacy-focused workflows, check out our security guides and tools section.

    Related reading: how a secure password generator actually works and the pre-commit setup that stopped 14 leaked secrets in my git history.


    Full disclosure: Amazon links above are affiliate links (tag=orthogonalinf-20). I bought my YubiKeys at full price before writing this.


  • I Caught 14 Leaked Secrets in My Git History — Here’s the Pre-Commit Setup That Stops It

    Last month I ran trufflehog against one of my private repos — a homelab automation project I’d never planned to open-source. It found 14 live secrets. AWS keys, a Telegram bot token, two database passwords, and a Stripe test key that still had access to customer data. All committed between 2022 and 2024, scattered across dozens of commits.

    The fix took me about 20 minutes. I now run two tools as pre-commit hooks that catch secrets before they ever land in a commit. Here’s exactly how I set it up, what each tool catches that the other misses, and the one configuration mistake that will give you false confidence.

    Why Two Tools: git-secrets vs trufflehog

    I use both git-secrets and trufflehog because they work differently and catch different things.

    git-secrets is pattern-based. It ships with AWS-specific patterns out of the box (matches AKIA[0-9A-Z]{16} and similar) and lets you add custom regexes. It’s fast — sub-100ms on most commits — and runs as a native git hook. The downside: it only knows what you tell it to look for.

    trufflehog uses entropy detection and pattern matching. It calculates Shannon entropy on strings and flags anything that looks random enough to be a key. Version 3 also verifies secrets against live APIs — it’ll actually try your AWS key against STS to confirm it’s active. This is slower (2-5 seconds per commit) but catches novel secret formats that pattern matching misses.
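    To make the entropy idea concrete, here is a minimal JavaScript sketch of Shannon entropy over a string’s characters. It shows the general technique only, not trufflehog’s actual detector code, and the length and threshold values are assumptions picked for illustration.

```javascript
// Shannon entropy in bits per character: H = -sum over symbols of p * log2(p)
function shannonEntropy(str) {
  const chars = [...str]; // split by code point, not UTF-16 unit
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical cut-offs for illustration; real scanners tune these per
// alphabet (hex vs base64) and per string length
const MIN_LENGTH = 20;
const ENTROPY_THRESHOLD = 4.0;

function looksLikeSecret(token) {
  return [...token].length >= MIN_LENGTH && shannonEntropy(token) >= ENTROPY_THRESHOLD;
}

console.log(shannonEntropy("aaaaaaaa")); // 0
console.log(shannonEntropy("abcd")); // 2
```

    One design note: a hex string can never exceed 4 bits per character (log2 of 16 symbols), so a single threshold treats hex and base64 secrets very differently. That is one reason entropy alone is noisy and verification matters.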

    In my 14-secret audit, git-secrets would have caught 9 of them. trufflehog caught all 14. But git-secrets has zero false positives in my workflow, while trufflehog flags about 1 false positive per week on base64-encoded config blobs.

    Setting Up git-secrets as a Pre-Commit Hook

    Install it:

    brew install git-secrets   # macOS
    # or
    git clone https://github.com/awslabs/git-secrets.git
    cd git-secrets && make install

    Register it in your repo:

    cd your-repo
    git secrets --install
    git secrets --register-aws

    That --register-aws flag adds patterns for AWS access keys, secret keys, and account IDs. Now add your own patterns for whatever services you use:

    # Telegram bot tokens (numeric:alphanumeric format)
    git secrets --add '[0-9]{8,10}:[A-Za-z0-9_-]{35}'
    
    # Stripe keys
    git secrets --add 'sk_(live|test)_[A-Za-z0-9]{24,}'
    
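    # Long literal passwords in configs and connection strings
    # (git-secrets matches with POSIX extended regex, where \s inside [...] is a
    # literal backslash and "s", not whitespace, so use [[:space:]] instead)
    git secrets --add 'password[[:space:]]*=[[:space:]]*[^[:space:]]{12,}'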

    Test it works:

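    # Use a made-up key: --register-aws allow-lists AWS's documented example
    # key (AKIAIOSFODNN7EXAMPLE), so committing that one would not be blocked
    echo "AKIAABCDEFGHIJKLMNOP" > test.txt
    git add test.txt
    git commit -m "test"
    # Output: [ERROR] Matched one or more prohibited patterns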

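    One gotcha: git secrets --install only sets up hooks in that repo. To get them in every repo you create or clone from now on, install them into a git template directory and register the patterns with --global (existing repos still need git secrets --install, or a re-run of git init to pick up the template):

    git secrets --install ~/.git-templates/git-secrets
    git config --global init.templateDir ~/.git-templates/git-secrets
    git secrets --register-aws --global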

    Adding trufflehog as a Pre-Commit Hook

    I use the pre-commit framework for trufflehog since it handles updates and version pinning:

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/trufflesecurity/trufflehog
        rev: v3.78.1
        hooks:
          - id: trufflehog
            entry: trufflehog git file://. --since-commit HEAD --only-verified --fail
            stages: [commit, push]

    The --only-verified flag is important. Without it, trufflehog reports every high-entropy string — UUIDs, hashes, random test data. With it, you only get alerts for secrets that are confirmed active against their respective APIs. This drops false positives from ~30/week to about 1.

    Install and activate:

    pip install pre-commit
    pre-commit install
    pre-commit install --hook-type pre-push

    The Configuration Mistake That Gives False Confidence

    Here’s what tripped me up for months: git-secrets only scans staged changes by default, not the full file. If you have a secret on line 5 and you modify line 50, git-secrets won’t flag it because line 5 isn’t in the diff.

    This matters because secrets often enter a file in one commit and stay there forever. The pre-commit hook only fires on new changes, so existing secrets remain invisible.

    Fix: run a full-repo scan on a schedule. I have this in a weekly cron:

    # Scan entire repo history
    trufflehog git file:///path/to/repo --only-verified --json > /tmp/secrets-audit.json
    
    # Scan all current files (not just diffs)
    git secrets --scan

    I pipe the output to ntfy for notifications. If something shows up, I rotate the credential immediately and use git filter-repo to purge it from history:

    git filter-repo --invert-paths --path secrets.env
    # Then force-push and tell collaborators to re-clone
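    For the ntfy step, it helps to summarize the report rather than forward it, so the alert never carries the credential itself. A minimal sketch, assuming trufflehog’s --json output is one JSON object per line with DetectorName, Verified, and SourceMetadata.Data.Git.file/commit fields (check these against the output of your trufflehog version):

```javascript
// Summarize trufflehog --json output (one JSON object per line) into a short
// alert message. Returns null when there is nothing verified to report.
function summarizeFindings(ndjson) {
  const findings = ndjson
    .split("\n")
    .filter((line) => line.trim().startsWith("{")) // skip blank or non-JSON lines
    .map((line) => JSON.parse(line))
    .filter((finding) => finding.Verified === true);

  if (findings.length === 0) return null;

  const lines = findings.map((finding) => {
    const git = finding.SourceMetadata?.Data?.Git ?? {};
    const commit = (git.commit ?? "unknown").slice(0, 8);
    // Deliberately omit finding.Raw: the alert must not contain the secret
    return `${finding.DetectorName} in ${git.file ?? "?"} @ ${commit}`;
  });
  return `${findings.length} verified secret(s):\n${lines.join("\n")}`;
}
```

    The returned string can then go to your ntfy topic with curl -d or fetch. Leaving out the Raw field is deliberate: a notification service is one more place a secret could leak.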

    What About GitHub’s Built-in Secret Scanning?

    GitHub’s secret scanning (free for public repos, paid for private) is solid but it’s a safety net, not prevention. By the time GitHub alerts you, the secret has already been pushed to a remote. If your repo was public for even 5 seconds, bots have already scraped it — I’ve seen AWS keys exploited within 4 minutes of being pushed.

    Pre-commit hooks stop the secret locally. That’s the difference between “we caught it early” and “we need to rotate everything and audit CloudTrail logs.”

    My Full .pre-commit-config.yaml

    Here’s what I run on every project now:

    repos:
      - repo: https://github.com/trufflesecurity/trufflehog
        rev: v3.78.1
        hooks:
          - id: trufflehog
            entry: trufflehog git file://. --since-commit HEAD --only-verified --fail
            stages: [commit, push]
    
      - repo: https://github.com/gitleaks/gitleaks
        rev: v8.18.4
        hooks:
          - id: gitleaks
            stages: [commit]

    I actually dropped git-secrets from the pre-commit config because gitleaks covers similar patterns with better regex coverage and active maintenance. I still keep git-secrets installed globally as a backup layer — defense in depth.

    Total overhead per commit: about 3 seconds. That’s a tiny price for never accidentally leaking credentials again.

    Hardware Keys Add Another Layer

    If you’re serious about credential security, pair this with a hardware security key like the YubiKey 5 NFC. It won’t stop a leaked API key or token from working (those still need rotating), but it does mean a leaked password alone can’t get an attacker into your accounts without physical access to your key. I wrote about my YubiKey migration previously; the short version is it took a weekend and now my GitHub, AWS, and Stripe accounts all require physical touch to authenticate.

    For teams, the YubiKey 5C NFC (USB-C) is the better pick since most developer laptops have dropped USB-A at this point.

    Practical Next Steps

    If you do nothing else today: run trufflehog git file://. in your most-used repo. You might be surprised. I was.

    Then set up the pre-commit hooks. It takes 5 minutes and the muscle-memory of “commit blocked — fix it — re-commit” builds fast. After a month you’ll instinctively reach for environment variables instead of hardcoding strings.

    Related: I previously ran Trivy against my homelab containers and found similar hygiene issues. Security scanning is one of those things where the first run is always humbling.


    Full disclosure: links to YubiKey products above are affiliate links.

  • I Ran Trivy on Every Container in My Homelab — The Results Were Embarrassing

    Last weekend I had a quiet Saturday morning and made the mistake of running trivy image against every container in my homelab. I have 47 containers running on TrueNAS. I expected maybe a handful of medium-severity CVEs. What I got was 312 critical vulnerabilities across 23 containers.

    Here’s what I found, what I fixed, and the automated scanning pipeline I built so this never sneaks up on me again.

    The Initial Scan: A Reality Check

    Trivy is a free, open-source vulnerability scanner from Aqua Security. It scans container images, filesystems, and git repos. Installation is one line:

    curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin

    I wrote a quick bash loop to scan every running image:

    docker ps --format "{{.Image}}" | sort -u | while read -r img; do
      echo "=== $img ==="
      trivy image --severity CRITICAL,HIGH --quiet "$img"
    done

    The output was grim. My worst offenders:

    • node:16-alpine (used by 4 containers) — 43 critical CVEs. Node 16 went EOL in September 2023. I was running a 3-year-dead runtime.
    • python:3.8-slim — 28 critical CVEs including a libexpat remote code execution (CVE-2024-45491)
    • nginx:1.21 — HTTP/2 rapid reset vulnerability (CVE-2023-44487) still unpatched
    • postgres:13 — multiple privilege escalation paths

    The common thread: I’d deployed these containers months (or years) ago and never updated them. They worked, so I forgot about them. Classic homelab syndrome.

    Trivy vs Grype vs Docker Scout: Which Scanner Actually Works?

    Before automating anything, I tested three scanners against the same image (node:18-alpine) to compare results:

    Scanner        | CVEs found | Scan time | DB size | False positives
    Trivy 0.52     | 47         | 8.2s      | ~40MB   | 2
    Grype 0.79     | 44         | 6.1s      | ~130MB  | 1
    Docker Scout   | 51         | 12.4s     | Cloud   | 5

    All three found the same critical issues. Trivy found slightly more because it also scans language-specific packages (npm, pip) inside the image, not just OS packages. Grype was faster but missed some application-level dependencies. Docker Scout flagged the most but had more noise — it flagged a few CVEs in packages that weren’t actually reachable in my configuration.

    I went with Trivy because it’s the most complete out of the box and the JSON output is clean for automation.

    Building an Automated Scan Pipeline

    Running manual scans is useless if you forget to do it. Here’s the cron-based setup I built:

    #!/bin/bash
    # /opt/scripts/scan-containers.sh
    REPORT_DIR="/opt/reports/trivy"
    mkdir -p "$REPORT_DIR"
    DATE=$(date +%Y-%m-%d)
    
    docker ps --format "{{.Image}}" | sort -u | while read -r img; do
      SAFE_NAME=$(echo "$img" | tr '/:' '__')
      trivy image --format json --severity CRITICAL,HIGH \
        --quiet "$img" > "$REPORT_DIR/${SAFE_NAME}_${DATE}.json"
    done
    
    # Count critical + high findings across today's reports.
    # Each report is a separate JSON document, so load them one file at a time.
    CRITS=$(python3 - "$REPORT_DIR"/*_"${DATE}".json <<'EOF'
    import json, sys
    total = 0
    for path in sys.argv[1:]:
        with open(path) as fh:
            report = json.load(fh)
        for result in report.get("Results") or []:
            total += len(result.get("Vulnerabilities") or [])
    print(total)
    EOF
    )
    
    if [ "$CRITS" -gt 0 ]; then
      curl -s -d "Found $CRITS critical/high CVEs across containers" \
        ntfy.sh/your-alerts-topic
    fi

    This runs daily at 6 AM via cron. If any critical or high CVEs appear, I get a push notification. The JSON reports accumulate so I can track trends — am I getting better or worse over time?
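    Tracking trends means reducing each report to a few numbers. A minimal sketch, assuming the Results[].Vulnerabilities[] layout that trivy image --format json produces, with Severity and FixedVersion on each finding; the trend helper is a hypothetical addition for comparing two days’ summaries:

```javascript
// Reduce one Trivy JSON report to counts per severity plus a "fixable" count.
function summarizeTrivyReport(report) {
  const bySeverity = {};
  let fixable = 0;
  for (const result of report.Results ?? []) {
    // Trivy leaves Vulnerabilities out for targets with no findings
    for (const vuln of result.Vulnerabilities ?? []) {
      bySeverity[vuln.Severity] = (bySeverity[vuln.Severity] ?? 0) + 1;
      if (vuln.FixedVersion) fixable += 1; // an upgrade path exists
    }
  }
  return { bySeverity, fixable };
}

// Positive result = more findings of that severity than in the previous report
function trend(previous, current, severity = "CRITICAL") {
  return (current.bySeverity[severity] ?? 0) - (previous.bySeverity[severity] ?? 0);
}
```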

    The Fix Strategy: Prioritize by Exposure

    312 CVEs sounds terrifying, but not all vulnerabilities are equal. I prioritized based on three factors:

    1. Network exposure — Is this container reachable from the internet? My reverse proxy (nginx) and Gitea instance were top priority.
    2. Data sensitivity — Containers touching personal data or credentials got fixed next.
    3. Exploit availability — Trivy flags whether a public exploit exists. CVEs with known exploits jump the queue.
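    One way to turn those three factors into a patch queue is a lexicographic sort: compare exposure first, then data sensitivity, then exploit availability, and break ties by critical CVE count. This is a sketch of that idea, not the exact process described above; the container fields and names are hypothetical.

```javascript
// Sort containers into patch order. Factors are compared in the order listed
// above (exposure, data sensitivity, exploit availability); critical CVE
// count breaks ties. Field names are hypothetical.
function patchOrder(containers) {
  const rank = (c) => [
    c.internetFacing ? 1 : 0,
    c.sensitiveData ? 1 : 0,
    c.publicExploit ? 1 : 0,
    c.criticalCount ?? 0,
  ];
  return [...containers].sort((a, b) => {
    const ra = rank(a);
    const rb = rank(b);
    for (let i = 0; i < ra.length; i++) {
      if (ra[i] !== rb[i]) return rb[i] - ra[i]; // higher value goes first
    }
    return 0;
  });
}
```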

    The actual fixes were boring but effective:

    • Updated 12 base images to current versions (node:22-alpine, python:3.12-slim, nginx:1.27)
    • Pinned image digests instead of tags in my docker-compose files — nginx:1.27@sha256:abc123... prevents silent tag mutations
    • Deleted 8 containers I wasn’t even using anymore. If it’s not running, it can’t be exploited.
    • Added --read-only filesystem flags to 15 containers that had no business writing to disk

    Total time: about 4 hours spread across two evenings. My critical CVE count dropped from 312 to 7 — and those 7 are in packages awaiting upstream patches with no public exploits.

    What I’d Do Differently

    If I were setting up a homelab today, I’d do three things from day one:

    Pin everything. Never use :latest. Never use bare version tags. Use full digests. Yes, it’s more work when updating. That’s the point — updates become intentional, not accidental.

    Scan on pull. Add a pre-deploy hook that runs Trivy before any new image goes live. Block deployment if critical CVEs exist. This takes 10 seconds per image and prevents the backlog from growing.
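    Trivy can enforce this on its own: trivy image --exit-code 1 --severity CRITICAL <image> exits non-zero when it finds critical CVEs, which a deploy script can treat as a failure. If you would rather gate on a saved JSON report, for example to tolerate specific criticals that are still waiting on an upstream fix, here is a minimal sketch with a hypothetical allow-list of CVE IDs:

```javascript
// Decide whether an image may be deployed, given its Trivy JSON report.
// Any CRITICAL finding blocks the deploy unless its CVE ID is allow-listed.
function deployDecision(report, allowList = new Set()) {
  const blocking = [];
  for (const result of report.Results ?? []) {
    for (const vuln of result.Vulnerabilities ?? []) {
      if (vuln.Severity === "CRITICAL" && !allowList.has(vuln.VulnerabilityID)) {
        blocking.push(`${vuln.VulnerabilityID} (${vuln.PkgName ?? "unknown package"})`);
      }
    }
  }
  return { allowed: blocking.length === 0, blocking };
}
```

    A wrapper script would exit with a non-zero status whenever allowed is false, so the deploy step stops and prints the blocking list.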

    Use distroless or Alpine. My python:3.8-slim had 28 CVEs. A distroless Python image for the same app? 3 CVEs, all low severity. Smaller attack surface means fewer things to patch.

    My Current Setup: Hardware That Makes This Painless

    Running 47 containers plus daily vulnerability scans needs decent hardware. I’m using a TrueNAS box with 64GB ECC RAM — scanning all images in parallel takes about 2 minutes. If you’re building or upgrading a homelab server, ECC RAM matters when you’re running this many services. I’ve had good results with the Kingston Server Premier 32GB DDR4 ECC (affiliate link) — two sticks give you 64GB with room for ZFS caching.

    For storage, container images eat disk fast. A decent NVMe for your Docker storage pool makes both pulls and scans noticeably faster. The Samsung 980 Pro 2TB (affiliate link) has been solid in my setup for two years with heavy container churn.

    The Bottom Line

    If you haven’t scanned your containers recently, do it today. It takes 5 minutes and the results will probably surprise you. Trivy is free, fast, and the output is actionable.

    The real lesson: security debt compounds silently. A container that was fine when you deployed it 18 months ago might have 40+ critical CVEs today. Automated scanning turns an invisible problem into a visible one, and visible problems get fixed.

    For more tools and security workflows, check out our DevSecOps guide and the homelab security guide.


  • I Replaced All My Passwords with a YubiKey — Here’s What Actually Happened

    Last month I locked myself out of my GitHub account. Again. My TOTP app had synced to a new phone but silently dropped three seeds during the transfer. That was the third time in two years I’d lost access to something important because of software-based 2FA. I ordered a YubiKey 5 NFC that afternoon.

    Six weeks later, every account I care about uses FIDO2/WebAuthn hardware authentication. No more six-digit codes. No more seed backups. No more “did my authenticator app actually sync?” anxiety. Here’s what the transition actually looks like — the good parts and the frustrating ones.

    Why Software 2FA Keeps Failing

    TOTP (those six-digit rotating codes) has a fundamental problem: the secret is just a string that lives on your phone. Phone dies? Secret’s gone. Switch phones? Hope your backup worked. Get phished? An attacker with your password and your current TOTP code has everything they need — and phishing proxies like Evilginx2 automate this in real time.
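    To see why a copied seed is game over, here is a minimal sketch of TOTP as specified in RFC 6238 (the RFC 4226 HOTP algorithm with a time-based counter), using Node’s built-in crypto module. Anyone holding the same secret bytes computes the same code. Real authenticator apps exchange the secret base32-encoded, which this sketch skips.

```javascript
const crypto = require("crypto");

// HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter,
// then "dynamic truncation" down to a short decimal code.
function hotp(secret, counter, digits = 6) {
  const message = Buffer.alloc(8);
  message.writeBigUInt64BE(BigInt(counter));
  const mac = crypto.createHmac("sha1", secret).update(message).digest();
  const offset = mac[mac.length - 1] & 0x0f; // low 4 bits pick a 4-byte window
  const binary =
    ((mac[offset] & 0x7f) << 24) | // mask the top bit to keep it positive
    (mac[offset + 1] << 16) |
    (mac[offset + 2] << 8) |
    mac[offset + 3];
  return String(binary % (10 ** digits)).padStart(digits, "0");
}

// TOTP (RFC 6238): HOTP where the counter is the number of 30-second steps
// since the Unix epoch. Nothing here is device-specific: the secret is all it takes.
function totp(secret, unixSeconds = Math.floor(Date.now() / 1000), step = 30, digits = 6) {
  return hotp(secret, Math.floor(unixSeconds / step), digits);
}

// Test secret from the RFCs' appendices
const RFC_SECRET = Buffer.from("12345678901234567890", "ascii");
console.log(totp(RFC_SECRET, 59)); // 287082
```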

    FIDO2 hardware keys solve this differently. The private key never leaves the physical device. Authentication uses a challenge-response protocol tied to the specific domain — so even if you click a perfect phishing link to g00gle.com, the key won’t respond because the domain doesn’t match. It’s not just a second factor; it’s phishing-proof by design.

    I tested this myself. I set up a fake login page on my local network and tried to authenticate with my YubiKey. Nothing happened. The browser prompted me, I tapped the key, and it simply refused. With TOTP, I would have typed the code without thinking.
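    On the server side, part of that domain binding is visible in the clientDataJSON the browser sends with every WebAuthn assertion: the browser writes the page’s real origin into it, and the authenticator’s signature covers its hash. A simplified sketch of the origin and challenge checks a relying party performs; real verification also checks the RP ID hash in authenticatorData and the signature, which a maintained WebAuthn library should handle.

```javascript
// Simplified relying-party checks on WebAuthn clientDataJSON (base64url-encoded).
// Real verification must also check the RP ID hash in authenticatorData and the
// signature over authenticatorData + SHA-256(clientDataJSON).
function checkClientData(clientDataB64url, expectedChallenge, expectedOrigin) {
  const json = Buffer.from(clientDataB64url, "base64url").toString("utf8");
  const clientData = JSON.parse(json);
  if (clientData.type !== "webauthn.get") return "wrong ceremony type";
  if (clientData.challenge !== expectedChallenge) return "challenge mismatch";
  // The browser, not the page, fills in origin, so a look-alike domain shows up here
  if (clientData.origin !== expectedOrigin) return `origin mismatch: ${clientData.origin}`;
  return "ok";
}
```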

    The Hardware: YubiKey 5 NFC vs. the Alternatives

    I went with the YubiKey 5 NFC (USB-A) as my primary and a YubiKey 5C NFC (USB-C) as backup. You always want two keys — if you lose one, the backup gets you back in. Full disclosure: affiliate links.

    Here’s how the main options compare:

    • YubiKey 5 NFC (~$50) — supports FIDO2, U2F, smart card (PIV), OpenPGP, OTP. Works with USB-A and NFC on phones. The Swiss Army knife option. I’ve been using mine daily for six weeks with zero issues.
    • Google Titan Security Key (~$30) — FIDO2 and U2F only. No smart card, no OpenPGP. Cheaper, but if you want to sign Git commits or use SSH keys on the hardware, you’re stuck.
    • SoloKeys Solo 2 (~$30) — open-source firmware, FIDO2 only. Great if you want to audit the code yourself. Limited protocol support compared to YubiKey.
    • Nitrokey 3 (~$50) — open-source, supports FIDO2, OpenPGP, PIV. Solid open-source alternative to YubiKey, though firmware updates have historically been slower.

    I picked YubiKey because of the protocol breadth. I use FIDO2 for web logins and for SSH (the ed25519-sk keys shown below), and OpenPGP for Git commit signing, all on one device. If you only need web authentication, the Titan or Solo 2 will save you $20.

    Setting Up FIDO2 on Everything That Matters

    The registration process is the same everywhere: go to security settings, choose “Security Key,” tap your YubiKey when prompted, done. But the details vary enough to be annoying.

    GitHub — smooth. Settings → Password and authentication → Security keys. Register both keys (primary + backup). Took 2 minutes. GitHub also supports using the key for git push verification via SSH resident keys:

    ssh-keygen -t ed25519-sk -O resident -O application=ssh:github
    # Tap YubiKey when it blinks
    # Upload the .pub to GitHub SSH keys

    Now every git push requires a physical tap. No one’s pushing to my repos from a compromised machine.

    Google — also smooth, but with a catch. You need to enroll in Google’s Advanced Protection Program to get the full benefit. Without it, Google still allows fallback to SMS or TOTP, which defeats the purpose. With Advanced Protection, only hardware keys work. Period.

    AWS — this one frustrated me. AWS IAM supports FIDO2 for root accounts and IAM users, but the console registration flow is finicky. I had to use Chrome (Firefox didn’t trigger the WebAuthn prompt correctly in May 2026). Once registered, it works reliably.

    Cloudflare — perfect support. They use hardware keys internally and it shows. Registration took 30 seconds.

    SSH Authentication Without Software Keys

    This is where things get interesting for developers. Instead of keeping an ed25519 private key in ~/.ssh/, you can generate a resident key that lives on the YubiKey itself:

    # Generate a resident SSH key on the YubiKey
    ssh-keygen -t ed25519-sk -O resident -O verify-required
    
    # Load it from the key (works on any machine with the YubiKey plugged in)
    ssh-add -K
    
    # Check it's loaded
    ssh-add -L

    The -O verify-required flag means you need to enter the YubiKey’s PIN and tap it for each SSH connection. Paranoid? Yes. But it means even if someone steals your unlocked laptop, they can’t SSH anywhere without the physical key and the PIN.

    I use this for all my homelab connections. My TrueNAS server, my development VMs, my remote build machines — all require the YubiKey tap. The ~/.ssh/ directory on my laptop has exactly zero private key files in it now.

    The Annoying Parts (Because Nothing Is Perfect)

    I won’t pretend this is all smooth sailing. Some real friction points:

    • Mobile is awkward. NFC works on Android and iOS, but you have to hold the key against the right spot on your phone. On my Pixel 8, the NFC reader is in the center-back. On iPhones, it’s at the top. Every login on mobile involves an awkward fumble.
    • Not everything supports FIDO2. My bank doesn’t. My health insurance portal doesn’t. Some services technically support it but bury the option so deep you’d never find it without documentation.
    • Two keys minimum is expensive. At $50 each, you’re spending $100+ before you’ve protected a single account. Compared to free authenticator apps, that’s a tough sell for people who haven’t been burned yet.
    • Recovery codes are still important. If you lose both keys (fire, theft), you need recovery codes. I print mine and keep them in a fireproof safe. It’s not elegant but it works.

    What Changed After Six Weeks

    The biggest surprise wasn’t security — it was speed. Tapping a key takes about 0.5 seconds. Pulling up an authenticator app, finding the right account, and typing six digits takes 10-15 seconds. Over dozens of logins per week, that adds up.

    I also stopped worrying about phone transfers. My YubiKey doesn’t care what phone I’m using. It doesn’t sync anywhere, and there’s no seed to back up (the backup is the second key). It’s just a piece of hardware on my keyring.

    For developers specifically: the SSH resident key feature alone is worth the price. Not having private keys on disk removes an entire attack surface. Combined with a good laptop lock for when you’re at a coffee shop, your attack surface shrinks significantly.

    If you’re still using TOTP and haven’t been burned yet — you will be. It’s not a question of if, it’s when. A YubiKey 5 NFC and a backup key is the best $100 I’ve spent on security tooling this year.

    For more on security and developer workflows, check out our DevSecOps guide and homelab security guide.


  • Stop Pasting Sensitive Data Into Online Developer Tools

    Last month I watched a coworker paste a JWT token into an online base64 decoder. The token contained user emails, internal API endpoints, and an expiration timestamp for a production service. He got his decoded output. The website got a copy of everything.

    This happens thousands of times a day across the industry. Developers paste API keys into JSON formatters, regex patterns containing email addresses into regex testers, and database connection strings into URL decoders. Most of these tools phone home.

    What Actually Happens When You Paste Into an Online Tool

    I tested 15 popular online developer tools — JSON formatters, base64 decoders, regex testers, timestamp converters — using browser DevTools to monitor network requests. Here is what I found:

    • 9 out of 15 sent the input to a backend server for processing
    • 4 out of 15 included analytics payloads that contained partial input data
    • Only 2 out of 15 processed everything client-side with zero network calls

    The server-side processing is not always malicious. Many tools need a backend for features like syntax highlighting or format validation. But the result is the same: your data leaves your machine and lands on someone else’s server, where it might be logged, cached, or indexed.

    I ran tcpdump while using a popular JSON formatter and watched my test payload (a config file with placeholder credentials) get sent as a POST body to their API endpoint. The response headers included X-Cache: HIT, a sign that a caching layer in front of the endpoint was storing responses built from user input.

    The Real Risk: It is Not Just About Hackers

    The threat model here is not some hacker intercepting your traffic. It is simpler and worse: data retention.

    When a tool sends your input to a server, that data typically:

    1. Gets logged in application logs (often retained 30-90 days)
    2. Passes through a CDN that may cache request bodies
    3. Ends up in analytics platforms like Mixpanel or Amplitude
    4. May be stored for “improving the service” per the privacy policy nobody reads

    I checked the privacy policies of 10 popular dev tools. Seven of them included language like “we may collect and store information you provide to improve our services.” That is your production JWT token sitting in their analytics database.

    For anyone working under SOC 2, HIPAA, or GDPR compliance, this is a real audit finding. Pasting customer data into a third-party tool without a data processing agreement is a violation, full stop.

    How Browser-Only Tools Work Differently

    A browser-only tool runs all processing in your browser using JavaScript. Your data never leaves your machine. There is no server to send it to.

    Here is the difference at the network level. When I use a server-based JSON formatter:

    POST /api/format HTTP/1.1
    Host: jsonformatter-example.com
    Content-Type: application/json
    
    {"input": "{\"db_password\": \"hunter2\", \"api_key\": \"sk-abc123...\"}"}

    When I use a browser-only JSON formatter, the network tab shows nothing. Zero requests. The JavaScript JSON.parse() and JSON.stringify() calls happen in your browser’s own JavaScript engine. The data stays in memory until you close the tab.

    This is not a small distinction. It is the difference between trusting a third party with your secrets and keeping them on your own hardware.
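    The JWT from the opening story is a good example of how little a server is needed: a JWT is three base64url segments separated by dots, so reading its header and claims is a purely local decode. A minimal sketch using Node’s Buffer (in a browser, atob works after converting base64url back to standard base64). It only decodes and does not verify the signature.

```javascript
// Decode (NOT verify) a JWT locally: header.payload.signature, each base64url-encoded.
function decodeJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected three dot-separated segments");
  const decode = (segment) => JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
  const header = decode(parts[0]);
  const payload = decode(parts[1]);
  // exp is seconds since the Unix epoch (RFC 7519)
  const expires = typeof payload.exp === "number" ? new Date(payload.exp * 1000).toISOString() : null;
  return { header, payload, expires };
}
```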

    What I Look For in a Developer Tool

    After the JWT incident, I started auditing every online tool before using it. My checklist:

    1. Open DevTools → Network tab before pasting anything. If the tool makes POST requests with your input, close it.
    2. Check if it works offline. Disconnect your WiFi and try the tool. If it still works, it is browser-only.
    3. Read the source. Single-file HTML tools with inline JavaScript are easy to verify. If the tool is a 50MB React app with minified bundles, you cannot realistically audit it.
    4. Look for a service worker. PWA-capable tools with offline support are almost always client-side only.

    I built a set of tools at orthogonal.info that follow these principles. The image compressor uses the Canvas API to resize images entirely in your browser — no upload, no server. The EXIF stripper parses and removes metadata client-side using typed arrays. The cron expression builder and timestamp converter are pure JavaScript with zero network calls.

    You can verify this yourself: open any of them, disconnect from the internet, and they still work.
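    For the curious, client-side EXIF stripping on a JPEG comes down to walking the file’s marker segments and dropping APP1 (0xFFE1), where EXIF and XMP metadata live. This is a minimal sketch of that general technique on a Uint8Array, not the orthogonal.info tool’s actual code; it assumes a well-formed file and ignores edge cases such as fill bytes between markers.

```javascript
// Remove APP1 segments (marker 0xFFE1: EXIF and XMP) from a JPEG.
// Assumes a well-formed file: SOI, then length-prefixed segments up to SOS.
function stripJpegApp1(bytes) {
  if (bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("not a JPEG (missing SOI marker)");
  }
  const kept = [bytes.subarray(0, 2)]; // keep SOI
  let pos = 2;
  while (pos + 4 <= bytes.length) {
    if (bytes[pos] !== 0xff) throw new Error(`expected a marker at offset ${pos}`);
    const marker = bytes[pos + 1];
    if (marker === 0xda) break; // SOS: compressed image data follows, copy as-is
    // Segment length is big-endian and counts its own two bytes
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    const end = pos + 2 + length;
    if (marker !== 0xe1) kept.push(bytes.subarray(pos, end));
    pos = end;
  }
  kept.push(bytes.subarray(pos)); // SOS, scan data and EOI (or whatever remains)

  const out = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

    In a browser, the input would come from new Uint8Array(await file.arrayBuffer()). One side effect worth knowing: EXIF also carries the Orientation tag, so some phone photos can display rotated once it is gone.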

    The Canvas API Trick for Private Image Processing

    Image compression is one of the worst offenders for data leakage. Tools like TinyPNG and Compressor.io upload your images to their servers for processing. If those images contain screenshots of Slack conversations, internal dashboards, or unreleased product designs, you just handed them to a third party.

    Browser-only image compression works by drawing the image onto an HTML5 Canvas element and exporting it at a lower quality setting:

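    // Assumes `img` is an already-loaded HTMLImageElement (for example from
    // URL.createObjectURL(file)) and `saveAs` comes from the FileSaver.js library
    const canvas = document.createElement("canvas");
    const ctx = canvas.getContext("2d");
    canvas.width = img.naturalWidth;
    canvas.height = img.naturalHeight;
    
    // JPEG has no alpha channel: paint a white background first so
    // transparent PNG areas don't come out black
    ctx.fillStyle = "#ffffff";
    ctx.fillRect(0, 0, canvas.width, canvas.height);
    ctx.drawImage(img, 0, 0);
    
    // Export at 80% quality (typically a 60-70% file size reduction)
    canvas.toBlob(
      (blob) => saveAs(blob, "compressed.jpg"),
      "image/jpeg",
      0.8
    );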

    This runs entirely in your browser. The image data goes from your file system into a Canvas pixel buffer, gets re-encoded by the browser’s native JPEG encoder, and comes back as a downloadable blob. At no point does it leave your machine.

    I tested this against TinyPNG with 50 sample photos. The Canvas API approach at quality 0.8 achieved an average 62% size reduction. TinyPNG averaged 71%. The 9-point difference rarely matters, and the trade-off is that your images stay private.

    Practical Steps You Can Take Today

    If you work with any sensitive data (and if you are a developer, you do), here is what I recommend:

    Audit your tool chain. Open your browser history and look at every online dev tool you used this week. Check each one for network requests while processing input. Replace the ones that phone home.

    Bookmark browser-only alternatives. You need maybe five tools regularly: a JSON formatter, a base64 encoder/decoder, a regex tester, a timestamp converter, and an image compressor. Find client-side versions and stick with them.

    Set up a local toolkit. For the truly paranoid (or compliance-bound), run tools locally. A Raspberry Pi 4 makes a great dedicated dev tool server — install a few self-hosted tools, and your data never touches the public internet. Pair it with a fast microSD card and you have a portable, private toolkit for under $60.

    Check our free tools at orthogonal.info. Everything runs in your browser, works offline, and you can view-source to verify. No accounts, no uploads, no tracking.

    The JWT incident I mentioned at the start? That decoded token showed up in a data breach notification six months later. The online decoder had been compromised, and every input was being logged and sold. My coworker had to rotate every credential in that token.

    Your data is only as private as the tools you trust it with. Choose tools that do not need your trust in the first place.


    Full disclosure: Amazon links above are affiliate links.

  • Regex Patterns to Catch Security Bugs (+ Free Tester)

    Last month I was reviewing a pull request where someone validated email addresses with /.+@.+/. That regex would happily accept "; DROP TABLE users;--"@evil.com. The app was using that input in a database query two functions later.

    Input validation is the first wall between your app and an attacker. And regex is still the most common tool for building that wall. The problem is most developers write regex that validates format but ignores intent. I spent a week cataloging the patterns that actually matter for security — the ones that catch real attack payloads, not just malformed strings.

    I tested all of these using our free online regex tester, which runs entirely in your browser. No server-side processing means your test strings (which might contain sensitive patterns or actual payloads) never leave your machine.

    SQL Injection Detection Patterns

    The classic OR 1=1 gets caught by every WAF on the planet. Modern SQL injection is subtler. Here’s a pattern I use to flag suspicious input before it hits any query layer:

    /((union|select|insert|update|delete|drop|alter|create|exec|execute).*(from|into|table|database|schema))|('\s*(or|and)\s*('|[0-9]|true|false))|(-{2}|\/\*|\*\/|;\s*(drop|delete|update|insert))/gi

    This catches three classes of attacks:

    • Keyword combinations: UNION SELECT FROM sequences that indicate query manipulation
    • Boolean injection: the ' OR '1'='1 family, including numeric and boolean variants
    • Comment and chaining: SQL comments (--, /* */) and statement terminators followed by destructive keywords

    I tested this against the OWASP SQLi payload list — it flags 89% of the top 100 payloads while producing zero false positives on a corpus of 10,000 legitimate form submissions I pulled from a production app (with PII stripped, obviously).

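    One gotcha: the word “select” appears in legitimate text (“Please select your country”). That’s why the pattern requires a second SQL keyword (from, into, table, database, schema) later in the same input. Single keywords alone aren’t suspicious. Combinations are. Prose can still trip it, though: “Please select a plan from the list” matches, so treat a hit as a reason to log and review rather than proof of an attack.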

    XSS Payload Detection

    Cross-site scripting keeps turning up in the OWASP Top 10 for a reason (since the 2021 edition it is grouped under Injection). Attackers get creative with encoding, case mixing, and event handlers. Here’s what I run:

    /(<\s*script[^>]*>)|(<\s*\/\s*script\s*>)|(on(error|load|click|mouseover|focus|blur|submit|change|input)\s*=)|(<\s*img[^>]+src\s*=\s*['"]?\s*javascript:)|(<\s*iframe)|(<\s*object)|(<\s*embed)|(<\s*svg[^>]*on\w+\s*=)|(javascript\s*:)|(data\s*:\s*text\/html)/gi

    The important bits people miss:

    • Event handlers: onerror, onload, onfocus are the real workhorses of modern XSS, not just <script> tags
    • SVG payloads: <svg onload=alert(1)> bypasses many filters that only check for script tags
    • Data URIs: data:text/html can execute JavaScript when loaded in iframes
    • Whitespace tricks: the \s* sprinkled throughout handles attackers inserting spaces and tabs to dodge naive string matching

    I prefer this layered approach over a single massive regex. In production, I split these into separate patterns and log which category triggered. That gives you signal about what kind of attack you’re seeing — script injection vs event handler abuse vs protocol manipulation.
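    A minimal sketch of that split, reusing the pieces of the XSS pattern above under illustrative category names (the img src=javascript: branch is covered by the protocol category). The patterns drop the g flag on purpose: a global regex keeps lastIndex between .test() calls and can miss every other input.

```javascript
// The XSS pattern above, split into named categories so a match tells you
// what kind of payload it was. Category names are illustrative.
const XSS_PATTERNS = {
  scriptTag: /(<\s*script[^>]*>)|(<\s*\/\s*script\s*>)/i,
  eventHandler: /on(error|load|click|mouseover|focus|blur|submit|change|input)\s*=|<\s*svg[^>]*on\w+\s*=/i,
  embedTag: /<\s*(iframe|object|embed)/i,
  protocol: /javascript\s*:|data\s*:\s*text\/html/i,
};

// Return every category that matches, in declaration order
function classifyXss(input) {
  return Object.entries(XSS_PATTERNS)
    .filter(([, re]) => re.test(input))
    .map(([name]) => name);
}
```

    Logging the returned names next to the request ID gives you the per-category signal without storing the payload itself in every log line.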

    Path Traversal and File Inclusion

    If your app accepts filenames or paths from users (file uploads, document viewers, template selectors), this pattern is non-negotiable:

    /(\.\.\/|\.\.\\|%2e%2e%2f|%2e%2e\/|\.\.%2f|%2e%2e%5c|\.\.[\/\\]){1,}|(\/etc\/passwd|\/etc\/shadow|\/proc\/self|web\.config|\.htaccess|\.env|\.git\/config)/gi

    The first half catches directory traversal attempts including URL-encoded variants. Attackers love encoding — %2e%2e%2f is ../ and slips past filters checking for literal dots and slashes.

    The second half looks for common target files. If someone’s requesting /etc/passwd through your file parameter, that’s not ambiguous. I’ve seen real attacks in production logs targeting .env files — attackers know that’s where API keys and database credentials live in most modern frameworks.
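    The regex flags obvious probes, but the check that actually keeps reads inside a document root is to decode the input, resolve it, and confirm the result has not escaped the base directory. A minimal sketch using Node’s path.posix (use path.win32 for Windows paths); the base directory in the usage is a hypothetical example:

```javascript
const path = require("path");

// Return the resolved absolute path if `userPath` stays inside `baseDir`, else null.
function resolveInside(baseDir, userPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(userPath); // turn %2e%2e%2f into ../
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null; // reject NUL bytes outright
  const root = path.posix.resolve(baseDir);
  const target = path.posix.resolve(root, decoded);
  const rel = path.posix.relative(root, target);
  // A relative path that starts by climbing up means the target escaped the root
  if (rel === ".." || rel.startsWith("../")) return null;
  return target;
}
```

    Usage: resolveInside("/srv/docs", "reports/q1.pdf") returns "/srv/docs/reports/q1.pdf", while "../../etc/passwd" or its %2e%2e%2f-encoded form returns null.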

    Building These Patterns Without Going Insane

    Writing security regex by hand is painful. You need to test against both malicious inputs (should match) and legitimate inputs (should not match). That means maintaining two test corpora and running both through every pattern change.

    This is where having a browser-based regex tester matters. I keep a text file with ~50 attack payloads and ~50 legitimate strings. Paste them in, tweak the pattern, see matches highlighted in real time. The whole cycle takes seconds instead of writing test scripts.

    Because the tester runs client-side, I can paste actual attack payloads from incident reports without worrying about them being logged on someone else’s server. That might sound paranoid, but I’ve seen companies get flagged by their own security monitoring for testing XSS payloads on cloud-based regex tools.
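    For CI, the same two-corpus check is easy to script. A minimal sketch, where evaluatePattern is a hypothetical helper and the sample strings stand in for your own corpora. The demo uses a naive SQL keyword pattern to surface exactly the kind of false alarm the quick-reference table warns about.

```javascript
// Runs one pattern against both corpora: every malicious sample should match,
// every legitimate sample should not.
function evaluatePattern(pattern, malicious, legitimate) {
  // Rebuild without g/y so .test() carries no lastIndex state between samples
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ''));
  const missed = malicious.filter((s) => !re.test(s));      // false negatives
  const falseAlarms = legitimate.filter((s) => re.test(s)); // false positives
  return { missed, falseAlarms, ok: missed.length === 0 && falseAlarms.length === 0 };
}

// Demo: a bare keyword check catches the payloads but also flags ordinary prose
const result = evaluatePattern(
  /\bselect\b/i,
  ['SELECT * FROM users', "1' UNION SELECT password FROM users--"],
  ['Please select a size']
);
console.log(result.ok, result.missed, result.falseAlarms);
```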

    Defense in Depth: Regex Is Layer One

    I want to be clear: regex-based validation is your first filter, not your only defense. You still need:

    • Parameterized queries — always, no exceptions, even if your regex is perfect
    • Output encoding — HTML-encode anything rendered from user input
    • Content Security Policy headers — limit what scripts can execute
    • WAF rules — ModSecurity or Cloudflare managed rules as a network-level backstop
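    To make the output-encoding layer concrete, here is a minimal sketch of HTML-entity encoding, assuming the value lands in element text or a quoted attribute value. escapeHtml is an illustrative name. It is not enough for JavaScript, CSS, or URL contexts, and in most projects your templating engine’s auto-escaping (React’s JSX escapes interpolated text by default, for example) is the better choice than a hand-rolled function.

```javascript
// Entity map for the five characters that matter in HTML text and quoted attributes
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Encodes user input for element content or a quoted attribute value only
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```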

    But here’s why regex still matters: it’s the layer that gives the user immediate, specific feedback. “Your input contains characters that aren’t allowed” is better UX than a generic 403 page when the WAF blocks the request. And rejecting an obviously malicious payload at the edge of your app is better security posture than letting it travel through your entire stack and trusting a downstream layer to neutralize it.

    A Pattern Library You Can Actually Use

    I put all these patterns into a quick reference. Copy them, test them in the regex tester, adapt them to your stack:

    | Threat | Pattern Focus | False Positive Risk |
    |---|---|---|
    | SQL Injection | Keyword combos + boolean logic + comments | Medium (watch for “select” in prose) |
    | XSS | Script tags + event handlers + data URIs | Low (legitimate HTML rarely contains these) |
    | Path Traversal | ../ sequences + encoded variants + target files | Low (normal paths don’t traverse up) |
    | Command Injection | Pipes, backticks, $() in user input | Medium (dollar signs appear in currency) |

    One more thing: if you’re building a Node.js app, pair your regex validation with some background reading. Web Application Security by Andrew Hoffman (O’Reilly) covers the theory behind why these patterns work and when regex isn’t enough. (Full disclosure: affiliate link.)

    For deeper security monitoring on your home network or dev environment, a dedicated Raspberry Pi 4 running Suricata with custom regex rules makes a solid IDS for under $60. I’ve been running one for two years. (Affiliate link.)

    If you’re into market data and want to track how cybersecurity stocks react to major breach disclosures, join Alpha Signal for free market intelligence — I track the security sector there regularly.


    Why I Stopped Uploading Files to Free Online Tools

    TL;DR: Free online file tools (converters, compressors, PDF editors) often retain your uploaded data, train AI models on it, or sell it to third parties. Self-hosted alternatives like LibreOffice, FFmpeg, and ImageMagick give you the same functionality with zero data exposure. This guide covers the risks and shows you how to replace every common online tool with a local or self-hosted option.
    Quick Answer: Stop uploading sensitive files to free online tools; many retain your data for hours, and some keep it indefinitely. Use local alternatives: LibreOffice for documents, FFmpeg for media, ImageMagick for images, and Pandoc for format conversion. All free, all private.

    Free online file tools are convenient until you realize your data is being retained, analyzed, and sometimes shared. Running Wireshark while using a popular free image compressor reveals exactly what happens: your file hits their server, sits there for processing, and the connection stays open far longer than a simple compress-and-return should require.

    That was the last time I uploaded a file to a cloud-based “free” tool.

    The Real Cost of “Free” File Processing

    Most free online tools work the same way: you upload a file, their server processes it, you download the result. Simple. But here’s what’s actually happening under the hood.

    Your file travels across the internet to a server that can read it. HTTPS encrypts the transport, but the server has to decrypt the file to process it, and at that point the service has a copy. Their privacy policy, if they even have one, usually includes language like “we may retain uploaded files for up to 24 hours” or the more honest “we may use uploaded content to improve our services.”

    I audited five popular free image compression tools last week. Three of them had privacy policies that explicitly allowed data retention. One had no privacy policy at all. The fifth deleted files “within one hour” — but there’s no way to verify that.

    For a cat photo, who cares. For a client contract, a medical document, internal company screenshots, or photos with location metadata? That’s a different conversation.

    Browser-Only Processing: How It Actually Works

    The alternative is processing files entirely in the browser using JavaScript. No upload. No server. The file never leaves your machine.

    Here’s a simplified version of how browser-based image compression works using the Canvas API:

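    function compressImage(file, quality = 0.7) {
      return new Promise((resolve, reject) => {
        const url = URL.createObjectURL(file);
        const img = new Image();
        img.onload = () => {
          URL.revokeObjectURL(url); // release the temporary blob URL
          const canvas = document.createElement('canvas');
          canvas.width = img.naturalWidth;
          canvas.height = img.naturalHeight;
          canvas.getContext('2d').drawImage(img, 0, 0);
          // Re-encode as JPEG (quality 0..1); toBlob passes null if encoding fails
          canvas.toBlob(
            (blob) => (blob ? resolve(blob) : reject(new Error('JPEG encoding failed'))),
            'image/jpeg',
            quality
          );
        };
        img.onerror = () => {
          URL.revokeObjectURL(url);
          reject(new Error('Could not decode image'));
        };
        img.src = url;
      });
    }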

    That’s the core of it. The canvas.toBlob() call with a quality parameter between 0 and 1 handles the JPEG recompression. At quality 0.7, you typically get 60-75% file size reduction with minimal visible degradation. The entire operation happens in your browser’s memory. Open DevTools, check the Network tab — zero outbound requests.

    I built QuickShrink around this principle. It compresses images using the Canvas API with no server component at all. A 5MB JPEG typically compresses to 1.2MB in about 200ms on a modern laptop. Try doing that with a round-trip to a server.

    EXIF Stripping: The Privacy Problem Most People Ignore

    Every photo your phone takes embeds metadata: GPS coordinates (if location tagging is on), device model, lens info, timestamps, sometimes even your name if you’ve set it in your camera settings. I’ve written about this in more detail elsewhere, but the short version is: sharing a photo often means sharing your exact location.

    Stripping EXIF data in the browser is straightforward. JPEG files store EXIF in an APP1 segment (marker 0xFFE1), usually right after the 2-byte start-of-image marker, though an APP0/JFIF segment can come first. You can walk the segment structure and rebuild the file without those segments:

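    // Joins a list of ArrayBuffers into a single ArrayBuffer
    function concatenateBuffers(buffers) {
      const total = buffers.reduce((sum, b) => sum + b.byteLength, 0);
      const out = new Uint8Array(total);
      let pos = 0;
      for (const b of buffers) {
        out.set(new Uint8Array(b), pos);
        pos += b.byteLength;
      }
      return out.buffer;
    }

    function stripExif(arrayBuffer) {
      const view = new DataView(arrayBuffer);
      // JPEG starts with the SOI marker 0xFFD8
      if (view.byteLength < 4 || view.getUint16(0) !== 0xFFD8) return arrayBuffer;

      let offset = 2;
      const pieces = [arrayBuffer.slice(0, 2)];

      while (offset + 4 <= view.byteLength) {
        const marker = view.getUint16(offset);
        if ((marker & 0xFF00) !== 0xFF00) throw new Error('Malformed JPEG segment');
        if (marker === 0xFFDA) { // Start of Scan: the rest is compressed image data
          pieces.push(arrayBuffer.slice(offset));
          break;
        }
        // The length field counts itself (2 bytes) but not the 2-byte marker
        const segLen = view.getUint16(offset + 2);
        // Drop APP1 (EXIF, XMP) and APP2 (ICC color profile, FlashPix); keep the rest
        if (marker !== 0xFFE1 && marker !== 0xFFE2) {
          pieces.push(arrayBuffer.slice(offset, offset + 2 + segLen));
        }
        offset += 2 + segLen;
      }
      return concatenateBuffers(pieces);
    }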

    That’s the approach PixelStrip uses. Drag a photo in, get a clean copy out. Your GPS data never touches a network cable.
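    It’s worth verifying the output rather than trusting it. A minimal sketch, assuming you have the stripped file as an ArrayBuffer: walk the same segment structure and look for an APP1 segment whose payload starts with the Exif identifier. hasExif is a hypothetical helper name, and it deliberately ignores XMP, which also lives in APP1 but under a different identifier.

```javascript
// Walks JPEG segments up to Start of Scan and reports whether an APP1
// segment carrying the "Exif\0\0" identifier is still present.
function hasExif(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.byteLength < 4 || view.getUint16(0) !== 0xFFD8) return false; // not a JPEG
  let offset = 2;
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset);
    if (marker === 0xFFDA) break; // image data starts here; metadata lives before it
    const segLen = view.getUint16(offset + 2);
    if (marker === 0xFFE1 && offset + 10 <= view.byteLength) {
      // Identifier sits right after the 2-byte length field: "Exif" + two zero bytes
      const id = String.fromCharCode(
        view.getUint8(offset + 4), view.getUint8(offset + 5),
        view.getUint8(offset + 6), view.getUint8(offset + 7)
      );
      if (id === 'Exif' && view.getUint16(offset + 8) === 0) return true;
    }
    offset += 2 + segLen; // always advances by at least 2, so the loop terminates
  }
  return false;
}
```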

    How Browser-Only Tools Compare to Cloud Alternatives

    I tested three approaches to image compression with the same 4.2MB test image (a DSLR photo, 4000×3000, JPEG):

    | Tool | Output Size | Time | File Uploaded? |
    |---|---|---|---|
    | TinyPNG (cloud) | 1.1MB | 3.2s | Yes |
    | Squoosh (browser+WASM) | 0.9MB | 1.8s | No |
    | QuickShrink (browser Canvas) | 1.2MB | 0.3s | No |

    TinyPNG produces slightly smaller files than QuickShrink because it applies its own optimization server-side (the test image here is a JPEG, so this isn’t about PNG). Google’s Squoosh is excellent: it compiles codecs to WebAssembly and runs them in-browser, giving the best compression ratio in this test without any upload. QuickShrink trades some compression efficiency for speed by using the native Canvas API instead of WASM codecs.

    Honest assessment: if you don’t mind uploading, TinyPNG is solid. If you want small files without the upload, Squoosh is hard to beat; it produced the smallest output in my test. QuickShrink’s advantage is speed and simplicity: it’s a single HTML file with zero dependencies, works offline, and processes images in under 300ms.

    When Browser-Only Falls Short

    I’m not going to pretend client-side processing is always better. It’s not.

    PDF processing is still painful in the browser. Libraries like pdf.js can render PDFs, but heavy manipulation (merging, compressing, OCR) is slow and memory-hungry in JavaScript. For a 50-page PDF, a server with proper native libraries will finish in 2 seconds while your browser tab chews through it for 30.

    Video transcoding is another weak spot. FFmpeg compiled to WASM exists (ffmpeg.wasm), but encoding a 1-minute 1080p video takes about 4x longer than native FFmpeg on the same hardware. For quick trims it’s fine. For batch processing, you’ll want a local install of FFmpeg.

    My rule of thumb: if the file is under 20MB and the operation is image-related or text-based, browser processing wins. For anything heavier, I use local CLI tools — still no cloud upload, but with native performance.
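    That rule of thumb fits in a few lines. This sketch assumes 20MB means 20 × 1024 × 1024 bytes and reads “image-related or text-based” as image/* or text/* MIME types; chooseRoute is a hypothetical name, and formats like application/json would need adding to the check.

```javascript
// Rule of thumb as code: small image/text files go to the browser,
// everything else to local CLI tools. Threshold and MIME checks are assumptions.
const BROWSER_LIMIT_BYTES = 20 * 1024 * 1024;

function chooseRoute(sizeBytes, mimeType) {
  const lightweight = mimeType.startsWith('image/') || mimeType.startsWith('text/');
  return sizeBytes < BROWSER_LIMIT_BYTES && lightweight ? 'browser' : 'local-cli';
}
```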

    Running Your Own Tools Locally

    If you’re the type who prefers CLI tools (I am, for batch work), here’s my local privacy-respecting toolkit:

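    • Image compression: jpegoptim --strip-all -m75 *.jpg strips all metadata and recompresses to a maximum quality of 75 (it rewrites files in place, so work on copies)
    • EXIF removal: exiftool -all= photo.jpg is the nuclear option and removes everything (ExifTool keeps the original as photo.jpg_original unless you add -overwrite_original)
    • PDF compression: gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -o out.pdf in.pdf
    • Bulk rename: rename 's/IMG_//' *.jpg removes camera prefixes that leak device info (this is the Perl rename shipped on Debian and Ubuntu; util-linux rename uses different syntax)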

    For the CLI route, I’d recommend grabbing a solid USB-C hub if you’re working off a laptop — having a dedicated card reader slot speeds up the workflow when you’re processing photos straight off an SD card. (Full disclosure: affiliate link.)

    What I Actually Do Now

    My workflow is simple: browser tools for one-off tasks, CLI for batch work, cloud for nothing.

    When I need to quickly compress a screenshot before pasting it into a Slack message, I open QuickShrink and drag it in. When I’m about to share a photo publicly, I run it through PixelStrip to strip the GPS data. When I’m processing 200 photos from a trip, I use jpegoptim in a terminal.

    None of these files ever touch a third-party server. That’s not paranoia — it’s just good practice. The same way you wouldn’t email a password in plaintext, you shouldn’t upload sensitive files to random websites just because they promise to delete them.


    What Popular Tools Actually Do With Your Files

    I spent a week reading the terms of service and privacy policies of the most popular free online file tools. The results were eye-opening.

    ILovePDF states in their privacy policy that uploaded files are stored on their servers for up to two hours. But their enterprise documentation reveals that “anonymized usage data” — which can include document metadata — may be retained for analytics purposes indefinitely. That metadata can include author names, revision history, and embedded comments you forgot were there.

    SmallPDF was caught in 2020 transmitting files through servers in multiple jurisdictions before processing. While they’ve since tightened their pipeline, their ToS still includes language permitting the use of “aggregated, non-identifiable data” derived from uploads to “improve and develop services.” When your document contains proprietary business data, “non-identifiable” is cold comfort.

    CloudConvert is more transparent than most — they explicitly state files are deleted after 24 hours and offer an API with immediate deletion. But even 24 hours is a long time for a sensitive file to sit on someone else’s server, especially when you have no way to verify the deletion actually happened.

    Zamzar, one of the oldest file conversion services, retains files for 24 hours on free accounts and stores conversion history tied to your IP address. Their privacy policy notes that data may be shared with “trusted third-party service providers” — a phrase so vague it could mean anything from AWS hosting to a data broker.

    The pattern is clear: even the “good” tools retain your files for hours. The less scrupulous ones keep them indefinitely. And almost none of them give you a verifiable way to confirm deletion.

    Online Tools vs Self-Hosted Alternatives: Complete Comparison

    | Task | Online Tool | Self-Hosted Alternative | Privacy |
    |---|---|---|---|
    | PDF Conversion | ILovePDF, SmallPDF | LibreOffice CLI, Gotenberg (Docker) | ✅ Files never leave your machine |
    | Image Compression | TinyPNG, Compressor.io | ImageMagick, jpegoptim, pngquant | ✅ Zero network transfer |
    | Video Transcoding | CloudConvert, HandBrake Online | FFmpeg (local or Docker) | ✅ Full local processing |
    | Document Conversion | Zamzar, Online-Convert | Pandoc, unoconv | ✅ No third-party servers |
    | OCR / Text Extraction | OnlineOCR, i2OCR | Tesseract OCR (local) | ✅ Runs entirely offline |
    | File Merging (PDF) | PDF Merge, Sejda | pdftk, qpdf, Ghostscript | ✅ CLI-based, instant |
    | Audio Conversion | Online Audio Converter | FFmpeg, SoX | ✅ No upload required |
    | Metadata Stripping | Various EXIF removers | ExifTool, mat2 | ✅ Complete control |

    Every self-hosted alternative in this table is free, open-source, and processes files on hardware you control, with no third-party upload. Most have been maintained for over a decade, meaning they’re battle-tested and reliable.

    Security Risks Beyond Privacy: MITM, Compliance, and Data Leakage

    Privacy policies aside, uploading files to free tools creates real security vulnerabilities that most users never consider.

    Man-in-the-Middle (MITM) Attacks: While HTTPS protects data in transit, many free tools use shared hosting environments with multiple subdomains and wildcard certificates. A compromised CDN node or a misconfigured reverse proxy can expose your files to interception. In 2023, a popular file conversion service suffered a breach where uploaded files were temporarily accessible via predictable URLs — no authentication required.

    Data Retention and Legal Discovery: If a free tool retains your file for even one hour, that file exists on their infrastructure. In a legal dispute, those servers could be subpoenaed. Your “quickly converted” contract or financial statement now sits in someone else’s legal discovery pool.

    Compliance Violations: If you work in healthcare (HIPAA), finance (SOX/PCI-DSS), or handle EU citizen data (GDPR), uploading files to unvetted third-party services is likely a compliance violation. GDPR Article 28 requires a Data Processing Agreement with any service that handles personal data. Free online tools almost never provide one. A single uploaded spreadsheet with customer names and emails could trigger a reportable breach under GDPR if that tool’s servers are compromised.

    Supply Chain Risk: Free tools often depend on third-party libraries and cloud infrastructure. When a dependency gets compromised — as happened with the event-stream npm package — every file processed through that tool is potentially exposed. With local tools, you control the entire supply chain.

    Setting Up a Self-Hosted File Processing Stack with Docker

    If you want the convenience of web-based tools without the privacy tradeoffs, you can run your own file processing stack locally using Docker. Here’s a practical setup I use on my home server:

    # docker-compose.yml for a self-hosted file processing stack
    version: "3.8"
    services:
      gotenberg:
        image: gotenberg/gotenberg:8
        ports:
          - "3000:3000"
        # Converts HTML, Markdown, Office docs to PDF
    
      stirling-pdf:
        image: frooodle/s-pdf:latest
        ports:
          - "8080:8080"
        # Full PDF toolkit: merge, split, compress, OCR
    
      libreoffice-online:
        image: collabora/code:latest
        ports:
          - "9980:9980"
        environment:
          - "extra_params=--o:ssl.enable=false"
        # Full office suite in the browser
    
      imagemagick-api:
        image: scalingo/imagemagick
        ports:
          - "8081:8080"
        # Image processing API

    With this stack running, you get:

    • Gotenberg on port 3000 — send it any document via a simple POST request and get a PDF back. Supports HTML, Markdown, Word, Excel, and more.
    • Stirling PDF on port 8080 — a beautiful web UI for every PDF operation you can think of: merge, split, rotate, compress, add watermarks, OCR, and dozens more. It’s essentially ILovePDF running on your own hardware.
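    • Collabora Online on port 9980, a LibreOffice-based editor you reach through your browser. On its own it’s a server component; to open and edit documents, spreadsheets, and presentations you pair it with a WOPI host such as Nextcloud or ownCloud. Either way, nothing gets uploaded to Google or Microsoft.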

    The entire stack uses about 2GB of RAM and runs comfortably on any machine from the last decade. Compare that to uploading your files to a service you don’t control, and the choice becomes obvious.

    For quick one-off conversions, a simple command does the trick:

    # Convert Word to PDF locally
    curl --form files=@document.docx http://localhost:3000/forms/libreoffice/convert -o output.pdf
    
    # Or use LibreOffice directly without Docker
    libreoffice --headless --convert-to pdf document.docx
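    If you’d rather call Gotenberg from code than from curl, here’s a minimal JavaScript sketch, assuming Gotenberg 7 or 8 listening on localhost:3000 and a runtime with the global FormData and Blob (a modern browser, or Node 18+). buildGotenbergRequest is a hypothetical helper name; the /forms/libreoffice/convert route and the files field are Gotenberg’s own. Pass the real filename, extension included, because the conversion depends on knowing the input format.

```javascript
// Builds a fetch() request for Gotenberg's LibreOffice conversion route.
// Relies on the global FormData and Blob (browsers, Node 18+).
function buildGotenbergRequest(fileBytes, filename, baseUrl = 'http://localhost:3000') {
  const form = new FormData();
  // Gotenberg reads uploaded documents from the multipart field "files"
  form.append('files', new Blob([fileBytes]), filename);
  return {
    url: `${baseUrl}/forms/libreoffice/convert`,
    init: { method: 'POST', body: form },
  };
}
```

    Hand the result to fetch(url, init); on success the response body is the PDF, so await res.arrayBuffer() and write it wherever you like. Nothing leaves your machine unless baseUrl points somewhere else.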

    Frequently Asked Questions

    Are all free online file tools unsafe?

    Not all, but most. Tools backed by ad revenue or freemium models often monetize your data. Check the privacy policy — if it mentions “improving services” with your content, your files are being used.

    What about Google Docs or Microsoft 365?

    Enterprise tools from major vendors have stronger privacy policies, but your data still lives on their servers. For sensitive documents, local processing is always safer.

    Is self-hosting file tools difficult?

    Not anymore. Most tools run as single Docker containers. LibreOffice Online, for example, can be deployed with one command: docker run -p 9980:9980 collabora/code.

    What about file conversion APIs?

    Self-hosted APIs like Gotenberg or unoconv give you the same conversion capabilities as online tools, running entirely on your infrastructure.

