Category: Tools & Setup

Tools & Setup is where orthogonal.info curates practical, battle-tested guides on developer productivity tools, CLI utilities, self-hosted software, and environment configuration. Whether you are bootstrapping a new development machine, evaluating self-hosted alternatives to SaaS products, or fine-tuning your terminal workflow, this category delivers step-by-step walkthroughs grounded in real-world experience. Every article is written with one goal: help you build a faster, more reliable, and more enjoyable development environment.

With over 25 in-depth posts and growing, Tools & Setup is one of the most active categories on the site — reflecting just how much time engineers spend (and save) by getting their tooling right from day one.

Key Topics Covered

Command-line productivity — Shell customization (Zsh, Fish, Starship), terminal multiplexers (tmux, Zellij), and CLI utilities like ripgrep, fd, fzf, and bat that supercharge daily workflows.
Self-hosted alternatives — Deploying and configuring tools like Gitea, Nextcloud, Vaultwarden, and Uptime Kuma so you own your data without sacrificing usability.
IDE and editor setup — Configuration guides for VS Code, Neovim, and JetBrains IDEs, including extension recommendations, keybindings, and remote development workflows.
Development environment automation — Using Ansible, Homebrew, Nix, dotfiles repositories, and container-based dev environments (Dev Containers, Devbox) to make setups reproducible.
Git workflows and tooling — Advanced Git techniques, hooks, aliases, and GUI clients that streamline version control for solo developers and teams alike.
API testing and debugging — Hands-on guides for curl, HTTPie, Postman, and browser DevTools to debug REST and GraphQL APIs efficiently.
Package and runtime management — Managing multiple language runtimes with asdf, mise, nvm, and pyenv, plus dependency management best practices.

Who This Content Is For
This category is designed for software engineers, DevOps practitioners, system administrators, and hobbyist developers who want to work smarter, not harder. Whether you are a junior developer setting up your first Linux workstation or a senior engineer optimizing a multi-machine workflow, you will find actionable advice that respects your time. The guides assume basic command-line comfort but explain advanced concepts clearly.

What You Will Learn
By exploring the articles in Tools & Setup, you will learn how to automate repetitive environment tasks so a fresh machine is productive in minutes, not days. You will discover modern CLI replacements for legacy Unix tools, understand how to evaluate self-hosted software against its SaaS equivalent, and gain confidence configuring complex development stacks. Each guide includes copy-paste commands, configuration snippets, and links to upstream documentation so you can adapt the advice to your own infrastructure.

Start browsing below to find your next productivity upgrade.

  • How One Regex Took Down Cloudflare: Catastrophic Backtracking, Tested in Your Browser

    A single line of validation code once took down half of Cloudflare. On July 2, 2019, a regular expression pushed to their WAF spiked CPU to 100% across their global network and knocked a chunk of the internet offline for about 27 minutes. The regex looked harmless. It contained a pattern that backtracks exponentially, and one crafted request was enough to melt a core.

    I keep RegexLab open in a pinned tab specifically to catch this class of bug before it ships. It’s a browser-only regex tester, which matters here for a reason most people miss: when you’re testing a pattern that can hang for 13 seconds, you really don’t want that pattern running on someone else’s server. In RegexLab it runs on your machine. In Chrome or any other Chromium-based browser, that is the same V8 engine your Node backend uses, so the timing you see is close to the timing you’ll get in production.

    What catastrophic backtracking actually is

    Most regex engines (JavaScript, Python’s re, Java, PCRE) use backtracking. When a pattern can match the same input more than one way, the engine tries one path, and if that fails, it walks back and tries another. Usually that’s fine. The problem starts when the number of possible paths grows exponentially with input length.

    The textbook example is a quantifier inside a quantifier:

    /^(a+)+$/

    Feed it a string of a characters followed by one b. The b guarantees the match fails at the end, but before the engine gives up, it tries every way to split those a’s between the inner a+ and the outer +. That’s on the order of 2^n splits. Each extra character doubles the work.

    I benchmarked it on my machine (Node 24, plain V8 — same engine Chrome runs) with /^(a+)+$/ against n copies of “a” plus a trailing “b”:

    n=15   0.39 ms
    n=20   10.6 ms
    n=22   42.3 ms
    n=24   169 ms
    n=26   585 ms
    n=28   2,516 ms

    Read that again. Going from 26 to 28 characters — two bytes — took the match from half a second to two and a half seconds. At n=32 you’re looking at ~40 seconds for one call. An attacker doesn’t need a botnet. They need one text field and a 32-character string.
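
    If you want to reproduce the shape of those numbers yourself, a harness along these lines should do it. This is a minimal sketch, not the exact script behind the table: the helper names evilInput and timeMatch are mine, and it assumes a runtime with a global performance object (any modern browser, or a recent Node). The sizes are kept small on purpose so it finishes instantly.

    ```javascript
    // Build the worst-case input: n copies of "a" plus one trailing "b",
    // so the overall match is guaranteed to fail.
    function evilInput(n) {
      return "a".repeat(n) + "b";
    }

    // Time a single regex test and report the result in milliseconds.
    function timeMatch(pattern, input) {
      const start = performance.now();
      const matched = pattern.test(input);
      return { matched, ms: performance.now() - start };
    }

    // Keep n small here: each extra character roughly doubles the run time.
    for (const n of [10, 14, 18]) {
      const { matched, ms } = timeMatch(/^(a+)+$/, evilInput(n));
      console.log(`n=${n}  matched=${matched}  ${ms.toFixed(2)} ms`);
    }
    ```

    Raise n two at a time and watch the timings; absolute numbers will differ by machine and engine version, but the doubling should be visible.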

    The version that bites real apps

    Nobody writes /^(a+)+$/ on purpose. The dangerous ones look reasonable. Here’s a pattern shaped like a thousand email and username validators I’ve seen in the wild:

    /^([a-zA-Z0-9]+)*@/

    Looks like it’s checking for alphanumeric characters before an @. It also has a + nested inside a *, which is the same exponential trap wearing a nicer suit. I fed it a long run of letters with no @ so the match fails:

    n=20   13 ms
    n=25   434 ms
    n=28   3,303 ms
    n=30   13,225 ms

    Thirty characters. Thirteen seconds. If that regex sits on a login or signup endpoint, one request ties up a worker for 13 seconds. Send twenty of them and your event loop is done. This is a denial-of-service that ships as “input validation.”

    Spotting it before it ships

    The tell is any place where two quantifiers can fight over the same characters. Watch for these shapes:

    • (x+)+ — nested quantifiers, the classic
    • (x*)* — same idea
    • (x+)* and (x*)+ — mixed, still exponential
    • (a|a)+ or (a|ab)+ — alternation where branches overlap
    • (\s+)+, (\w+)* — the real-world disguises

    The fix is almost always to remove the redundancy. The outer quantifier in ([a-zA-Z0-9]+)* does nothing the inner one can’t — [a-zA-Z0-9]+ already matches one or more characters. Drop the wrapper:

    /^[a-zA-Z0-9]+@/

    I reran the safe rewrite of the textbook pattern, /^a+$/, against inputs up to 100,000 characters:

    n=28       0.066 ms
    n=1,000    0.010 ms
    n=100,000  0.216 ms

    Linear time. A hundred thousand characters finishes faster than the evil version handles 20. That’s the whole point — the pattern that looks more permissive is thousands of times faster because there’s only one way to match.

    Why I test these in the browser, not on a server

    Here’s the part that ties back to how RegexLab is built. To confirm a regex is vulnerable, you have to actually run it against a malicious input and watch it hang. If you do that in an online tester that processes patterns server-side, you’re either (a) DoS-ing their box, which is rude, or (b) hitting a timeout that hides the problem from you.

    RegexLab runs the match with the native RegExp engine right in your tab. Nothing is uploaded. Your patterns — which for a lot of us encode business logic, internal formats, sometimes secrets baked into validation rules — never leave the machine. When a match hangs, it hangs your tab, which is exactly the feedback you want. You can feel the 3-second pause and know you found something. I wrote more about why I stopped trusting server-side dev tools with sensitive input in this piece on pasting data into online tools.

    My workflow in RegexLab is simple:

    1. Paste the pattern.
    2. Add a normal test case — confirm it matches what it should.
    3. Add an evil case: a long run of the character class the pattern repeats, ending with something that forces a failed match.
    4. If the result is instant, you’re probably fine. If the tab stalls, you have a ReDoS.

    The multi-case runner is handy here because you can keep the “good” input and the “attack” input side by side and re-run both after every edit to the pattern. I keep a small library of attack strings — 30 identical chars plus a mismatch — for exactly this. For a broader take on using the tool for security work, I wrote up regex patterns that catch real security bugs.

    The structural fixes worth knowing

    Rewriting to remove nested quantifiers covers most cases, but two other tools help:

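    Atomic groups and possessive quantifiers. These tell the engine “match this and never give it back,” which kills the backtracking. PCRE, Java, and Python 3.11+ support them via (?>...) and a++. JavaScript’s RegExp does not: as of this writing, /^(?>a+)+$/ throws a syntax error in V8, including current Node and Chrome. In JavaScript the usual workaround is a lookahead plus a backreference, as in /^(?:(?=(a+))\1)+$/, because the engine never backtracks into a lookahead once it has matched. Check your runtime’s regex documentation before relying on either form.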
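
    Where (?>...) isn’t available in your runtime, the same “never give it back” behavior can usually be emulated with a lookahead plus a backreference. Here is a minimal sketch of that emulation applied to the textbook pattern; the variable name atomicLike is mine, and the trick relies on the engine not backtracking into a lookahead after it has matched, which is how ECMAScript regexes behave.

    ```javascript
    // Emulated atomic group: the lookahead captures a run of "a" once,
    // and \1 consumes exactly that capture. The engine does not re-enter
    // a lookahead on backtracking, so the run cannot be re-split.
    const atomicLike = /^(?:(?=(a+))\1)+$/;

    console.log(atomicLike.test("aaaa"));               // true
    console.log(atomicLike.test("a".repeat(40) + "b")); // false, and it returns quickly
    ```

    The 40-character attack string would stall the original /^(a+)+$/ for a very long time; here there is only one way to consume the run, so the failure is immediate.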

    Switch engines for untrusted input. Rust’s regex crate, Go’s regexp package, and Google’s RE2 library use a finite-automaton approach with no backtracking at all, so match time grows linearly with input and ReDoS is impossible by construction. The tradeoff is they drop backreferences and lookaround. If you’re validating user input at scale, that tradeoff is usually worth it. Google built RE2 so it could run regexes from untrusted users safely, which is exactly this problem.

    If you want to go deep on how these engines actually differ, Jeffrey Friedl’s Mastering Regular Expressions is still the reference — it’s the book that made the backtracking-vs-automaton distinction click for me (affiliate link, full disclosure). Russ Cox’s free regexp article series covers the RE2 side if you prefer the theory online.

    Test the regex you shipped last month

    Pull up your codebase and grep for )+, )*, and any validation regex on an input field. Drop each one into RegexLab, hand it 30 repeated characters ending in a mismatch, and watch the clock. It takes about ten seconds per pattern, it runs entirely in your browser, and it might save you from a 2 AM page when someone finds the same field an attacker would.
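
    If grepping by eye gets tedious, a crude first-pass filter can narrow the list before you paste anything into a tester. The sketch below is a heuristic of my own, not something RegexLab ships: the names NESTED_QUANTIFIER and looksRisky are hypothetical, and it only spots a quantified group that itself contains + or *. It misses overlapping alternation such as (a|ab)+, ignores {n,} bounds and nested parentheses, and will raise false alarms on a literal + inside a character class.

    ```javascript
    // Rough heuristic: a parenthesized group that contains + or *
    // and is immediately followed by + or *.
    const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*]/;

    // Takes the pattern source as a string, e.g. someRegex.source.
    function looksRisky(source) {
      return NESTED_QUANTIFIER.test(source);
    }

    console.log(looksRisky("^([a-zA-Z0-9]+)*@")); // true
    console.log(looksRisky("^[a-zA-Z0-9]+@"));    // false
    ```

    Treat a hit as “go time this one,” not as proof. The timing test against an attack string is still the real check.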

    The Cloudflare outage cost real money and made global news. The bug was one unbounded quantifier next to another. That’s a five-second check you can run right now.


    Join https://t.me/alphasignal822 for free market intelligence.

  • I Stopped Pasting JWTs Into Online Base64 Decoders — Here’s the Browser-Only Fix

    Last month I watched a teammate debug an auth bug by pasting a production JWT into the first “base64 decode online” result on Google. The token was a live bearer credential — valid for another 50 minutes, signed for our payments service. He pasted it into a text box on a server he’d never heard of, hit decode, and read the payload. The bug got fixed. The token also got handed to a stranger’s web server, where it sat in request logs that neither of us will ever see.

    That’s the quiet problem with online base64 tools, and it’s why I keep pointing people at Base64Lab instead. It does the same decode, except the bytes never leave the tab. No upload, no round trip, no log entry on someone else’s box. Below is what actually happens under the hood, why the “URL-safe” toggle matters more than people think, and where the browser’s built-in tools fall on their face.

    Why pasting a JWT into a random decoder is a credential leak

    A JWT is three base64url segments joined by dots: header, payload, signature. The first two decode to plain JSON. The third is the HMAC or RSA signature. Decoding it doesn’t “crack” anything — but the point is the whole string is the credential. If your decoder runs server-side, you just POSTed a working bearer token to a third party.

    Most “free online” decoders are server-side. You can tell because they work even with JavaScript disabled, or because the network tab shows a request firing on every keystroke. Some are honest hobby projects. Some are ad-funded and log everything. You have no way to know which, and “it’s probably fine” is not a security model when the input is a live session token, an API key in a config blob, or a base64-encoded `.env` file.

    Base64Lab is the opposite by construction. Open the network tab, decode a 2 MB file, and you’ll see exactly zero requests carrying your data. The only ping it makes is a one-pixel image hit to a counter endpoint — tool name plus a timestamp, no input, no payload. Everything else is `atob`, `btoa`, and a `TextDecoder`, running in your tab.

    The URL-safe gotcha that breaks the browser console

    Here’s the part that trips up even experienced devs. You might think “I don’t need a tool, I’ll just run `atob()` in the console.” Try it on a real JWT payload and watch it throw.

    // A JWT payload segment is base64URL, not standard base64
    atob("eyJzdWIiOiIxMjM0NTY3ODkwIn0")
    // Works here, but feed it bytes that encode to + or /
    // and the url-safe variant uses - and _ instead:
    atob("-_-_Pj_4")
    // Uncaught DOMException: Failed to execute 'atob':
    // The string to be decoded is not correctly encoded.

    Base64url swaps two characters from the standard alphabet: + becomes -, / becomes _, and trailing = padding is usually dropped. The browser’s `atob` only understands the standard alphabet. It tolerates missing padding, but it rejects any input containing - or _, and those are exactly the strings you most often need to decode: JWTs, OAuth state params, anything that travels in a URL.

    The fix is a normalization step the tool does for you on every decode:

    function decode(str) {
      let n = str.replace(/-/g, '+').replace(/_/g, '/').replace(/\s/g, '');
      while (n.length % 4 !== 0) n += '=';   // re-add stripped padding
      const raw = atob(n);
      try { return decodeURIComponent(escape(raw)); } // UTF-8 aware
      catch { return raw; }                            // fall back to raw bytes
    }

    I tested this against the standard JWT from jwt.io. The header decodes to {"alg":"HS256","typ":"JWT"} and the payload to {"sub":"1234567890","name":"John Doe","admin":true,"iat":1516239022}. Bare `atob` copes with segments like these as long as they contain no - or _, but it throws an `InvalidCharacterError` on the full dotted token and on any segment that does contain them. That `replace`/repad dance is the whole reason a dedicated tool beats the console.
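
    To make the three-segment structure concrete, here is a sketch that splits a token on its dots and decodes the header and payload. It is an illustration under my own assumptions, not Base64Lab’s code: decodeSegment and inspectJwt are hypothetical names, it uses TextDecoder instead of the escape idiom, and the third segment in the sample is a dummy string rather than a real signature. Nothing here verifies the signature.

    ```javascript
    // Decode one base64url segment to a UTF-8 string.
    function decodeSegment(segment) {
      let b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
      while (b64.length % 4 !== 0) b64 += "="; // re-add stripped padding
      const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
      return new TextDecoder().decode(bytes);
    }

    // Split a JWT and parse header and payload.
    // The signature is returned untouched and is NOT verified.
    function inspectJwt(token) {
      const parts = token.split(".");
      if (parts.length !== 3) throw new Error("expected three dot-separated segments");
      const [header, payload, signature] = parts;
      return {
        header: JSON.parse(decodeSegment(header)),
        payload: JSON.parse(decodeSegment(payload)),
        signature,
      };
    }

    // HS256 header, the short payload from the console example, dummy signature.
    const sample =
      "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.c2lnbmF0dXJl";
    console.log(inspectJwt(sample).header);  // { alg: 'HS256', typ: 'JWT' }
    console.log(inspectJwt(sample).payload); // { sub: '1234567890' }
    ```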

    The UTF-8 trap, and the emoji that proves it

    The second thing naive decoders get wrong is multi-byte text. `atob` hands you a binary string where each character is one byte. If the original was UTF-8 (anything with an accent, a CJK character, or an emoji), you need to reassemble those bytes back into code points. Skip that step and “café” comes back as “cafÃ©”.

    The decodeURIComponent(escape(raw)) trick handles it: `escape` percent-encodes each byte, then `decodeURIComponent` reads those percent groups as UTF-8. Encoding runs the mirror image with btoa(unescape(encodeURIComponent(data))). It’s an old idiom, but it round-trips correctly, and the `try/catch` means raw binary that isn’t valid UTF-8 falls through untouched instead of corrupting silently. I checked a string of emoji through encode then decode — byte-identical out the other side.
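
    For comparison, the same round trip can be written with TextEncoder and TextDecoder instead of escape and unescape. This is a sketch of the alternative, not what the tool ships: encodeUtf8 and decodeUtf8 are my names, and note one behavioral difference, which is that a default TextDecoder replaces invalid byte sequences with U+FFFD rather than throwing, so there is no raw-bytes fallback here.

    ```javascript
    // Encode a JS string as base64 of its UTF-8 bytes.
    function encodeUtf8(text) {
      const bytes = new TextEncoder().encode(text);
      let binary = "";
      for (const b of bytes) binary += String.fromCharCode(b);
      return btoa(binary);
    }

    // Decode base64 back to a JS string, treating the bytes as UTF-8.
    function decodeUtf8(b64) {
      const bytes = Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
      return new TextDecoder().decode(bytes);
    }

    const encoded = encodeUtf8("café");
    console.log(encoded);             // Y2Fmw6k=
    console.log(atob(encoded));       // cafÃ©  (each byte read as one character)
    console.log(decodeUtf8(encoded)); // café
    ```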

    Where it beats the command line too

    I live in a terminal, so I’ll be honest about when `base64 -d` is the right call: scripting, pipes, CI. But three things push me back to the browser tab more often than I expected.

    • It auto-detects direction. Paste base64, it decodes; paste plain text, it encodes. No flipping a -d flag and re-running.
    • Per-line mode. Got a file of base64 strings, one per line? Toggle per-line processing and each row decodes independently instead of the whole blob being treated as one stream. macOS `base64` won’t do that without a `while read` loop.
    • It previews images. Paste a data:image/png;base64,... URI and it renders the actual image, which is the fastest way I know to sanity-check an inline asset.

    And because it’s a PWA with a service worker, it works offline. Load it once, kill your wifi, and it still decodes — which is exactly the posture you want for a tool that touches secrets. I’ve written before about why I stopped uploading files to free online tools; this is the same principle applied to text.

    The honest limitation

    Base64 is encoding, not encryption. Decoding a JWT shows you the claims; it does not verify the signature or let you forge one. If you need to validate signatures or test signing keys, that’s a different job — reach for a proper JWT library, not a base64 tool. Base64Lab’s lane is fast, private, correct decode/encode of text and files. It stays in that lane on purpose.

    If you handle tokens and config blobs all day, a mechanical keyboard with proper n-key rollover genuinely cuts down on the typo-induced “why won’t this decode” rabbit holes — I use a Keychron K2 mechanical keyboard (full disclosure: affiliate link) and the tactile feedback alone has saved me from more than one mispasted credential. For the security-minded, a YubiKey 5 hardware key (affiliate link) is the right answer for the auth flows those JWTs come from in the first place.

    Try the tool here: Base64Lab. If you want more like it, HashForge does the same browser-only treatment for hashing, and RegexLab for regex testing — all of them in the free tools collection.



  • The SpaceX 424B Prospectus Is Free on SEC EDGAR — Here’s What It Says and How to Pull It

    The day SpaceX priced its IPO, half the finance Twitter accounts I follow linked to a paywalled news story. The other half linked to a screenshot of a screenshot. Almost nobody linked to the one document that actually mattered: the SpaceX 424B prospectus sitting on SEC EDGAR, free, with every number you could want. So here’s the filing, the terms straight off the cover page, and a 20-line Python script that pulls the document URL for any company without you clicking through EDGAR’s 1990s interface.

    The final prospectus — the Form 424B4 — was filed on June 12, 2026 under accession number 0001628280-26-042639. If you just want to read it, here’s the direct link to the document on SEC EDGAR:

    SpaceX 424B4 final prospectus (sec.gov)

    Fair warning before you click: that HTML file is about 11.9 MB because the prospectus is stuffed with full-page photos of Starship and Falcon boosters. Your browser will chew on it for a second.

    What a 424B actually is (and why it’s the one you want)

    People search for “424B” without always knowing why it’s different from the S-1 everyone talks about. The short version:

    • S-1 is the registration statement a company files to start the IPO process. SpaceX filed its original S-1 on May 20, 2026, then amended it twice (S-1/A on June 1 and June 3) as the SEC and the market pushed back on the draft.
    • 424B4 is the final prospectus, filed after pricing under Rule 424(b)(4). This is the one with the real numbers — the actual offering price, the exact share count, the underwriting discount. The S-1 has blanks where those go. The 424B fills them in.

    So when you want the truth about what a deal priced at, the 424B is the document. The S-1 tells you what the company hoped for. I learned this the annoying way years ago, quoting a price range from an S-1 that turned out to be 20% off the final price.

    The numbers off the SpaceX cover page

    Everything below is lifted straight from the cover of the 424B4. No analyst spin, just what the filing says:

    • Shares offered: 555,555,555 shares of Class A common stock
    • IPO price: $135.00 per share
    • Gross raise: $74,999,999,925 — call it $75 billion
    • Ticker: SPCX on Nasdaq (and Nasdaq Texas)
    • Underwriting discount: $0.90 per share, or $500,000,000 total
    • Net proceeds to SpaceX: $134.10 per share, about $74.5 billion before expenses
    • Settlement: shares ready for delivery on or about June 15, 2026
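
    The cover-page figures are easy to sanity-check against each other. The sketch below just redoes the multiplication using the share count, price, and discount quoted above; the variable names are mine, and I work in integer cents so the arithmetic stays exact.

    ```javascript
    // Work in integer cents so the multiplication stays exact.
    const shares = 555_555_555;
    const priceCents = 13_500;  // $135.00 IPO price
    const discountCents = 90;   // $0.90 underwriting discount per share

    const toDollars = (cents) => cents / 100;

    const gross = toDollars(shares * priceCents);
    const discount = toDollars(shares * discountCents);
    const net = toDollars(shares * (priceCents - discountCents));

    console.log(gross);    // 74999999925
    console.log(discount); // 499999999.5
    console.log(net);      // 74499999925.5
    ```

    So the gross raise matches to the dollar, the per-share discount works out to $499,999,999.50 (which is the quoted $500,000,000 after rounding), and the net lands at roughly $74.5 billion.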

    A $75 billion raise is not a normal IPO. For scale, that’s larger than the entire 2025 US IPO market combined in most tallies. The lead underwriters are the usual heavyweight syndicate — Goldman Sachs, Morgan Stanley, BofA Securities, Citigroup, J.P. Morgan, Barclays, and a long tail behind them.

    The detail that matters more than the price: voting control

    If you only read the cover, you’d miss the part that actually governs this company. SpaceX went public with a dual-class structure:

    • Class A (the shares you can buy): 1 vote per share
    • Class B (insider shares): 10 votes per share

    The prospectus states that immediately after the offering, Elon Musk will hold approximately 82.4% of the voting power — roughly 82.3% even if the underwriters exercise their over-allotment option in full. You are buying economic exposure to SpaceX, not a say in how it’s run. That’s not a knock; it’s just a fact the filing spells out, and it’s exactly the kind of thing buried 40 paragraphs deep that retail buyers skip. Read the risk factors before the photos.

    On use of proceeds, the filing is specific for once: fund the growth strategy including expansion of AI compute infrastructure, launch infrastructure and vehicles, scaling the satellite constellations, and general corporate purposes. The AI compute line is the new tell — this is no longer just a rockets-and-Starlink story.

    Pull the filing yourself with 20 lines of Python

    Clicking through EDGAR by hand is fine once. If you track filings regularly, automate it. SEC publishes a clean JSON endpoint for every company’s filing history — no scraping, no API key. The only rule: you must send a descriptive User-Agent header with contact info, or EDGAR returns a 403 throttle page instead of data. I left out a real UA on my first try and spent ten minutes confused by an “Undeclared Automated Tool” message.

    This uses only the Python standard library — no requests, no pip install:

    import json, urllib.request
    
    # SEC requires a descriptive User-Agent or it returns a 403 throttle page.
    UA = {"User-Agent": "Jane Dev jane.dev@example.com"}  # placeholder: use your own name and email
    CIK = 1181412  # SpaceX (SPCX)
    
    def get_json(url):
        req = urllib.request.Request(url, headers=UA)
        with urllib.request.urlopen(req, timeout=30) as r:
            return json.load(r)
    
    # 1) Full filing history, newest first
    sub = get_json(f"https://data.sec.gov/submissions/CIK{CIK:010d}.json")
    rec = sub["filings"]["recent"]
    
    # 2) Walk the parallel arrays, grab the 424B4 (the final prospectus)
    for form, date, acc, doc in zip(
            rec["form"], rec["filingDate"],
            rec["accessionNumber"], rec["primaryDocument"]):
        if form == "424B4":
            folder = acc.replace("-", "")
            print(f"{form}  filed {date}")
            print(f"https://www.sec.gov/Archives/edgar/data/{CIK}/{folder}/{doc}")
            break

    Run it and you get:

    424B4  filed 2026-06-12
    https://www.sec.gov/Archives/edgar/data/1181412/000162828026042639/spaceexplorationtechnologi.htm

    The structure is worth understanding because it generalizes. The submissions endpoint returns filings as parallel arrays: form[i], filingDate[i], accessionNumber[i], and primaryDocument[i] all line up by index. Zip them together and filter on whatever form type you care about: 10-K for annual reports, 8-K for material events, SC 13D for activist stakes. Change the CIK and the same script works for any filer.
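
    The parallel-array lookup and the URL assembly are the only real logic in that script, so here they are isolated as a pure function, written in JavaScript to match the other snippets on this page. This is a sketch: firstFilingUrl is a hypothetical name, and the sample object is hand-built in the same shape as the response. Only its 424B4 row uses values quoted in this post; the 8-K row is made up for illustration.

    ```javascript
    // Given the "recent" object from a submissions response, return the URL of
    // the first filing whose form type matches, or null if there is none.
    function firstFilingUrl(cik, recent, formType) {
      const i = recent.form.indexOf(formType);
      if (i === -1) return null;
      // The archive folder is the accession number with its dashes removed.
      const folder = recent.accessionNumber[i].replaceAll("-", "");
      const doc = recent.primaryDocument[i];
      return `https://www.sec.gov/Archives/edgar/data/${cik}/${folder}/${doc}`;
    }

    // Hand-built sample; the first row is invented, the second is from this post.
    const recent = {
      form: ["8-K", "424B4"],
      filingDate: ["2026-06-15", "2026-06-12"],
      accessionNumber: ["0000000000-26-000001", "0001628280-26-042639"],
      primaryDocument: ["example8k.htm", "spaceexplorationtechnologi.htm"],
    };

    console.log(firstFilingUrl(1181412, recent, "424B4"));
    ```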

    Finding a company’s CIK is the one manual step. Search the company name at EDGAR company search, or hit the full-text search API directly — I wrote a separate teardown of EDGAR’s full-text search endpoint (efts.sec.gov) if you want to find filings by keyword instead of CIK.

    One gotcha: the throttle and the rate limit

    Two things will bite you if you scale this up. First, the User-Agent rule above — non-negotiable. Second, SEC asks you to stay under 10 requests per second. For pulling one filing that’s irrelevant, but if you loop over a watchlist of 200 tickers, add a small time.sleep(0.15) between calls. Get greedy and your IP eats a temporary block. The data is free; the courtesy is the price.

    If you’d rather not hit EDGAR at all and just want pre-IPO valuation context before deals like this hit the tape, I covered tracking pre-IPO valuations for SpaceX, OpenAI and Anthropic with a free API in an earlier post.

    If you’d rather read filings on paper

    I read short filings on screen, but for a 200-page prospectus I print the risk factors and use of proceeds sections and mark them up. A cheap monochrome laser printer pays for itself fast if you do this often — the Brother HL-L2350DW is the one sitting next to my desk, and for marking up dense documents a basic set of highlighters beats squinting at a tablet. Full disclosure: those are Amazon affiliate links — they help keep this blog running and cost you nothing extra.

    That’s the whole thing. The SpaceX 424B prospectus is public, the terms are a $135 IPO price on 555.5M shares for a ~$75B raise, and you can pull any company’s filing URL with standard-library Python in under a second. Stop trusting screenshots. Go to the source.

    If you came here for the primary-source habit, the same logic applies to Congress: pull the latest House stock trades yourself straight from the Clerk of the House instead of a dead aggregator.



  • Why the Web Crypto API Won’t Compute MD5 (and How HashForge Does It in Your Browser)

    Last week I needed an MD5 checksum to verify a file against a vendor’s published manifest. Old habit kicked in: open devtools, reach for the Web Crypto API, type one line. It failed on the spot:

    await crypto.subtle.digest('MD5', new TextEncoder().encode('abc'))
    // DOMException: Algorithm: Unrecognized name MD5

    No MD5. Not deprecated-with-a-warning — just absent, like it was never on the menu. That single rejection is the whole reason HashForge, the in-browser hash generator I keep bookmarked, ships its own MD5 routine instead of asking the browser. Here’s why the browser says no, and how HashForge works around it without uploading your file anywhere.

    The Web Crypto API blocks MD5 on purpose

    The digest side of the Web Crypto API supports exactly four algorithms: SHA-1, SHA-256, SHA-384, and SHA-512. That list is fixed in the W3C spec. MD5 isn’t missing because nobody filed a ticket — the working group left it out, along with MD4, because shipping a broken hash through an API named “crypto” invites people to misuse it.

    MD5 has had practical collision attacks since 2004, when Wang and Yu produced two different inputs with the same digest by hand-tuning the message. By 2008 researchers used MD5 collisions to forge a rogue CA certificate. The hash is finished for anything where an attacker controls the input.

    Here’s the part I find funny: the browser still lets you compute SHA-1, which Google and CWI fully collided in 2017 with the SHAttered attack. SHA-1 stayed in the spec for backward compatibility with existing protocols. MD5 never made the cut at all. The vendors drew a line, and MD5 landed on the wrong side of it.

    I agree with that call for new code. The catch is that the rest of us still bump into MD5 constantly, and almost never for security:

    • Vendor downloads still publish an MD5 next to the file
    • S3 ETags are the MD5 of the object for single-part uploads (as long as the object isn’t encrypted with SSE-KMS or SSE-C)
    • Legacy rows store md5(email) for Gravatar-style lookups
    • Plenty of internal tools fingerprint content with MD5 because it’s fast and short

    So you hit a wall. The data is MD5, the browser refuses to compute MD5, and you would rather not paste a confidential file into some random “free MD5 online” site that ships it off to a server you’ve never audited.

    How HashForge fills the gap

    HashForge splits the work in two. For the SHA family it calls the native API — fast, audited, hardware-accelerated on most machines:

    const ALGOS = ['MD5','SHA-1','SHA-256','SHA-384','SHA-512'];
    
    async function hashText(text, algos, enc='hex'){
      const encoded = new TextEncoder().encode(text);
      const out = {};
      for (const algo of algos){
        if (algo === 'MD5'){
          out[algo] = formatHash(md5(encoded.buffer), enc);     // pure JS
        } else {
          const hash = await crypto.subtle.digest(algo, encoded); // native
          out[algo] = formatHash(hash, enc);
        }
      }
      return out;
    }

    For MD5 it falls back to a self-contained JavaScript implementation — the classic safeAdd / bitRotateLeft / md5cmn routine you’ve seen in a dozen libraries, working directly on an ArrayBuffer. No dependency, no network call, a couple hundred lines of code.
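
    The snippet above leans on a formatHash helper that isn’t shown. I don’t know how HashForge actually writes it, so treat the following as a hypothetical stand-in with the same call shape: it takes an ArrayBuffer and returns either lowercase hex or standard Base64.

    ```javascript
    // Hypothetical stand-in for formatHash: ArrayBuffer in, hex or Base64 string out.
    function formatHash(buffer, enc = "hex") {
      const bytes = new Uint8Array(buffer);
      if (enc === "base64") {
        let binary = "";
        for (const b of bytes) binary += String.fromCharCode(b);
        return btoa(binary);
      }
      // Default: two lowercase hex digits per byte.
      return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
    }

    const demo = new Uint8Array([0xde, 0xad, 0xbe, 0xef]).buffer;
    console.log(formatHash(demo));           // deadbeef
    console.log(formatHash(demo, "base64")); // 3q2+7w==
    ```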

    Why MD5 is small enough to ship inline

    MD5 is a Merkle–Damgård construction. It pads the message to a multiple of 512 bits, then chews through it one 512-bit block at a time, updating four 32-bit state words across 64 operations grouped into 4 rounds. The whole thing is integer addition, bit rotation, and a handful of boolean mixing functions. That’s it — no S-boxes, no lookup tables, no big constants beyond a sine-derived table you can generate in one line.
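
    Two of those details are small enough to show directly: the sine-derived constant table and the padding rule. This is a sketch of just those pieces, not a full MD5 and not HashForge’s source; the names K and paddedLength are mine.

    ```javascript
    // The 64 round constants: K[i] = floor(2^32 * |sin(i + 1)|), angle in radians.
    const K = Array.from({ length: 64 }, (_, i) =>
      Math.floor(Math.abs(Math.sin(i + 1)) * 2 ** 32)
    );

    // Padded message size in bytes: the original bytes, one 0x80 marker byte,
    // and an 8-byte length field, rounded up to a multiple of 64 (512 bits).
    function paddedLength(byteLength) {
      return Math.ceil((byteLength + 9) / 64) * 64;
    }

    console.log(K[0].toString(16)); // d76aa478
    console.log(paddedLength(3));   // 64
    console.log(paddedLength(56));  // 128
    ```

    A 56-byte message spills into a second block because the marker byte and the length field no longer fit in the first one.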

    Because the algorithm is so plain, a correct MD5 fits in a few kilobytes of minified JavaScript. SHA-512 by hand would be heavier and slower in JS, which is exactly why HashForge doesn’t reimplement the SHA family: the native crypto.subtle path is both faster and already vetted. You only drop to hand-rolled code for the one algorithm the platform won’t give you.

    The privacy detail that actually matters

    Files go through the same split. The page reads the file with file.arrayBuffer() and hands the raw bytes straight to either the native digest or the JS MD5:

    const buf  = await file.arrayBuffer();
    const hash = await crypto.subtle.digest('SHA-256', buf);

    That arrayBuffer() call is the whole privacy story. The bytes are read into memory inside your tab and never touch a network socket. Open the Network panel while you hash a 200 MB ISO and you’ll see zero requests. Pull your wifi and it keeps working, because there was never a server in the loop. Compare that to the typical “online hash calculator,” which POSTs your file to a backend and trusts you to believe their retention policy.

    Verify the output yourself in ten seconds

    Don’t take my word that the MD5 path is correct — a hash tool that quietly mis-pads is worse than no tool. Hash the empty string and abc, then check against the canonical test vectors:

    MD5("")        = d41d8cd98f00b204e9800998ecf8427e
    MD5("abc")     = 900150983cd24fb0d6963f7d28e17f72
    SHA-256("abc") = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

    Type abc into HashForge and you’ll get those exact bytes. I cross-checked them against md5sum and sha256sum on a Linux box before trusting the tool with anything real. Two-minute habit, and it catches a surprising number of broken implementations.

    HMAC is native-only, and that’s the right limit

    One place HashForge refuses to fill a gap: HMAC. It offers HMAC-SHA1/256/384/512 and stops there, because Web Crypto’s importKey plus sign('HMAC', ...) only accepts the SHA family. There’s no HMAC-MD5 button.

    That’s correct, not lazy. If you’re computing an HMAC you’re authenticating something, and HMAC-MD5 has no place in new code. The tool steers you to SHA-256 by simply not offering the broken option — the same stance the browser takes on raw MD5, applied one layer up.
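
    For reference, the importKey plus sign path looks roughly like this. It is a sketch of the standard Web Crypto calls, not HashForge’s own code: hmacSha256Hex is my name for it, and it assumes a global crypto.subtle (a browser in a secure context, or a recent Node). The key and message are test case 2 from RFC 4231, so the output can be checked against a published vector.

    ```javascript
    // HMAC-SHA256 via Web Crypto: import the raw key bytes, sign, hex-encode.
    async function hmacSha256Hex(keyText, messageText) {
      const enc = new TextEncoder();
      const key = await crypto.subtle.importKey(
        "raw",
        enc.encode(keyText),
        { name: "HMAC", hash: "SHA-256" },
        false,     // key is not extractable
        ["sign"]   // only allowed usage
      );
      const sig = await crypto.subtle.sign("HMAC", key, enc.encode(messageText));
      return Array.from(new Uint8Array(sig), (b) =>
        b.toString(16).padStart(2, "0")
      ).join("");
    }

    // RFC 4231, test case 2.
    hmacSha256Hex("Jefe", "what do ya want for nothing?").then(console.log);
    // 5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843
    ```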

    Which hash for which job

    A quick field guide, because this question comes up every week:

    • Matching a published checksum: use whatever the publisher used, MD5 or SHA-256. You’re catching accidental corruption, not an attacker, so a broken hash is fine here.
    • Content fingerprint, cache key, dedup: SHA-256 if you have a free choice; MD5 only to match an existing system.
    • Passwords: none of these. Use Argon2 or bcrypt. A raw SHA-256 of a password is still a leak waiting to happen.
    • Tokens and signatures: HMAC-SHA256 at minimum.

    If you want the actual math behind why MD5 fell and SHA-256 holds, Serious Cryptography by Jean-Philippe Aumasson is the clearest book I’ve found on collision attacks without drowning you in proofs. For the engineering side — where each primitive shows up in TLS, signatures, and storage — Real-World Cryptography by David Wong is the one I lend out most. Full disclosure: both are Amazon affiliate links.

    Why I keep it bookmarked

    The pitch is narrow and that’s the point. I need a hash, I can’t install a CLI on a locked-down work laptop, and I really don’t want to upload a file to a stranger’s server. HashForge does that one job: it computes all five digests at once, outputs hex or Base64, and runs on a text string or a dropped file. It pairs with the other browser-only tools I reach for — Base64Lab when I need to decode a token and PassForge when I need a random key — none of which phone home.

    Try it: HashForge. Hash something, open your Network tab, and watch nothing happen.

    Related reading: How a secure password generator actually works, catching leaked secrets in your git history, and why your online SQL formatter might be logging your data.



  • How a Secure Password Generator Actually Works (and Why Math.random() Fails)

    Last week I was reviewing a small auth service and found this one-liner generating reset tokens:

    const token = Array.from({length: 16}, () =>
      CHARS[Math.floor(Math.random() * CHARS.length)]
    ).join('');

    It runs. It produces things like xK9$mLp2@nQ7vR4w. It also happens to be a real security bug. That exact pattern is the one I deliberately avoided when I built our free password generator — and the reason is worth 1,200 words, because almost every “roll your own” password snippet on the web gets it wrong in the same way.

    Here’s what’s broken about Math.random() for passwords, the fix, and the two gotchas that bite people who try to fix it themselves.

    Math.random() is predictable by design

    In V8, the engine behind Chrome and Node, Math.random() has used an algorithm called xorshift128+ since version 4.9.40, shipped in late 2015. (Before that it was MWC1616, which was worse: only about 2^32 possible outputs.) xorshift128+ has 128 bits of internal state, a period of 2^128 − 1, and it passes the TestU01 statistical suite. Statistically, the numbers look random.

    But “looks random” and “unpredictable” are different properties. xorshift128+ is a pseudo-random generator: every output is a deterministic function of that 128-bit state. And the state is recoverable. Feed enough consecutive outputs into a system of linear equations and you can solve for the internal state — there are public tools on GitHub that recover it from as few as 64 to 128 consecutive Math.random() calls. Once an attacker has the state, every future output is known. Every “random” password you generate after that point is predictable.

    For a UI animation or a Monte Carlo sim, who cares. For a password, an API key, or a session token, that’s the whole ballgame.
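
    To make “deterministic function of the state” concrete, here is a sketch of the xorshift128+ update step using BigInt. It is not V8’s exact code: the shift constants (23, 17, 26) follow the published algorithm, the seeds are made up, and I return the raw 64-bit sum rather than reproducing how V8 turns state into a double. The only point is that two generators holding the same state produce the same stream.

```javascript
// Sketch of the xorshift128+ state update (shift constants 23, 17, 26).
// Assumption: this mirrors the textbook algorithm, not V8's double conversion.
const MASK64 = (1n << 64n) - 1n;

function makeXorshift128Plus(seed0, seed1) {
  let state0 = BigInt(seed0) & MASK64;
  let state1 = BigInt(seed1) & MASK64;
  return function next() {
    let s1 = state0;
    const s0 = state1;
    state0 = s0;
    s1 ^= (s1 << 23n) & MASK64; // keep the left shift inside 64 bits
    s1 ^= s1 >> 17n;
    s1 ^= s0;
    s1 ^= s0 >> 26n;
    state1 = s1;
    return (state0 + state1) & MASK64;
  };
}

// Anyone who recovers the state can replay every later output.
const victim = makeXorshift128Plus(0x1234n, 0xABCDn);
const attacker = makeXorshift128Plus(0x1234n, 0xABCDn);
const seen = [victim(), victim(), victim()];
const predicted = [attacker(), attacker(), attacker()];
console.log(seen.every((v, i) => v === predicted[i])); // → true
```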

    crypto.getRandomValues() is the actual fix

    Browsers ship a cryptographically secure RNG (CSPRNG) through the Web Crypto API: crypto.getRandomValues(). It pulls from the operating system’s entropy pool (/dev/urandom on Linux, BCryptGenRandom on Windows) and is built so that observing past output tells you nothing about future output. There’s no recoverable 128-bit state to solve for.

    The function our generator uses is four lines:

    function secureRandom(max) {
      const arr = new Uint32Array(1);
      crypto.getRandomValues(arr);
      return arr[0] % max;
    }

    Read a fresh 32-bit unsigned integer from the CSPRNG, reduce it into the range you need, done. Swap Math.random() for this and the prediction attack above is gone. But notice that % max — that’s gotcha number one.
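
    For comparison with the Math.random() one-liner at the top, here is a sketch of the same job done with CSPRNG draws. The CHARS alphabet and the generatePassword name are illustrative, not the generator’s actual source, and the require fallback only exists so it runs in Node too.

```javascript
// Sketch: build a password from CSPRNG draws instead of Math.random().
// CHARS and generatePassword are illustrative names, not the real generator's code.
const rng = globalThis.crypto ?? require('node:crypto').webcrypto;
const CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generatePassword(length = 16, chars = CHARS) {
  const draws = new Uint32Array(length);
  rng.getRandomValues(draws); // one call fills every slot with a fresh 32-bit value
  return Array.from(draws, (n) => chars[n % chars.length]).join('');
}

console.log(generatePassword(20));
```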

    Gotcha 1: modulo bias is real (but size matters)

    When you take a random integer modulo your alphabet size, the ranges usually don’t divide evenly, so some characters come up more often than others. I wanted to see how bad it actually is, so I generated 6.2 million random bytes and bucketed byte % 62 (a typical alphanumeric set):

    expected per character:  100,000
    lowest-frequency char:   ~96,900 hits
    highest-frequency char: ~121,400 hits
    ratio: 1.25

    That’s a 25% skew. It happens because 256 % 62 = 8: byte values 248–255 wrap around and give the first eight characters a fifth chance each, where every other character gets four. With a single byte feeding a 62- or 94-character set, the bias is large and easy to measure.

    The textbook fix is rejection sampling: throw away any byte in the biased tail and draw again. Rejecting values ≥ 248 dropped the skew to a 1.02 ratio in my test, at the cost of discarding about 3.1% of draws.
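
    A minimal sketch of that rejection step for the single-byte case is below. The unbiasedIndex name is mine, and it assumes an alphabet of at most 256 entries; it is not the code from my test harness.

```javascript
// Sketch: unbiased index in [0, n) from single bytes via rejection sampling.
// unbiasedIndex is a hypothetical helper; n must be an integer from 1 to 256.
const csprng = globalThis.crypto ?? require('node:crypto').webcrypto;

function unbiasedIndex(n) {
  if (!Number.isInteger(n) || n < 1 || n > 256) {
    throw new RangeError('n must be an integer between 1 and 256');
  }
  const limit = 256 - (256 % n); // 248 when n = 62
  const byte = new Uint8Array(1);
  do {
    csprng.getRandomValues(byte);
  } while (byte[0] >= limit);    // discard the biased tail and draw again
  return byte[0] % n;
}

console.log(unbiasedIndex(62)); // some integer from 0 to 61
```

    Every accepted byte falls in a range that divides evenly by n, so each index is equally likely.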

    But here’s the part the “always use rejection sampling” advice skips: the bias depends entirely on how big your random integer is relative to the alphabet. Our generator doesn’t read a single byte — it reads a full Uint32 (range 0 to about 4.29 billion). For a 94-character symbol set, Uint32 % 94 makes the favored characters more likely by roughly 1 part in 45 million — a bias of 0.0000022%. For a password, that’s noise far below anything that matters. So I skipped rejection sampling on purpose and kept the code simple, because a 32-bit draw already makes the bias irrelevant. If I were minting cryptographic keys I’d add the rejection step; for human passwords, a wide draw is enough.
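
    The arithmetic behind both numbers fits in a few lines. This is a sketch with a made-up helper name (moduloBias); it reports how many characters get an extra hit and how much more likely they are than the rest.

```javascript
// Sketch: how uneven is (uniform integer in [0, range)) % n?
// moduloBias is an illustrative helper name.
function moduloBias(range, n) {
  const base = Math.floor(range / n); // hits every character gets
  const favored = range % n;          // characters that get one extra hit
  const ratio = favored === 0 ? 1 : (base + 1) / base;
  return { favored, base, ratio };
}

console.log(moduloBias(256, 62));     // favored: 8, base: 4, ratio: 1.25
console.log(moduloBias(2 ** 32, 94)); // favored: 42, base: 45691141
```

    For the Uint32 case the ratio is 1 + 1/45,691,141, which is where the “1 part in 45 million” figure comes from.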

    Gotcha 2: the 64KB quota wall

    The second surprise showed up while I was running that bias test. My first attempt asked getRandomValues() to fill one big buffer:

    crypto.getRandomValues(new Uint8Array(6200000));
    // QuotaExceededError: The requested length exceeds 65,536 bytes

    getRandomValues() refuses any request over 65,536 bytes (64 KB) in a single call. It’s in the spec and every browser enforces it. If you’re generating one 16-character password you’ll never hit it, but the moment you batch-generate or fill a large buffer, you have to chunk:

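    function fillSecure(buf) {
      // The 65,536 limit is in bytes, so size each chunk in elements:
      // 65,536 for a Uint8Array, 16,384 for a Uint32Array.
      const chunk = 65536 / buf.BYTES_PER_ELEMENT;
      for (let i = 0; i < buf.length; i += chunk) {
        crypto.getRandomValues(buf.subarray(i, i + chunk));
      }
      return buf;
    }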

    Undocumented in most tutorials, and a hard failure rather than a silent one — which is at least honest of it.

    Why browser-only matters here

    Our generator runs entirely in your browser. The password is built on your machine from your OS entropy and never touches a network. That’s not a tagline — it’s the only design that makes sense for a secret. A “password generator” that does the work server-side is a service that has seen your password in plaintext, which is the same trust problem I wrote about with online SQL formatters quietly logging queries. Open the dev tools, watch the Network tab while you click generate, and you’ll see exactly zero requests.

    You can try it here: the orthogonal.info password generator. Slide to 16+ characters, toggle the symbol set, copy, done.

    One layer is never enough

    A strong, truly-random password fixes the “guessable” problem. It does nothing about phishing, reused credentials, or a leaked database. After the LastPass mess I moved my own vault into KeePassXC and put a hardware key on every account that supports one. A YubiKey 5 NFC turns a stolen password into a useless string, because login also needs the physical key in my pocket. Full disclosure: that’s an affiliate link — but it’s also literally what’s on my keyring. Generate unique passwords, store them in a real manager, and gate the important accounts with hardware 2FA. Three cheap layers beat one strong one.

    The lesson I keep relearning: in security, the code that “works” and the code that’s correct are often the same length and completely different. Math.random() works. crypto.getRandomValues() is correct.



  • How EXIF GPS Data Is Stored in a JPEG — A Byte-Level Teardown

    Last week I wanted to prove a point to a friend who insisted his vacation photos were “fine to post.” So I opened one of his JPEGs in a hex editor, scrolled about 40 bytes in, and read his hotel’s GPS coordinates straight off the screen — no tools, no library, just the raw bytes. That’s the thing nobody tells you about EXIF: it isn’t encrypted, hashed, or hidden. It’s sitting near the front of almost every photo your phone takes, in a format you can decode by hand once you know the layout. This post is the byte-level teardown, and at the end I’ll show why PixelStrip removes that data without touching a single pixel.

    A JPEG is just a stream of markers

    Every JPEG starts with two bytes: FF D8, the Start Of Image marker. After that the file is a sequence of segments, and every segment begins with FF followed by a marker byte. The one we care about is FF E1 — that’s APP1, where EXIF lives.

    Here’s the front of a real photo, annotated:

    FF D8              SOI (start of image)
    FF E1              APP1 marker  <- EXIF starts here
    00 84              segment length = 0x0084 = 132 bytes (big-endian, always)
    45 78 69 66 00 00  "Exif\0\0"
    49 49              "II" = Intel / little-endian byte order
    2A 00              42, the TIFF magic number
    08 00 00 00        offset to first IFD = 8

    Two details trip people up here. First, that segment-length field is always big-endian, because it’s part of the JPEG container, not the EXIF payload. Second, the byte order flag (II for little-endian, MM for big-endian) only applies to everything after the Exif\0\0 header. From that point on, every multi-byte number flips based on those two bytes.

    The TIFF header and IFD entries

    What follows Exif\0\0 is a tiny TIFF file. All internal offsets are measured from the start of the byte-order mark — not the start of the file. Forget that and every pointer you read lands in the wrong place. I’ve debugged this exact off-by-six error more times than I’d like to admit.

    The 4-byte offset (here 08 00 00 00 = 8) points to the first Image File Directory, or IFD0. An IFD is dead simple:

    • 2 bytes: how many entries follow
    • 12 bytes per entry
    • 4 bytes at the end: offset to the next IFD (0 means stop)

    Each 12-byte entry breaks down as: a 2-byte tag ID, a 2-byte data type, a 4-byte value count, and a 4-byte field that holds either the value itself (if it fits in 4 bytes) or an offset to where the value actually lives. GPS coordinates don’t fit in 4 bytes, so they’re always stored by offset.

    The tag we hunt for in IFD0 is 0x8825 — the GPS IFD pointer. Its value is an offset to a separate sub-directory holding the location tags. Jump there and you find the payload.
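
    Walking an IFD for one tag can be sketched like this. The findIfdTag name and its return shape are my own choices for illustration; `tiff` is the position of the byte-order mark, since every offset is measured from there.

```javascript
// Sketch: scan one IFD for a tag and return its type, count, and value/offset field.
// findIfdTag is a hypothetical helper; `tiff` is the position of the byte-order mark.
function findIfdTag(view, tiff, ifdOffset, littleEndian, wantedTag) {
  const ifd = tiff + ifdOffset;
  const count = view.getUint16(ifd, littleEndian);       // 2 bytes: entry count
  for (let i = 0; i < count; i++) {
    const entry = ifd + 2 + i * 12;                      // 12 bytes per entry
    if (view.getUint16(entry, littleEndian) === wantedTag) {
      return {
        type: view.getUint16(entry + 2, littleEndian),   // 2-byte data type
        count: view.getUint32(entry + 4, littleEndian),  // 4-byte value count
        value: view.getUint32(entry + 8, littleEndian),  // value, or offset from `tiff`
      };
    }
  }
  return null; // tag not present in this IFD
}

// Usage: const gps = findIfdTag(view, tiff, 8, true, 0x8825);
// gps.value is then the offset of the GPS sub-IFD, again relative to `tiff`.
```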

    Decoding latitude by hand

    The GPS sub-IFD uses a handful of tags. The important ones:

    • 0x0001 GPSLatitudeRef — ASCII “N” or “S”
    • 0x0002 GPSLatitude — three RATIONAL values: degrees, minutes, seconds
    • 0x0003 GPSLongitudeRef — “E” or “W”
    • 0x0004 GPSLongitude — three more RATIONALs

    A RATIONAL is two 4-byte unsigned integers: a numerator followed by a denominator. So latitude is three of them — 24 bytes total. Here’s the actual block from that photo, little-endian:

    25 00 00 00  01 00 00 00   ->  37 / 1   = 37 degrees
    2E 00 00 00  01 00 00 00   ->  46 / 1   = 46 minutes
    C4 0B 00 00  64 00 00 00   ->  3012 / 100 = 30.12 seconds

    Convert degrees-minutes-seconds to decimal: 37 + 46/60 + 30.12/3600 = 37.7750° N. Pair that with the longitude block and you have a point accurate to roughly three meters. That’s precise enough to land on a specific building. My friend went quiet after I read his back.
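
    The same conversion in code, fed with the 24 bytes shown above. This is a sketch: dmsToDecimal is an illustrative name, and the 'N' reference is taken from the example rather than read out of a file.

```javascript
// Sketch: decode the three latitude RATIONALs above and convert to decimal degrees.
// dmsToDecimal is an illustrative helper name.
function dmsToDecimal(degrees, minutes, seconds, ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  return ref === 'S' || ref === 'W' ? -value : value; // south and west are negative
}

const block = new Uint8Array([
  0x25, 0, 0, 0, 0x01, 0, 0, 0,    // 37 / 1
  0x2E, 0, 0, 0, 0x01, 0, 0, 0,    // 46 / 1
  0xC4, 0x0B, 0, 0, 0x64, 0, 0, 0, // 3012 / 100
]);
const latView = new DataView(block.buffer);
const [deg, min, sec] = [0, 8, 16].map(
  (pos) => latView.getUint32(pos, true) / latView.getUint32(pos + 4, true) // little-endian
);
console.log(dmsToDecimal(deg, min, sec, 'N').toFixed(4)); // → 37.7750
```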

    A 40-line parser in the browser

    You don’t need a library to do this. Browser DataView reads typed values out of an ArrayBuffer with explicit endianness, which is exactly what EXIF needs. Here’s the core of finding the APP1 segment and its byte order:

    function findExif(view) {
      let offset = 2; // skip the FF D8 SOI
      while (offset + 4 <= view.byteLength) { // marker and length must be in range
        if (view.getUint8(offset) !== 0xFF) break;
        const marker = view.getUint8(offset + 1);
        if (marker === 0xDA) break; // SOS: compressed image data follows
        const size = view.getUint16(offset + 2); // big-endian on purpose
        const isExif = marker === 0xE1 && offset + 12 <= view.byteLength &&
          view.getUint32(offset + 4) === 0x45786966; // "Exif"; APP1 can also hold XMP
        if (isExif) {
          const tiff = offset + 10;            // skip marker, length, "Exif\0\0"
          const le = view.getUint16(tiff) === 0x4949; // "II"
          return { tiff, littleEndian: le, app1Start: offset, size };
        }
        offset += 2 + size; // jump to the next segment
      }
      return null;
    }

    Note that getUint16 defaults to big-endian, which is correct for the JPEG segment length. Once you have the littleEndian flag, you pass it to every read inside the TIFF block. Reading a RATIONAL is two reads and a divide:

    function readRational(view, pos, le) {
      return view.getUint32(pos, le) / view.getUint32(pos + 4, le);
    }

    That’s the whole trick. Walk the IFD entries, find tag 0x8825, jump to the GPS sub-IFD, pull the latitude and longitude rationals, and apply the N/S/E/W sign. About 40 lines, no dependencies, runs offline.

    Two ways to strip it — and why they differ

    Now the part that actually matters. There are two ways to remove this metadata, and they are not equal.

    Re-encode the whole image. Draw the photo onto a <canvas> and call toBlob(). The new file is built from raw pixels, so it carries no EXIF at all. Clean — but every pixel gets recompressed, which means slight quality loss and a completely different byte layout. That’s the approach my QuickShrink compressor takes, and I wrote up the mechanics in how browser image compression actually works. Good when you also want a smaller file.

    Splice out the segment. If all you want is to delete the metadata and keep the image untouched, you cut the APP1 segment out of the byte stream and leave everything else identical:

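    // bytes is a Uint8Array of the whole file; app1Start and size come from findExif()
    const out = new Uint8Array(bytes.byteLength - (2 + size));
    out.set(bytes.subarray(0, app1Start));                    // everything before APP1
    out.set(bytes.subarray(app1Start + 2 + size), app1Start); // everything after it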

    The pixels stay bit-for-bit identical. No recompression, no quality loss, no visible change — just the location data gone. That’s what PixelStrip does.

    One gotcha worth knowing: a single JPEG can carry more than one metadata block. EXIF lives in APP1, but XMP often rides in a second APP1, Photoshop data sits in APP13, and the EXIF thumbnail in IFD1 can hold its own copy of the GPS tags. A parser that removes only the first APP1 it sees will miss the rest. A real stripper loops over every APPn segment, which is the unglamorous part most “remove EXIF” snippets skip.
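
    A sketch of that loop is below. It is not PixelStrip’s source: the stripSegments name and the default drop list (APP1 and APP13) are my assumptions for illustration, and it ignores rarer cases such as fill bytes between segments. Everything from the Start Of Scan marker onward is copied through untouched.

```javascript
// Sketch: drop every segment whose marker is in `markersToDrop`, keep the rest.
// stripSegments and its default list (APP1 = 0xE1, APP13 = 0xED) are illustrative.
function stripSegments(bytes, markersToDrop = [0xE1, 0xED]) {
  const kept = [bytes.subarray(0, 2)]; // FF D8 SOI
  let offset = 2;
  while (offset + 4 <= bytes.length && bytes[offset] === 0xFF) {
    const marker = bytes[offset + 1];
    if (marker === 0xDA) break; // SOS: everything after this is image data
    const size = (bytes[offset + 2] << 8) | bytes[offset + 3]; // big-endian length
    const end = offset + 2 + size;
    if (!markersToDrop.includes(marker)) kept.push(bytes.subarray(offset, end));
    offset = end;
  }
  kept.push(bytes.subarray(offset)); // SOS header, scan data, EOI
  const out = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
  let pos = 0;
  for (const part of kept) {
    out.set(part, pos);
    pos += part.length;
  }
  return out;
}
```

    Passing a wider list removes more, but be careful with segments the decoder relies on, such as an ICC color profile in APP2.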

    What to actually do with this

    If you only remember one rule: platforms are inconsistent. Twitter and iMessage scrub metadata on upload; Discord, email attachments, Slack file shares, and most forums pass it through untouched. Assume the worst and clean photos before they leave your machine.

    For a one-click clean that keeps your image quality intact, drop the photo into PixelStrip — it runs entirely in your browser, so the file never uploads anywhere, and it surgically removes EXIF, GPS, and XMP without recompressing. If you want the privacy reasoning rather than the byte layout, I covered that in how to strip EXIF data before sharing. The rest of the browser tools follow the same no-upload rule.

    If you want to go deeper than a hex editor, file-format forensics books cover exactly this kind of byte-level metadata extraction across image, document, and filesystem formats — a solid digital forensics reference is what I keep on the shelf for the weird edge cases (full disclosure: Amazon affiliate link). It’s the difference between guessing at an offset and knowing why it’s there.



  • How Browser Image Compression Actually Works (Canvas API, toBlob, and Why Your JPEGs Shrink)

    Last week a teammate asked me why our little browser tool, QuickShrink, could take a 4.2 MB phone photo and hand back a 380 KB file that looked identical, all without uploading a single byte to a server. He assumed there was some clever backend doing the heavy lifting. There isn’t. It’s about 40 lines of JavaScript and a browser API that first shipped around 2013 and is now in every major engine. I want to walk through exactly what happens between the file picker and the download link, because once you understand it, you stop trusting upload-based compressors that ship your private photos to someone else’s box.

    The whole pipeline is three steps

    Browser image compression with the Canvas API comes down to: decode the image into pixels, paint those pixels onto a canvas, then re-encode the canvas at a chosen quality. That’s it. Here’s the core of what QuickShrink runs, stripped to the essentials:

    async function compress(file, quality = 0.8) {
      // 1. Decode: turn the file bytes into a bitmap
      const bitmap = await createImageBitmap(file);
    
      // 2. Paint: draw the bitmap onto a canvas
      const canvas = document.createElement('canvas');
      canvas.width = bitmap.width;
      canvas.height = bitmap.height;
      const ctx = canvas.getContext('2d');
      ctx.drawImage(bitmap, 0, 0);
      bitmap.close(); // free the decoded pixels once they are on the canvas
    
      // 3. Re-encode: read pixels back out as a compressed blob
      return new Promise((resolve, reject) => {
        canvas.toBlob(
          // toBlob passes null when the image cannot be encoded
          (blob) => (blob ? resolve(blob) : reject(new Error('encode failed'))),
          'image/jpeg',
          quality
        );
      });
    }

    The magic is in step three. When you call canvas.toBlob(callback, 'image/jpeg', 0.8), the browser runs its native JPEG encoder over the raw RGBA pixels sitting in the canvas buffer. That 0.8 is the quality factor, a number between 0 and 1, and it maps to the same quantization-table scaling that libjpeg uses under the hood. Lower the number, the encoder throws away more high-frequency detail, and the file shrinks.

    Why the file gets smaller without looking worse

    JPEG compression is lossy and it exploits a fact about human vision: we’re bad at noticing small changes in color and fine detail, but good at noticing changes in brightness and edges. The encoder splits the image into 8×8 pixel blocks, runs a discrete cosine transform on each, and then quantizes the result — rounding off the coefficients that represent detail your eye won’t miss.

    The quality factor controls how aggressive that rounding is. At 0.92 you’re barely touching anything. At 0.8 the output is less than half the size of the 0.92 version and almost nobody can tell in a blind test. Drop to 0.6 and you’ll start seeing ringing around hard edges; text on a screenshot is where it shows up first. I settled on 0.8 as the default after eyeballing a few hundred photos. It’s the knee of the curve where you get most of the size savings before quality visibly drops.
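
    The rounding knob is easier to see in numbers. Here is a sketch of the quality-to-table scaling rule from libjpeg (jpeg_quality_scaling), applied to the first row of the standard luminance table. The assumption, which I have not checked for every engine, is that the browser hands roughly quality × 100 to an encoder that follows this rule; scaleQuantTable is an illustrative name.

```javascript
// Sketch of libjpeg's quality scaling rule applied to a quantization table.
// Assumption: the browser passes roughly quality * 100 to its JPEG encoder.
function scaleQuantTable(baseTable, quality01) {
  const q = Math.min(100, Math.max(1, Math.round(quality01 * 100)));
  const scale = q < 50 ? Math.floor(5000 / q) : 200 - 2 * q;
  return baseTable.map((v) => {
    const scaled = Math.floor((v * scale + 50) / 100);
    return Math.min(255, Math.max(1, scaled)); // baseline JPEG keeps divisors in 1..255
  });
}

// First row of the standard JPEG luminance table.
const ROW = [16, 11, 10, 16, 24, 40, 51, 61];
console.log(scaleQuantTable(ROW, 0.8)); // → [6, 4, 4, 6, 10, 16, 20, 24]
console.log(scaleQuantTable(ROW, 0.4)); // → [20, 14, 13, 20, 30, 50, 64, 76]
```

    Each DCT coefficient is divided by its table entry and rounded, so larger entries at lower quality mean more detail rounded away.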

    Real numbers from a real photo set

    I ran a batch of 20 photos straight off a Pixel 8 — landscapes, indoor shots, a couple of screenshots — through the canvas pipeline at different quality settings. Average original size was 3.8 MB per file. Here’s what came out:

    • quality 0.92 — avg 1.9 MB, about 50% reduction, visually lossless
    • quality 0.80 — avg 720 KB, about 81% reduction, no visible loss on normal viewing
    • quality 0.60 — avg 410 KB, about 89% reduction, slight softening on text edges
    • quality 0.40 — avg 280 KB, about 93% reduction, obvious artifacts

    The reason phone photos compress this well is that they start out barely compressed. Camera apps save at quality 0.95 or higher to avoid complaints, and they bake in fat EXIF blocks with GPS coordinates, lens data, and a full-size thumbnail. Re-encoding at 0.8 and dropping the metadata is where most of the savings come from. (If the metadata part interests you, I wrote a separate piece on how EXIF leaks your home address and how to strip it.)

    The gotcha nobody warns you about: createImageBitmap vs Image

    The old way to decode an image was to create an <img> element, set its src to an object URL, and wait for the onload event. It works, but it decodes on the main thread and blocks your UI while it runs. On a big panorama, that’s a visible freeze.

    // Old way - blocks the main thread
    const img = new Image();
    img.onload = () => {
      ctx.drawImage(img, 0, 0);
      URL.revokeObjectURL(img.src); // release the blob reference once drawn
    };
    img.src = URL.createObjectURL(file);

    createImageBitmap() is the better path. It decodes off the main thread, returns a promise, and gives you an ImageBitmap that draws to canvas faster because it’s already in a GPU-friendly format. On the 20-photo batch above, switching from the Image approach to createImageBitmap cut total processing time from 4.1 seconds to 1.6 seconds. If you build anything that compresses more than one file, use it.

    One real gotcha: EXIF orientation handling in createImageBitmap has varied between engines and versions, so photos shot in portrait can come out sideways. You fix it by passing { imageOrientation: 'from-image' } as the second argument, which current engines honor and the HTML spec now treats as the default:

    const bitmap = await createImageBitmap(file, { imageOrientation: 'from-image' });

    WebP is where the real wins are

    JPEG is the safe default, but if you don’t need to email the file to someone on a 2014 device, WebP beats it badly. Same canvas, same code, you just change the MIME type:

    canvas.toBlob(resolve, 'image/webp', 0.8);

    On my test set, WebP at quality 0.8 came out to an average of 480 KB versus JPEG’s 720 KB at the same setting — another third smaller for the same perceived quality. Every browser shipped in the last six years decodes WebP, so the compatibility argument is mostly dead unless you’re targeting ancient hardware. The one place I still reach for JPEG is when the recipient is going to drag the file into some old desktop app that chokes on WebP.
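
    One thing worth guarding against: the HTML spec says toBlob falls back to image/png when the requested type isn’t supported, and it does so silently. A sketch of a check is below; encodeWithFallback and toBlobAsync are illustrative names, not QuickShrink’s code, and `canvas` is assumed to be a normal canvas element.

```javascript
// Sketch: request WebP, fall back to JPEG if the engine hands back something else.
// encodeWithFallback and toBlobAsync are hypothetical helper names.
function toBlobAsync(canvas, type, quality) {
  return new Promise((resolve) => canvas.toBlob(resolve, type, quality));
}

async function encodeWithFallback(canvas, quality = 0.8) {
  const webp = await toBlobAsync(canvas, 'image/webp', quality);
  // An unsupported type silently becomes image/png, so check what came back.
  if (webp && webp.type === 'image/webp') return webp;
  return toBlobAsync(canvas, 'image/jpeg', quality);
}
```

    Checking blob.type costs nothing and avoids shipping a PNG that is larger than the original photo.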

    Why “browser-only” is the part that matters

    Here’s the bit I care about most. Because every step — decode, paint, re-encode — runs inside createImageBitmap and canvas.toBlob, the image never leaves the tab. There’s no fetch, no upload, no server log with your file sitting in it. You can literally open the network tab in DevTools, compress a photo, and watch zero requests fire. Pull your ethernet cable and it still works.

    That’s not true of most “free online image compressor” sites. They POST your file to a backend, compress it there, and hand back a URL. Which means a copy of your photo — with its original GPS metadata if they don’t strip it — lives on a machine you don’t control, for however long their retention policy says, or doesn’t say. For a meme, who cares. For a photo of a document, a whiteboard with company internals, or a picture taken inside your house, that’s a real leak. I’ve gotten paranoid enough about this that I treat every upload-based dev tool as a potential logging endpoint, which is the same reason I wrote about why you should stop pasting sensitive data into online dev tools.

    Try it, or build your own

    If you just want the result, QuickShrink is the tool — drag a photo in, pick a quality, download. No account, no upload, no tracking. If you want to build your own, the code above is the whole idea; wrap it in a drag-drop handler and a quality slider and you’re done in an afternoon.

    The hardware angle matters too. Canvas re-encoding is CPU-bound and single-image-fast, but if you batch-process hundreds of RAW or high-res files, a machine with more cores and fast storage makes the difference between seconds and minutes. I do my bulk photo work on an SSD-backed box, and a good portable drive like the Samsung T7 Shield portable SSD is what I use to shuttle large photo libraries between machines without waiting on a slow USB stick. Full disclosure: that’s an Amazon affiliate link — it’s the drive I actually use.

    The takeaway: browser image compression isn’t magic and it isn’t a backend. It’s a 13-year-old canvas API, a quality number between 0 and 1, and the choice to keep your pixels on your own machine. Once you know how the pipeline works, the upload-based tools start looking like a strictly worse deal.

    Related reading: how EXIF metadata broadcasts your home address, a byte-level teardown of EXIF GPS data in a JPEG, and why browser-only tools beat upload-based ones.



  • Your Online SQL Formatter Might Be Logging Your Database Password

    Last month I watched a contractor paste a full Kubernetes secret manifest — base64 blobs and all — into the first “free YAML validator” that came up on Google. He just wanted to check indentation. What he actually did was POST a production database password to a server he’d never heard of, run by people he’ll never meet, with a privacy policy he didn’t read.

    That’s the part of online dev tools nobody talks about. A SQL formatter, a YAML validator, a JSON beautifier — they feel disposable, like a calculator. But a huge number of them send whatever you paste to a backend for processing. If that paste contains a connection string, an API key, or a customer record, you just leaked it. No breach required. You handed it over.

    Why “format my SQL” is a data exfiltration path

    Here’s the mechanic. Server-side tools work like this: your text goes into a textarea, JavaScript fires an HTTP request to /api/format, the server runs the actual formatting, and the result comes back. Simple to build, which is exactly why so many sites do it that way.

    The problem is what travels in that request body. I tested a handful of popular online formatters with my browser’s Network tab open. Several of them sent the entire input payload to their own domain. One sent it to a third-party API. The query I pasted was harmless test data, but the request was real — my text left my machine.

    Now picture the realistic version. You’re debugging a failing migration at 11pm. You copy the offending query straight out of your ORM logs to “just clean it up.” That query has a hardcoded credential a teammate left in six months ago. You paste, you format, you move on. The credential is now in someone’s request logs, maybe their analytics, maybe an LLM training pipeline if the tool resells data. You will never know.

    This isn’t paranoia. It’s the same threat model that makes pasting code into random pastebins a fireable offense at most security-conscious shops. We just don’t apply it to “format” tools because they feel too small to matter.

    The browser-only alternative

    The fix is structural, not procedural. Don’t rely on remembering to scrub secrets first — use tools that physically can’t send your data anywhere, because all the work happens in your tab.

    That’s the whole reason I built our formatters as single-file, client-side apps. When you use the SQL Formatter, the YAML Validator, or the Diff Checker, the parsing and formatting runs in JavaScript on your device. There is no /api/format endpoint. There’s no backend at all. The text in your textarea never crosses the network, because there’s nowhere for it to go.

    For a diff tool this matters even more. People routinely paste two versions of a config file — say, a working .env and a broken one — to spot what changed. Those files are nothing but secrets. A browser-only diff means you can compare two API keys character by character without either one leaving your laptop.

    How to actually verify a tool is client-side

    Don’t take any tool’s word for it, including mine. Verifying is a two-minute job and every developer should know how.

    1. Watch the Network tab. Open DevTools (F12), go to the Network panel, clear it, then paste your text and hit format. If you see a new XHR or fetch request fire with your input in the payload, the tool is server-side. If nothing happens on the network, the work is local.

    // What a server-side formatter looks like in Network tab:
    POST /api/format-sql
    Request Payload: { "query": "SELECT * FROM users WHERE token='sk_live_...'" }
    
    // What a client-side tool looks like:
    // (nothing — no request fires when you click format)

    2. Kill your connection. The bluntest test there is. Load the page, then turn off Wi-Fi or drop into airplane mode. If the tool still formats your text, it’s running entirely in the browser. If it spins or errors, it needed a server. I do this with any tool before I trust it with anything sensitive.

    3. Check for a service worker. Truly offline-capable tools register a service worker so they work with no connection at all. In DevTools, look under Application → Service Workers. Its presence is a strong signal the developer designed for offline-first, which usually means client-side processing too.
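
    If you want a record you can inspect afterwards, a rough console sketch is to wrap fetch before you click format. This only sees fetch() calls; XHR, sendBeacon, WebSockets, and form posts all bypass it, so treat the Network tab as the authority. The watchFetch name is mine for illustration.

```javascript
// Sketch: log fetch() calls made while you use a tool.
// watchFetch is a hypothetical helper; run it in the DevTools console first.
function watchFetch(target = globalThis) {
  const seen = [];
  const original = target.fetch;
  target.fetch = function (input, init) {
    // input may be a string, a URL object, or a Request
    const url = typeof input === 'string' ? input : (input.url ?? String(input));
    seen.push({ url: String(url), method: (init && init.method) || 'GET' });
    console.warn('outbound fetch:', url);
    return original.apply(this, arguments);
  };
  return { seen, restore: () => { target.fetch = original; } };
}

// Usage: const watch = watchFetch(); ...use the tool...; console.table(watch.seen); watch.restore();
```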

    Where this fits in a real workflow

    A few concrete cases where I reach for browser-only tools specifically because of the data:

    • Reviewing a teammate’s config PR. Diffing two Helm values files that contain registry credentials — done locally, nothing logged anywhere.
    • Cleaning up a query from prod logs. Format it to read it, without shipping whatever sensitive WHERE clause it carries to a stranger’s server.
    • Validating a CI secrets file. Checking that a GitHub Actions YAML parses before you commit, without exposing the encrypted values to a validation API.
    • On a locked-down network. Some client environments block external dev-tool domains entirely. Offline-capable tools just keep working.

    The broader point: treat every “paste your text here” box as a potential outbound network call until you’ve proven otherwise. Most of the time it’s fine. The one time it isn’t, it’s a leaked credential you can’t un-leak.

    Defense in depth still applies

    Browser-only tools remove one exfiltration path, but they don’t make you immune to the dumber failure modes — like a secret sitting in your shell history or git log in the first place. If you handle credentials daily, a hardware key cuts a whole class of phishing and credential-theft risk off at the knees. I use a YubiKey 5 Series for exactly this (full disclosure: affiliate link, but it’s the same key I carry on my own keyring). Pair that with the pre-commit secret scanning setup I wrote about earlier, and you’ve closed the two most common ways credentials walk out the door.

    Start with the small habit, though. Next time you reach for an online formatter or diff tool, open the Network tab first. If your text leaves the browser, find one that keeps it home.



  • I Switched to KeePassXC After LastPass Got Breached — Here’s My Setup

    Last December I got the email every LastPass user dreaded: my vault backup was part of the breach. The master password was strong, but knowing encrypted blobs of my entire digital life were sitting on some attacker’s disk made me physically uncomfortable. I spent a weekend migrating everything to KeePassXC, and six months later I’m not going back.

    Why Local-First Matters for Passwords

    The LastPass breach exposed a fundamental problem with cloud password managers: your encrypted vault is only as safe as the infrastructure storing it. LastPass used 100,100 PBKDF2 iterations for newer accounts — older accounts had as few as 5,000. That’s crackable with a decent GPU rig.

    KeePassXC stores everything in a single .kdbx file on your machine. No servers, no breach notifications, no third-party trust. The file uses AES-256 or ChaCha20 encryption with Argon2d key derivation — you control the iteration count, memory usage, and parallelism. I run mine at 64MB memory / 10 iterations / 4 threads, which takes about 1 second to unlock on my laptop but would cost serious money to brute-force.

    The Setup That Actually Works Day-to-Day

    The knock against local password managers has always been “but what about sync?” Fair point. Here’s how I solved it without trusting anyone else with my vault:

    # My .kdbx lives in a Syncthing folder shared between:
    # - Work laptop (Linux)
    # - Personal desktop (Windows)
    # - Phone (via Syncthing + KeePassDX on Android)
    
    ~/.local/share/syncthing/vault/
    ├── passwords.kdbx
    └── passwords.kdbx.key   # key file (separate from master password)

    Syncthing handles peer-to-peer sync over my local network and WireGuard tunnel when I’m away. The vault never touches anyone else’s servers. Conflict resolution? KeePassXC handles .kdbx merge conflicts natively since version 2.7 — it’ll prompt you to merge changes if two devices edited simultaneously.
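
    When Syncthing itself cannot reconcile two edits, it keeps the losing version next to the original as a file with ".sync-conflict-" in its name. A small sketch for spotting those copies so they can be merged by hand (KeePassXC has a Database → Merge From Database menu for this). The function name and the "passwords" default are illustrative assumptions based on the folder layout above.

```python
from pathlib import Path

def find_conflict_copies(folder: Path, stem: str = "passwords") -> list[Path]:
    """Return Syncthing conflict copies of the vault found in folder."""
    # Syncthing names them <name>.sync-conflict-<date>-<time>-<device>.<ext>
    return sorted(folder.glob(f"{stem}.sync-conflict-*.kdbx"))

# Usage (the path is the one from the layout above):
# for copy in find_conflict_copies(Path.home() / ".local/share/syncthing/vault"):
#     print(f"Unmerged conflict copy: {copy.name}")
```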

    Hardware Key as Second Factor

    This is where it gets good. KeePassXC supports YubiKey challenge-response as an additional key factor. My unlock requires:

    1. Master password (memorized, 6 random words)
    2. Key file (stored only on my devices, never synced to cloud)
    3. YubiKey HMAC-SHA1 challenge-response (slot 2)

    Setting this up:

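    # Program YubiKey slot 2 for HMAC-SHA1 challenge-response
    # (a backup YubiKey only works if it holds the same secret, so record the
    # generated key and pass it explicitly when programming the second one)
    ykman otp chalresp --generate 2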
    
    # In KeePassXC: Database → Database Security → Add Additional Protection
    # Select "Challenge-Response" → pick your YubiKey

    An attacker who steals my .kdbx file needs all three factors. Even if they get my laptop with the key file, they still need the physical YubiKey and the password. I keep a backup YubiKey 5 NFC in my safe — $50 for peace of mind that I won’t lock myself out.
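
    For readers curious what the challenge-response factor computes, here is a minimal sketch of plain HMAC-SHA1 in Python. It shows only the generic primitive; how KeePassXC picks the challenge and mixes the response into the database key is not shown, and the secret and challenge below are made-up values (on real hardware the secret never leaves the YubiKey).

```python
import hashlib
import hmac
import os

def yubikey_response(secret: bytes, challenge: bytes) -> bytes:
    """Compute the 20-byte HMAC-SHA1 response for a challenge."""
    return hmac.new(secret, challenge, hashlib.sha1).digest()

# Hypothetical values for illustration only
secret = os.urandom(20)      # stands in for the secret programmed into slot 2
challenge = os.urandom(32)   # stands in for the challenge the application sends
response = yubikey_response(secret, challenge)
print(response.hex())
```

    The response depends on both the secret and the challenge, which is why a backup key only produces the same answer if it was programmed with the same secret.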

    Browser Integration Without the Extension Tax

    KeePassXC’s browser integration works through a native messaging host — no network calls, no cloud sync of browser state. I tested fill speed across three setups:

    Setup                          Fill latency   Memory overhead
    1Password (extension)          180-400ms      ~85MB
    Bitwarden (extension)          120-300ms      ~60MB
    KeePassXC (native messaging)   30-80ms        ~12MB

    KeePassXC fills faster because the extension talks to the running desktop app over the browser’s native messaging channel, which a small proxy relays through a local Unix socket. There are no HTTP round-trips and no cloud lookups. The browser add-on is just a thin UI layer.
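
    Native messaging itself is a simple framing protocol: each message is UTF-8 JSON preceded by a 32-bit length in native byte order, exchanged over stdin/stdout. A minimal sketch of that framing follows; the payload is a hypothetical placeholder, not KeePassXC’s real message schema.

```python
import json
import struct

def encode_message(payload: dict) -> bytes:
    """Frame a JSON payload the way browsers' native messaging expects."""
    body = json.dumps(payload).encode("utf-8")
    # 4-byte unsigned length in native byte order, then the JSON body
    return struct.pack("=I", len(body)) + body

def decode_message(frame: bytes) -> dict:
    """Parse one framed message back into a dict."""
    (length,) = struct.unpack("=I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))

# Hypothetical payload for illustration
frame = encode_message({"action": "example"})
print(decode_message(frame))
```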

    # Enable browser integration (Linux)
    # KeePassXC → Tools → Settings → Browser Integration
    # Check "Enable browser integration"
    # Check "Firefox" and/or "Chromium"
    # It writes the native messaging manifest automatically to:
    # ~/.mozilla/native-messaging-hosts/org.keepassxc.keepassxc_browser.json

    Honest Comparison: KeePassXC vs The Cloud Options

    vs Bitwarden: Bitwarden is the closest competitor and genuinely good. It’s open source, self-hostable (Vaultwarden), and the free tier is generous. I’d recommend it to anyone who doesn’t want to manage sync themselves. The tradeoff: you’re trusting their hosted service to store your vault (it is encrypted client-side, but it still lives on their servers), or running your own server (which means patching, backups, certificates). KeePassXC has no server component to maintain or secure.

    vs 1Password: Polished UI, great team features, expensive ($36/year individual, $60/year family). The “Secret Key” system is clever: it means 1Password can’t decrypt your vault even with a breach. But it’s closed source. You’re trusting their claims. As a solo developer who reads source code, that’s a non-starter for me.

    vs LastPass: Just don’t. After the 2022 breach, the 2023 follow-up showing employee vaults were compromised, and the consistently slow response times… there’s no reason to trust them with anything sensitive.

    The One Thing That Annoys Me

    Mobile is worse than cloud managers. Full stop. KeePassDX on Android works, but auto-fill is flaky on some apps, and you need to manually trigger sync if you added a password on desktop 30 seconds ago. I’ve accepted this tradeoff — I add most passwords on desktop anyway, and the security model is worth the occasional inconvenience on mobile.

    Migration Script

    If you’re coming from LastPass, Bitwarden, or 1Password, KeePassXC imports CSV exports directly. Here’s my cleanup script that runs after import to organize entries:

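    #!/usr/bin/env python3
    """Post-import cleanup for KeePassXC CSV import.
    Removes duplicate entries and normalizes URLs.
    Usage: python3 <this script> input.csv output.csv"""
    import csv, sys
    from urllib.parse import urlparse
    
    def normalize_url(url):
        """Reduce a URL to scheme://host so pages on the same site compare equal."""
        parsed = urlparse(url.strip())
        if not parsed.netloc:
            return url.strip().lower()  # no scheme/host to extract; compare as-is
        return f"{parsed.scheme}://{parsed.netloc}".lower()
    
    seen = {}
    # newline='' lets the csv module handle quoted multi-line fields
    with open(sys.argv[1], newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            # Column names assume KeePassXC-style headers (Username, Password, URL)
            key = (row.get('Username') or '', normalize_url(row.get('URL') or ''))
            if not any(key):
                key = ('', '', reader.line_num)  # never merge entries lacking both fields
            # On a duplicate, keep the entry with the longer password
            if key not in seen or len(row.get('Password') or '') > len(seen[key].get('Password') or ''):
                seen[key] = row
    
    # Write the surviving rows so the cleaned file can be re-imported
    with open(sys.argv[2], 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(seen.values())
    
    print(f"Deduplicated: {len(seen)} unique entries")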

    My Recommendation

    If you’re a developer comfortable with file management and want zero cloud trust for your passwords: KeePassXC + Syncthing + YubiKey is the strongest setup I’ve found. Total cost: $50 for the YubiKey (plus a backup), everything else is free and open source.

    If you want something that “just works” across devices without any setup: Bitwarden free tier. No shame in that — it’s genuinely good software.

    For more tools and privacy-focused workflows, check out our security guides and tools section.

    Related reading: how a secure password generator actually works and the pre-commit setup that stopped 14 leaked secrets in my git history.


    Full disclosure: Amazon links above are affiliate links (tag=orthogonalinf-20). I bought my YubiKeys at full price before writing this.


  • Your Photos Are Broadcasting Your Home Address — How EXIF Metadata Works and How to Strip It

    Last month I helped a friend figure out why a stalker knew her daily routine. The answer was in the photos behind her Instagram stories: not the content, but the metadata baked into every JPEG she shared. GPS coordinates, timestamps accurate to the second, even her phone model. Instagram strips EXIF on upload, but she’d been sharing originals in a group chat first.

    🔒 Strip EXIF the easy way — without uploading your photos

    Re-saving an image through a client-side tool removes embedded GPS and EXIF metadata automatically, because the file is rebuilt fresh in your browser. QuickShrink does exactly that: it compresses and re-encodes your images entirely in your browser — nothing is ever uploaded to a server, so your location data never leaves your device.


    Most developers know EXIF exists. Fewer know exactly what’s in there, how to parse it programmatically, or how to strip it without degrading image quality. I spent a weekend building a browser-based EXIF stripper that never uploads your files, and learned more about the JPEG binary format than I expected.

    What EXIF Actually Contains (It’s Worse Than You Think)

    EXIF (Exchangeable Image File Format) lives in the APP1 marker segment of JPEG files, usually right after the SOI (Start of Image) marker, bytes 0xFF 0xD8. The structure follows the TIFF IFD (Image File Directory) format: a chain of directories, each holding tagged key-value entries.

    Here’s what a typical iPhone photo contains:

    GPS Latitude: 37.7749 N
    GPS Longitude: 122.4194 W
    GPS Altitude: 12.3m above sea level
    DateTime Original: 2026:05:20 14:32:07
    Make: Apple
    Model: iPhone 15 Pro Max
    Lens: iPhone 15 Pro Max back camera 6.765mm f/1.78
    Software: 18.4.1
    Orientation: Rotate 90 CW
    Focal Length: 6.765mm (equiv 24mm)
    Exposure: 1/120s at f/1.78, ISO 50
    Unique Image ID: 4A3B2C1D-...

    That’s only a sample: a single photo can carry 40+ fields. The GPS data alone is accurate to about 3 meters with modern phones. Post enough photos from your apartment and anyone with exiftool can pinpoint your building.
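
    Inside the file, EXIF does not store latitude and longitude as decimals. Each is three rational numbers (degrees, minutes, seconds) plus a reference letter (N/S or E/W). Here is a minimal sketch of the conversion a parser performs; the function name is illustrative, and the sample values are chosen to match the coordinates listed above.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref: str) -> float:
    """Convert EXIF-style (numerator, denominator) DMS rationals to decimal degrees."""
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation
    return float(-value if ref in ("S", "W") else value)

lat = dms_to_decimal([(37, 1), (46, 1), (2964, 100)], "N")   # 37° 46' 29.64" N
lon = dms_to_decimal([(122, 1), (25, 1), (984, 100)], "W")   # 122° 25' 9.84" W
print(f"{lat:.4f}, {lon:.4f}")  # 37.7749, -122.4194
```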

    The Binary Structure: Parsing EXIF in JavaScript

    If you want to strip EXIF without re-encoding (which would lose quality), you need to understand the byte layout. A JPEG with EXIF looks like this:

    FF D8          - SOI marker (Start of Image)
    FF E1 [len]   - APP1 marker (EXIF data lives here)
      45 78 69 66 00 00  - "Exif\0\0" header
      [TIFF header + IFD entries + GPS sub-IFD]
    FF E0 [len]   - APP0 marker (JFIF, optional)
    FF DB [len]   - DQT (quantization tables)
    FF C0 [len]   - SOF (frame header)
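    ...            - remaining tables (e.g. FF C4, Huffman tables)
    FF DA [len]   - SOS (Start of Scan), followed by the compressed image data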
    FF D9          - EOI marker

    The key insight: you can remove the entire APP1 segment without touching image pixels. The compressed image data begins after the SOS (Start of Scan) marker, FF DA, and is completely independent of the metadata. Here’s the core logic I use:

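    function stripExif(arrayBuffer) {
      const view = new DataView(arrayBuffer);
      // Not a JPEG (no SOI marker): hand the input back untouched
      if (view.byteLength < 4 || view.getUint16(0) !== 0xFFD8) return arrayBuffer;
    
      const segments = [];
      let offset = 2;
    
      // Walk the marker segments that precede the compressed scan data
      while (offset + 4 <= view.byteLength) {
        const marker = view.getUint16(offset);
        // Every segment starts with 0xFF; anything else means a malformed file
        if ((marker & 0xFF00) !== 0xFF00) return arrayBuffer;
        if (marker === 0xFFDA) {
          // SOS: the rest of the file is image data, so copy it verbatim
          segments.push(arrayBuffer.slice(offset));
          break;
        }
        // Big-endian length that counts its own two bytes but not the marker
        const segLen = view.getUint16(offset + 2);
        // Drop APP1 (EXIF, XMP) and APP13 (Photoshop/IPTC); keep everything else.
        // Note: this also removes the EXIF Orientation tag.
        if (marker !== 0xFFE1 && marker !== 0xFFED) {
          segments.push(arrayBuffer.slice(offset, offset + 2 + segLen));
        }
        offset += 2 + segLen;
      }
    
      // Reassemble: SOI + kept segments + scan data
      const soi = new Uint8Array([0xFF, 0xD8]);
      const parts = [soi, ...segments.map(s => new Uint8Array(s))];
      const result = new Uint8Array(parts.reduce((a, p) => a + p.length, 0));
      let pos = 0;
      for (const part of parts) {
        result.set(part, pos);
        pos += part.length;
      }
      return result.buffer;
    }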

    This approach is lossless — zero re-encoding, zero quality loss. The output file is typically 5-50KB smaller than the input because you’re removing the metadata block entirely.

    Why “Browser-Only” Matters for This

    Think about the irony: you want to strip location data from your photos for privacy… so you upload them to a random website? That site now has your original files, complete with GPS coordinates, before stripping anything.

    I built the orthogonal.info image tool to process everything client-side using the Canvas API and ArrayBuffer manipulation. Your files never leave your browser tab. You can verify this by opening the DevTools Network tab: there are zero upload requests during processing.

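    // Assumes an <input type="file" id="photo"> element (the id is illustrative)
    const input = document.getElementById('photo');
    input.addEventListener('change', async () => {
      const file = input.files[0];
      if (!file) return;
      const buffer = await file.arrayBuffer();   // read locally, no upload
      const stripped = stripExif(buffer);
      const blob = new Blob([stripped], { type: 'image/jpeg' });
      const url = URL.createObjectURL(blob);     // e.g. use as a download link href
    });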

    What About PNG and WebP?

    PNG stores metadata differently — in tEXt, iTXt, and eXIf chunks rather than APP1 markers. The chunk-based format makes it straightforward to filter: read each chunk’s 4-byte type identifier, skip the ones you don’t want, concatenate the rest.
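
    The chunk filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the site tool’s implementation: the function name is made up, and the drop list adds zTXt (compressed text) to the three chunk types named above as an assumption about what you would want removed.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types to drop; zTXt is an addition to the three named in the text
METADATA_CHUNKS = {b"tEXt", b"iTXt", b"zTXt", b"eXIf"}

def strip_png_metadata(data: bytes) -> bytes:
    """Return the PNG with text and EXIF chunks removed."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature")
    out = bytearray(PNG_SIGNATURE)
    offset = 8
    while offset + 12 <= len(data):
        # Chunk layout: 4-byte length, 4-byte type, payload, 4-byte CRC
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        chunk_type = data[offset + 4:offset + 8]
        end = offset + 12 + length
        if chunk_type not in METADATA_CHUNKS:
            out += data[offset:end]  # copied whole, so its CRC stays valid
        offset = end
    return bytes(out)
```

    Because each chunk carries its own CRC over type and payload, the kept chunks can be copied byte for byte without recomputing anything.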

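    WebP uses the RIFF container format with an EXIF chunk. Same principle: parse chunks, drop the EXIF one, rebuild. Rebuilding includes updating the RIFF size field and clearing the EXIF flag in the VP8X header, if the file has one.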
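
    A minimal Python sketch of the same idea for WebP, assuming a well-formed file. The function name is illustrative, and it also drops the XMP chunk, which goes slightly beyond the EXIF-only description above.

```python
import struct

EXIF_FLAG = 0x08  # EXIF bit in the VP8X flags byte
XMP_FLAG = 0x04   # XMP bit in the VP8X flags byte

def strip_webp_metadata(data: bytes) -> bytes:
    """Return the WebP with EXIF and XMP chunks removed."""
    if data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a WebP file")
    chunks = bytearray()
    offset = 12
    while offset + 8 <= len(data):
        # Chunk layout: 4-byte FourCC, 4-byte little-endian size, payload
        fourcc = data[offset:offset + 4]
        (size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        end = offset + 8 + size + (size & 1)  # payloads are padded to even length
        chunk = bytearray(data[offset:end])
        offset = end
        if fourcc in (b"EXIF", b"XMP "):
            continue  # drop metadata chunks
        if fourcc == b"VP8X" and size >= 1:
            # Clear the header bits that announce EXIF/XMP chunks
            chunk[8] &= ~(EXIF_FLAG | XMP_FLAG) & 0xFF
        chunks += chunk
    # RIFF size counts everything after the 8-byte RIFF header
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WEBP" + bytes(chunks)
```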

    Tools I Actually Use

    For batch processing on my homelab, I use exiftool:

    # Strip ALL metadata from every JPEG in a directory
    exiftool -all= -overwrite_original *.jpg
    
    # Keep orientation (so photos display correctly) but strip everything else
    exiftool -all= -tagsfromfile @ -Orientation -overwrite_original *.jpg

    That second command is important — if you strip the Orientation tag, portrait photos will display sideways in some viewers. Common gotcha.

    For quick one-off checks before sharing, I use our browser-based tool — compress and strip in one step, no install needed. For developers building apps that handle user uploads, the piexifjs library (3KB gzipped) handles read/write/strip operations well.

    If you’re processing images on a server, a Raspberry Pi 5 running an exiftool batch script works great as a dedicated metadata sanitizer on your network — keeps processing local and costs about $80 total with a case and SD card.

    Platforms That Strip vs. Don’t

    I tested 12 platforms in May 2026:

    Strip EXIF on upload: Instagram, Twitter/X, Facebook, LinkedIn, Discord, iMessage

    Preserve EXIF (danger zone): Email attachments, Signal (original quality), Telegram (as file), Google Drive, Dropbox shared links, most forum software

    Signal strips EXIF when you send as a compressed photo, but preserves everything when you tap “original quality.” Most people don’t realize the distinction. Telegram behaves the same way: compressed = stripped, sent as file = full metadata intact.

    The Real Risk Model

    For most people, the threat isn’t nation-state actors. It’s:

    • Selling items online with photos taken at home (Craigslist, Facebook Marketplace)
    • Sharing “original quality” photos in group chats with acquaintances
    • Uploading images to forums, bug trackers, or documentation sites
    • Dating app photos with location data if the platform doesn’t strip

    A privacy screen protector stops shoulder-surfers, but EXIF metadata is the silent leak most people never think about. Strip it before sharing. Every time.

    If you handle images in any application — whether it’s a side project or production — add EXIF stripping to your upload pipeline. It’s 20 lines of code and it protects your users from themselves.
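
    For a server-side upload pipeline, the JPEG segment walk shown earlier translates to Python almost line for line. This is a sketch under the same assumptions (well-formed JPEG, APP1 and APP13 treated as metadata), and the function name is illustrative. Like the JavaScript version, it also removes the Orientation tag, so apply any rotation first if you need it.

```python
import struct

DROP_MARKERS = {0xFFE1, 0xFFED}  # APP1 (EXIF/XMP) and APP13 (Photoshop/IPTC)

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG with APP1/APP13 segments removed; pixels are untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    offset = 2
    while offset + 4 <= len(data):
        # Each segment: 2-byte marker, 2-byte big-endian length (includes itself)
        marker, length = struct.unpack(">HH", data[offset:offset + 4])
        if marker >> 8 != 0xFF:
            raise ValueError(f"malformed segment at offset {offset}")
        if marker == 0xFFDA:
            # Start of scan: copy the rest (compressed data + EOI) verbatim
            out += data[offset:]
            return bytes(out)
        if marker not in DROP_MARKERS:
            out += data[offset:offset + 2 + length]
        offset += 2 + length
    raise ValueError("no SOS marker found; file looks truncated")
```

    Rejecting files that fail to parse, rather than passing them through, is a deliberate choice for an upload path: a file that cannot be walked cannot be confirmed clean.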

    Related: Developer Tools Guide | DevSecOps in Practice


Also by us: StartCaaS — AI Company OS · Hype2You — AI Tech Trends