How One Regex Took Down Cloudflare: Catastrophic Backtracking, Tested in Your Browser

A single line of validation code once took down half of Cloudflare. On July 2, 2019, a regular expression pushed to their WAF spiked CPU to 100% across their global network and knocked a chunk of the internet offline for about 27 minutes. The regex looked harmless. It contained a pattern that backtracks catastrophically, and ordinary traffic hitting the new rule was enough to pin the CPUs.

I keep RegexLab open in a pinned tab specifically to catch this class of bug before it ships. It’s a browser-only regex tester, which matters here for a reason most people miss: when you’re testing a pattern that can hang for 13 seconds, you really don’t want that pattern running on someone else’s server. In RegexLab it runs on your machine. In Chrome or another Chromium-based browser that means the same V8 engine your Node backend uses, so the timing you see is a good proxy for what you’ll get in production (hardware and engine version still matter).

What catastrophic backtracking actually is

Most regex engines (JavaScript, Python’s re, Java, PCRE) use backtracking. When a pattern can match the same input more than one way, the engine tries one path, and if that fails, it walks back and tries another. Usually that’s fine. The problem starts when the number of possible paths grows exponentially with input length.

The textbook example is a quantifier inside a quantifier:

/^(a+)+$/

Feed it a string of a characters followed by one b. The b guarantees the match fails at the end, but before the engine gives up, it tries every way to split those a’s between the inner a+ and the outer +. That’s on the order of 2^n splits. Each extra character doubles the work.
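To see where the doubling comes from, here is a minimal sketch that counts the splits directly. countSplits is a hypothetical helper written for this post, not something the regex engine runs; it only counts the candidate groupings a backtracking engine may have to rule out. The exact count works out to 2^(n-1), which has the same double-per-character growth.

```javascript
// Count the ways to split a run of n identical characters into one or more
// non-empty consecutive groups. Each group stands for one iteration of the
// outer + in /^(a+)+$/.
function countSplits(n) {
  if (n === 0) return 1; // nothing left to split: one complete grouping
  let total = 0;
  // The first group takes k characters; the remainder is split recursively.
  for (let k = 1; k <= n; k++) {
    total += countSplits(n - k);
  }
  return total;
}

// Keep n small: this naive recursion is itself exponential.
for (const n of [1, 2, 3, 4, 10]) {
  console.log(`n=${n}  splits=${countSplits(n)}`);
}
```

The recursion is deliberately naive so it mirrors the search a backtracking engine performs; it is an illustration of the growth rate, not a model of V8's internals.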

I benchmarked it on my machine (Node 24, plain V8 — same engine Chrome runs) with /^(a+)+$/ against n copies of “a” plus a trailing “b”:

n=15   0.39 ms
n=20   10.6 ms
n=22   42.3 ms
n=24   169 ms
n=26   585 ms
n=28   2,516 ms

Read that again. Going from 26 to 28 characters — two bytes — took the match from half a second to two and a half seconds. At n=32 you’re looking at ~40 seconds for one call. An attacker doesn’t need a botnet. They need one text field and a 32-character string.
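If you want to reproduce numbers like these, a minimal timing sketch could look like the following. timeMatch is a hypothetical helper name, and it assumes a runtime where performance.now() is a global (current Node and browsers). Your absolute timings will differ from mine; the shape of the curve is what matters.

```javascript
// Time a single regex.test() call and report the result in milliseconds.
function timeMatch(regex, input) {
  const start = performance.now();
  const matched = regex.test(input);
  return { matched, ms: performance.now() - start };
}

const evil = /^(a+)+$/;

// Keep n small when experimenting: every +2 roughly quadruples the time.
for (const n of [15, 18, 20, 22]) {
  const input = 'a'.repeat(n) + 'b'; // trailing b forces the match to fail
  const { matched, ms } = timeMatch(evil, input);
  console.log(`n=${n}  matched=${matched}  ${ms.toFixed(2)} ms`);
}
```

A single call per size is noisy, so treat small values as rough. Once the times reach tens of milliseconds the growth dominates the noise.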

The version that bites real apps

Nobody writes /^(a+)+$/ on purpose. The dangerous ones look reasonable. Here’s a pattern shaped like a thousand email and username validators I’ve seen in the wild:

/^([a-zA-Z0-9]+)*@/

Looks like it’s checking for alphanumeric characters before an @. It also has a + nested inside a *, which is the same exponential trap wearing a nicer suit. I fed it a long run of letters with no @ so the match fails:

n=20   13 ms
n=25   434 ms
n=28   3,303 ms
n=30   13,225 ms

Thirty characters. Thirteen seconds. If that regex sits on a login or signup endpoint, one request pins a core for 13 seconds, and in Node the match runs synchronously on the event loop, so every other request on that process waits behind it. Send twenty of them and the service is effectively down. This is a denial-of-service that ships as “input validation.”

Spotting it before it ships

The tell is any place where two quantifiers can fight over the same characters. Watch for these shapes:

  • (x+)+ — nested quantifiers, the classic
  • (x*)* — same idea
  • (x+)* and (x*)+ — mixed, still exponential
  • (a|a)+ or (a|ab)+ — alternation where branches overlap
  • (\s+)+, (\w+)* — the real-world disguises
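The first three shapes and the disguises can be caught with a crude textual check on the pattern source. The sketch below is a heuristic under loud assumptions: looksNested and NESTED_QUANTIFIER are names I made up, it only sees a group that contains + or * and is itself followed by +, * or {, and it knows nothing about escapes, character classes, nested groups, or overlapping alternation like (a|ab)+. Expect both false positives and false negatives.

```javascript
// Heuristic only: "(" ... a + or * somewhere inside ... ")" followed by a
// quantifier. It does not parse the regex, so it can be fooled by escaped
// characters (\+), character classes ([+-]) and nested parentheses.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function looksNested(patternSource) {
  return NESTED_QUANTIFIER.test(patternSource);
}

const candidates = [
  /^(a+)+$/,
  /^([a-zA-Z0-9]+)*@/,
  /(\s+)+/,
  /^[a-zA-Z0-9]+@/, // the flat rewrite: no group, not flagged
  /(a|ab)+/,        // overlapping alternation: NOT caught by this check
];

for (const re of candidates) {
  console.log(re.source, looksNested(re.source));
}
```

Anything this flags still needs a real test against an attack string; the check only tells you where to look first.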

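The fix is almost always to remove the redundancy. The outer quantifier in ([a-zA-Z0-9]+)* adds nothing useful: [a-zA-Z0-9]+ already matches one or more characters. Drop the wrapper (the rewrite now requires at least one character before the @, which is usually what a validator wants; use [a-zA-Z0-9]* if you need to keep accepting an empty prefix):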

/^[a-zA-Z0-9]+@/
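Before swapping a pattern in production it is worth checking that the rewrite accepts and rejects the same inputs. Here is a small sketch that runs both versions over a few sample strings I picked for illustration. The one place they disagree is an input that starts with @: the nested version allows zero characters before it, the flat one does not.

```javascript
const nested = /^([a-zA-Z0-9]+)*@/; // original, exponential on failure
const flat = /^[a-zA-Z0-9]+@/;      // rewrite, one way to match

// Illustrative inputs only; substitute cases from your own validator.
const samples = ['alice@example.com', 'bob42@', 'no-at-sign', 'a b@x', '@example.com'];

for (const s of samples) {
  const a = nested.test(s);
  const b = flat.test(s);
  const note = a === b ? '' : '  <- differs';
  console.log(`${JSON.stringify(s)}  nested=${a}  flat=${b}${note}`);
}

// The flat version stays fast even on a long input that cannot match.
console.log(flat.test('a'.repeat(50000) + '!'));
```

If the empty-prefix case matters to you, /^[a-zA-Z0-9]*@/ keeps the old behaviour and is just as flat.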

I reran the timing with /^a+$/, the safe rewrite of the textbook pattern, against inputs up to 100,000 characters:

n=28       0.066 ms
n=1,000    0.010 ms
n=100,000  0.216 ms

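Linear time. (The n=28 row is slower than n=1,000, most likely because it was the first call and includes warm-up, not because of the match itself.) A hundred thousand characters finishes faster than the evil version handles 20. That’s the whole point: the simpler pattern is thousands of times faster because there’s only one way to match.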

Why I test these in the browser, not on a server

Here’s the part that ties back to how RegexLab is built. To confirm a regex is vulnerable, you have to actually run it against a malicious input and watch it hang. If you do that in an online tester that processes patterns server-side, you’re either (a) DoS-ing their box, which is rude, or (b) hitting a timeout that hides the problem from you.

RegexLab runs the match with the native RegExp engine right in your tab. Nothing is uploaded. Your patterns — which for a lot of us encode business logic, internal formats, sometimes secrets baked into validation rules — never leave the machine. When a match hangs, it hangs your tab, which is exactly the feedback you want. You can feel the 3-second pause and know you found something. I wrote more about why I stopped trusting server-side dev tools with sensitive input in this piece on pasting data into online tools.

My workflow in RegexLab is simple:

  1. Paste the pattern.
  2. Add a normal test case — confirm it matches what it should.
  3. Add an evil case: a long run of the character class the pattern repeats, ending with something that forces a failed match.
  4. If the result is instant, you’re probably fine. If the tab stalls, you have a ReDoS.
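Outside the browser, the same good-case/evil-case check can be scripted. The sketch below is a Node-only approximation of that workflow, not how RegexLab works: testWithBudget and probe are hypothetical helpers, and the approach assumes that the timeout option of Node's vm module interrupts a long-running match, which is my understanding of how V8 handles it. The 200 ms budget is an arbitrary choice.

```javascript
const vm = require('node:vm');

// Hypothetical helper: run regex.test(input), but give up after budgetMs.
// Assumes vm's `timeout` option can interrupt a match that is backtracking.
function testWithBudget(regex, input, budgetMs = 200) {
  try {
    const matched = vm.runInNewContext(
      'regex.test(input)',
      { regex, input },
      { timeout: budgetMs }
    );
    return { timedOut: false, matched };
  } catch (err) {
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { timedOut: true, matched: null };
    }
    throw err; // anything else is a real error
  }
}

// Steps 2 to 4 of the workflow: one normal case, one evil case, same pattern.
function probe(regex, goodInput, evilInput, budgetMs = 200) {
  return {
    good: testWithBudget(regex, goodInput, budgetMs),
    evil: testWithBudget(regex, evilInput, budgetMs),
  };
}

// A run of the repeated character class, ending in a forced mismatch.
const evilInput = 'a'.repeat(28) + '!';

console.log(probe(/^([a-zA-Z0-9]+)*@/, 'alice@example.com', evilInput));
console.log(probe(/^[a-zA-Z0-9]+@/, 'alice@example.com', evilInput));
```

A timedOut result on the evil case is the scripted equivalent of the tab stalling. A fast result only tells you that this particular attack string did not trigger anything.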

The multi-case runner is handy here because you can keep the “good” input and the “attack” input side by side and re-run both after every edit to the pattern. I keep a small library of attack strings — 30 identical chars plus a mismatch — for exactly this. For a broader take on using the tool for security work, I wrote up regex patterns that catch real security bugs.

The structural fixes worth knowing

Rewriting to remove nested quantifiers covers most cases, but two other tools help:

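Atomic groups and possessive quantifiers. These tell the engine “match this and never give it back,” which kills the backtracking. PCRE, Java, and Python 3.11+ support them as (?>...) and a++. JavaScript does not: in V8 (Node and Chrome), both forms throw a SyntaxError when the pattern is compiled. The usual JavaScript workaround is a lookahead plus a backreference, because the engine never backtracks into a lookahead once it has matched: (?=(a+))\1 behaves like (?>a+), so /^(?:(?=(a+))\1)+$/ won’t blow up. Check your runtime before relying on native atomic syntax.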
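Since support for atomic syntax depends on the engine, one cautious option is to feature-detect it and fall back to an emulation. The sketch below assumes nothing about your runtime: supportsAtomicGroups is a hypothetical helper that tries to compile (?>a), and the fallback uses a lookahead with a capture, then consumes the capture with \1. That works because JavaScript never backtracks into a lookahead after it has matched.

```javascript
// Returns true only if this engine can compile atomic-group syntax.
function supportsAtomicGroups() {
  try {
    new RegExp('(?>a)');
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Emulated atomic group: the lookahead captures the longest run of a's and
// is never re-entered; \1 then consumes exactly that run.
const emulated = /^(?:(?=(a+))\1)+$/;

// Built from a string so that engines without atomic groups never parse it.
const runOfAs = supportsAtomicGroups() ? new RegExp('^(?>a+)+$') : emulated;

console.log('native atomic groups:', supportsAtomicGroups());
console.log(runOfAs.test('aaaa'));
console.log(runOfAs.test('a'.repeat(5000) + 'b'));
```

The emulation costs a capture group, which shifts the numbering of any later groups in a larger pattern, so named groups are worth considering if you adopt it.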

Switch engines for untrusted input. Google’s RE2, Go’s regexp package (which follows the RE2 design), and Rust’s regex crate use a finite-automaton approach with no backtracking at all, so match time stays linear in the input length and this kind of blowup can’t happen by construction. The tradeoff is they drop backreferences and lookaround. If you’re validating user input at scale, that tradeoff is usually worth it. Google built RE2 for exactly this reason: to run regexes safely in services that can’t afford the exponential worst case.

If you want to go deep on how these engines actually differ, Jeffrey Friedl’s Mastering Regular Expressions is still the reference — it’s the book that made the backtracking-vs-automaton distinction click for me (affiliate link, full disclosure). Russ Cox’s free regexp article series covers the RE2 side if you prefer the theory online.

Test the regex you shipped last month

Pull up your codebase and grep for )+, )*, and any validation regex on an input field. Drop each one into RegexLab, hand it 30 repeated characters ending in a mismatch, and watch the clock. It takes about ten seconds per pattern, it runs entirely in your browser, and it might save you from a 2 AM page when someone finds the same field an attacker would.
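The grep step can be sketched in a few lines too. findSuspectLines is a hypothetical helper that takes source text and returns the lines containing )+ or )*. It is a plain substring search, so it will also flag arithmetic like (w + h)*2; treat the output as a list of candidates to paste into the tester, not as findings.

```javascript
// Return 1-based line numbers and trimmed text for every line that contains
// ")+" or ")*". Plain substring search: no parsing, so false positives are
// expected (for example, arithmetic expressions).
function findSuspectLines(sourceText) {
  const hits = [];
  sourceText.split('\n').forEach((text, index) => {
    if (text.includes(')+') || text.includes(')*')) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

// Made-up sample source, for illustration only.
const sample = [
  'const user = /^([a-zA-Z0-9]+)*@/;',
  'const zip = /^\\d{5}$/;',
  'const area = (w + h)*2; // arithmetic, not a regex',
].join('\n');

for (const hit of findSuspectLines(sample)) {
  console.log(`${hit.line}: ${hit.text}`);
}
```

To run it over real files you would read each one with fs.readFileSync(path, 'utf8') and pass the contents in.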

The Cloudflare outage cost real money and made global news. The bug was one unbounded quantifier next to another. That’s a ten-second check you can run right now.

