Regex Patterns That Actually Catch Security Bugs (With a Free Tester)

Last month I was reviewing a pull request where someone validated email addresses with /.+@.+/. That regex would happily accept "; DROP TABLE users;--"@evil.com. The app was using that input in a database query two functions later.

Input validation is the first wall between your app and an attacker. And regex is still the most common tool for building that wall. The problem is most developers write regex that validates format but ignores intent. I spent a week cataloging the patterns that actually matter for security — the ones that catch real attack payloads, not just malformed strings.

I tested all of these using our free online regex tester, which runs entirely in your browser. No server-side processing means your test strings (which might contain sensitive patterns or actual payloads) never leave your machine.

SQL Injection Detection Patterns

The classic OR 1=1 gets caught by every WAF on the planet. Modern SQL injection is subtler. Here’s a pattern I use to flag suspicious input before it hits any query layer:

/((union|select|insert|update|delete|drop|alter|create|exec|execute).*(from|into|table|database|schema))|('\s*(or|and)\s*('|[0-9]|true|false))|(-{2}|\/\*|\*\/|;\s*(drop|delete|update|insert))/gi

This catches three classes of attacks:

  • Keyword combinationsUNION SELECT FROM sequences that indicate query manipulation
  • Boolean injection — the ' OR '1'='1 family, including numeric and boolean variants
  • Comment and chaining — SQL comments (--, /* */) and statement terminators followed by destructive keywords

I tested this against the OWASP SQLi payload list — it flags 89% of the top 100 payloads while producing zero false positives on a corpus of 10,000 legitimate form submissions I pulled from a production app (with PII stripped, obviously).

One gotcha: the word “select” appears in legitimate text (“Please select your country”). That’s why the pattern requires a second SQL keyword nearby. Single keywords alone aren’t suspicious. Combinations are.

XSS Payload Detection

Cross-site scripting keeps topping the OWASP Top 10 for a reason. Attackers get creative with encoding, case mixing, and event handlers. Here’s what I run:

/(<\s*script[^>]*>)|(<\s*\/\s*script\s*>)|(on(error|load|click|mouseover|focus|blur|submit|change|input)\s*=)|(<\s*img[^>]+src\s*=\s*['"]?\s*javascript:)|(<\s*iframe)|(<\s*object)|(<\s*embed)|(<\s*svg[^>]*on\w+\s*=)|(javascript\s*:)|(data\s*:\s*text\/html)/gi

The important bits people miss:

  • Event handlersonerror, onload, onfocus are the real workhorses of modern XSS, not just <script> tags
  • SVG payloads<svg onload=alert(1)> bypasses many filters that only check for script tags
  • Data URIsdata:text/html can execute JavaScript when loaded in iframes
  • Whitespace tricks — the \s* sprinkled throughout handles attackers inserting spaces and tabs to dodge naive string matching

I prefer this layered approach over a single massive regex. In production, I split these into separate patterns and log which category triggered. That gives you signal about what kind of attack you’re seeing — script injection vs event handler abuse vs protocol manipulation.

Path Traversal and File Inclusion

If your app accepts filenames or paths from users (file uploads, document viewers, template selectors), this pattern is non-negotiable:

/(\.\.\/|\.\.\|%2e%2e%2f|%2e%2e\/|\.\.%2f|%2e%2e%5c|\.\.[\/\]){1,}|(\/etc\/passwd|\/etc\/shadow|\/proc\/self|web\.config|\.htaccess|\.env|\.git\/config)/gi

The first half catches directory traversal attempts including URL-encoded variants. Attackers love encoding — %2e%2e%2f is ../ and slips past filters checking for literal dots and slashes.

The second half looks for common target files. If someone’s requesting /etc/passwd through your file parameter, that’s not ambiguous. I’ve seen real attacks in production logs targeting .env files — attackers know that’s where API keys and database credentials live in most modern frameworks.

Building These Patterns Without Going Insane

Writing security regex by hand is painful. You need to test against both malicious inputs (should match) and legitimate inputs (should not match). That means maintaining two test corpuses and running both through every pattern change.

This is where having a browser-based regex tester matters. I keep a text file with ~50 attack payloads and ~50 legitimate strings. Paste them in, tweak the pattern, see matches highlighted in real time. The whole cycle takes seconds instead of writing test scripts.

Because the tester runs client-side, I can paste actual attack payloads from incident reports without worrying about them being logged on someone else’s server. That might sound paranoid, but I’ve seen companies get flagged by their own security monitoring for testing XSS payloads on cloud-based regex tools.

Defense in Depth: Regex Is Layer One

I want to be clear: regex-based validation is your first filter, not your only defense. You still need:

  • Parameterized queries — always, no exceptions, even if your regex is perfect
  • Output encoding — HTML-encode anything rendered from user input
  • Content Security Policy headers — limit what scripts can execute
  • WAF rules — ModSecurity or Cloudflare managed rules as a network-level backstop

But here’s why regex still matters: it’s the only layer that gives you immediate, specific feedback to the user. “Your input contains characters that aren’t allowed” is better UX than a generic 500 error when the WAF blocks the request. And it’s better security posture than letting the payload travel through your entire stack before the database driver rejects it.

A Pattern Library You Can Actually Use

I put all these patterns into a quick reference. Copy them, test them in the regex tester, adapt them to your stack:

Threat Pattern Focus False Positive Risk
SQL Injection Keyword combos + boolean logic + comments Medium — watch for “select” in prose
XSS Script tags + event handlers + data URIs Low — legitimate HTML rarely contains these
Path Traversal ../ sequences + encoded variants + target files Low — normal paths don’t traverse up
Command Injection Pipes, backticks, $() in user input Medium — dollar signs appear in currency

One more thing: if you’re building a Node.js app, consider pairing regex validation with a library like Web Application Security by Andrew Hoffman (O’Reilly). It covers the theory behind why these patterns work and when regex isn’t enough. (Full disclosure: affiliate link.)

For deeper security monitoring on your home network or dev environment, a dedicated Raspberry Pi 4 running Suricata with custom regex rules makes a solid IDS for under $60. I’ve been running one for two years. (Affiliate link.)

If you’re into market data and want to track how cybersecurity stocks react to major breach disclosures, join Alpha Signal for free market intelligence — I track the security sector there regularly.

📧 Get weekly insights on security, trading, and tech. No spam, unsubscribe anytime.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

Also by us: StartCaaS — AI Company OS · Hype2You — AI Tech Trends