Reverse-Engineering SEC EDGAR’s Full-Text Search API (efts.sec.gov)

SEC EDGAR’s official full-text search page is great if you’re a human clicking around. It’s useless if you want to pull 200 filings that mention “going concern” into a script. So I opened the network tab, watched what the search page actually calls on efts.sec.gov, and rebuilt the request myself.

The page is a thin React front end. Every search fires a GET to https://efts.sec.gov/LATEST/search-index and gets back raw Elasticsearch JSON. No API key, no signup, no OAuth dance. Here’s the exact request that powers it, and the gotchas that cost me an afternoon.

The endpoint and its real parameters

The base URL is https://efts.sec.gov/LATEST/search-index. The path casing matters — /LATEST/ is uppercase and a lowercase /latest/ 404s. These are the query parameters that actually do something:

  • q — the search term. Wrap a phrase in URL-encoded double quotes (%22climate+risk%22) for an exact match, or it tokenizes into an OR search.
  • forms — comma-separated filing types: 10-K, 8-K, SC 13D, etc. Leave it off to search everything.
  • startdt and enddt — date bounds in YYYY-MM-DD. Both required if you want a window.
  • from — pagination offset. The page size is fixed at 10, so from=10 is page two, from=20 is page three.
  • ciks — restrict to a specific company by its zero-padded CIK number.

A complete request looks like this:

curl -s \
  -A "your-app [email protected]" \
  "https://efts.sec.gov/LATEST/search-index?q=%22machine+learning%22&forms=8-K&startdt=2026-01-01&enddt=2026-06-01"
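If you would rather not hand-encode the quotes, the standard library can do it. This is a minimal sketch, assuming a hypothetical helper named build_search_url (my own name, not part of any SEC tooling) and only the parameters listed above:

```python
from urllib.parse import urlencode

EFTS = "https://efts.sec.gov/LATEST/search-index"

def build_search_url(phrase, forms=None, startdt=None, enddt=None, offset=0):
    """Build a search URL. build_search_url is a hypothetical helper name."""
    # Wrap the term in double quotes so it is sent as an exact phrase.
    params = {"q": f'"{phrase}"'}
    if forms:
        params["forms"] = forms
    if startdt:
        params["startdt"] = startdt
    if enddt:
        params["enddt"] = enddt
    if offset:
        params["from"] = offset
    # urlencode percent-encodes the quotes (%22) and turns spaces into '+'.
    return f"{EFTS}?{urlencode(params)}"

url = build_search_url("machine learning", forms="8-K",
                       startdt="2026-01-01", enddt="2026-06-01")
print(url)
```

The printed URL should be identical to the one in the curl command, which is a quick way to confirm the encoding is right before sending anything.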

The User-Agent header is not optional. SEC’s fair-access policy rejects requests with a generic or empty agent — you’ll get a 403. Put your app name and a contact email in there. I learned this the hard way after my first ten curls returned nothing but an HTML block page.

What comes back

The response is the Elasticsearch result envelope, untouched. The shape you care about:

{
  "took": 305,
  "hits": {
    "total": { "value": 662, "relation": "eq" },
    "hits": [
      {
        "_id": "0001193125-26-032000:ionq-ex99_2.htm",
        "_source": {
          "ciks": ["0001824920"],
          "display_names": ["IonQ, Inc.  (IONQ)  (CIK 0001824920)"],
          "root_forms": ["8-K"],
          "form": "8-K",
          "file_date": "2026-01-30",
          "adsh": "0001193125-26-032000",
          "file_type": "EX-99.2",
          "sics": ["7373"],
          "biz_states": ["MD"]
        }
      }
    ]
  }
}

Two fields unlock everything else. The _id is {accession}:{filename} — split on the colon and you can build a direct link to the document. The adsh is the accession number with dashes, which is what you feed into the rest of EDGAR’s data endpoints.

To turn a hit into a clickable filing URL, strip the dashes from the accession number for the folder path:

def filing_url(hit):
    adsh, fname = hit["_id"].split(":", 1)
    cik = int(hit["_source"]["ciks"][0])  # drops leading zeros
    folder = adsh.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{fname}"
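When I want rows instead of raw hits, I flatten each hit into a dict using the same link logic. A rough sketch, assuming the field names shown in the sample response; hit_to_row is a hypothetical helper, and the sample hit is just the response above trimmed down:

```python
def hit_to_row(hit):
    """Flatten one search hit into a dict. hit_to_row is a hypothetical helper."""
    # _id is "{accession}:{filename}"; partition splits on the first colon only.
    accession, _, filename = hit["_id"].partition(":")
    src = hit["_source"]
    cik = int(src["ciks"][0])            # int() drops the leading zeros
    folder = accession.replace("-", "")  # folder path uses the undashed accession
    names = src.get("display_names") or [None]
    return {
        "accession": accession,
        "filename": filename,
        "form": src.get("form"),
        "file_type": src.get("file_type"),
        "file_date": src.get("file_date"),
        "company": names[0],
        "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{filename}",
    }

sample_hit = {
    "_id": "0001193125-26-032000:ionq-ex99_2.htm",
    "_source": {
        "ciks": ["0001824920"],
        "display_names": ["IonQ, Inc.  (IONQ)  (CIK 0001824920)"],
        "form": "8-K",
        "file_date": "2026-01-30",
        "adsh": "0001193125-26-032000",
        "file_type": "EX-99.2",
    },
}

row = hit_to_row(sample_hit)
print(row["url"])
# https://www.sec.gov/Archives/edgar/data/1824920/000119312526032000/ionq-ex99_2.htm
```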

A real scraper that paginates

The 10-result page size is the one thing that trips people up. There’s no size parameter that the backend honors past 10 — I tried, it ignores it. You walk the result set with from instead. Here’s a small client that pulls every hit for a query and respects SEC’s rate limits:

import time
import requests

EFTS = "https://efts.sec.gov/LATEST/search-index"
HEADERS = {"User-Agent": "orthogonal-research [email protected]"}

def search_all(q, forms=None, startdt=None, enddt=None, max_results=100):
    results = []
    offset = 0
    while offset < max_results:
        params = {"q": q, "from": offset}
        if forms:   params["forms"] = forms
        if startdt: params["startdt"] = startdt
        if enddt:   params["enddt"] = enddt

        r = requests.get(EFTS, params=params, headers=HEADERS, timeout=15)
        r.raise_for_status()
        hits = r.json()["hits"]["hits"]
        if not hits:
            break
        results.extend(hits)
        offset += 10
        time.sleep(0.15)  # stay under ~10 req/sec
    return results[:max_results]  # trim the overshoot when max_results isn't a multiple of 10

filings = search_all('"going concern"', forms="10-K",
                     startdt="2026-01-01", enddt="2026-06-01")
for f in filings:
    src = f["_source"]
    print(src["file_date"], src["form"], src["display_names"][0])

The time.sleep(0.15) keeps you under SEC’s documented limit of 10 requests per second. Go faster and you’ll get temporary IP blocks that last about ten minutes. There’s no X-RateLimit header to watch — the only signal is a sudden 403, so it’s better to throttle up front than to detect and back off.
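A fixed sleep adds 0.15 seconds on top of whatever the request itself took. If you want to sleep only for the remainder of the interval, a small throttle object does it. This is a sketch under my own assumptions: the Throttle class is hypothetical, and the injectable clock and sleep arguments exist only so it can be tested without waiting.

```python
import time

class Throttle:
    """Enforce a minimum gap between calls (hypothetical helper, not an SEC API)."""

    def __init__(self, min_interval=0.15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # timestamp of the previous call, None before the first

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

throttle = Throttle()
for _ in range(3):
    throttle.wait()
    # the requests.get(...) call would go here
```

Call throttle.wait() immediately before each request. The first call never sleeps, and a slow response eats into the interval instead of adding to it.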

The gotchas that cost me time

Phrase vs token search. A bare q=climate+risk matches documents containing “climate” OR “risk” anywhere. That returned 40x more noise than I expected. The quoted form q=%22climate+risk%22 is the exact phrase, and it’s what you almost always want.

The 10,000 result ceiling. Elasticsearch caps deep pagination: by default, from plus the page size cannot exceed 10,000. Once from pushes past that limit the endpoint errors out. If a query has more hits than that, narrow it with a tighter date range and stitch the windows together. There’s no scroll cursor exposed.
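Splitting the date range is mechanical enough to script. Here is a minimal sketch, assuming startdt and enddt are both inclusive (I have not seen that documented anywhere, so treat it as an assumption) and using a hypothetical date_windows helper with an arbitrary 30-day window:

```python
from datetime import date, timedelta

def date_windows(startdt, enddt, days=30):
    """Yield (start, end) ISO date strings covering [startdt, enddt] without overlap.

    date_windows is a hypothetical helper; both bounds are assumed inclusive.
    """
    start = date.fromisoformat(startdt)
    end = date.fromisoformat(enddt)
    while start <= end:
        # Each window spans `days` calendar days, clipped to the overall end.
        window_end = min(start + timedelta(days=days - 1), end)
        yield start.isoformat(), window_end.isoformat()
        start = window_end + timedelta(days=1)

windows = list(date_windows("2026-01-01", "2026-03-15", days=30))
for w in windows:
    print(w)
# ('2026-01-01', '2026-01-30')
# ('2026-01-31', '2026-03-01')
# ('2026-03-02', '2026-03-15')
```

Run the paginated search once per window and concatenate the results. If a single window still reports more than 10,000 hits, shrink days and try again.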

Full-text only covers 2001 onward. The full-text index starts in 2001. Older filings exist in EDGAR but won’t show up here. For anything pre-2001 you’re back to the structured submissions API.

It indexes exhibits, not just the main doc. A single 8-K can return several hits — one per attached exhibit. Dedupe on the accession number (adsh) if you only want one row per filing.
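Deduping is a few lines. A sketch, assuming every hit carries _source.adsh as in the sample response; dedupe_by_filing is a hypothetical helper and the toy hits below are made up purely to show the behavior:

```python
def dedupe_by_filing(hits):
    """Keep the first hit per accession number (adsh), preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        adsh = hit["_source"]["adsh"]
        if adsh not in seen:
            seen.add(adsh)
            unique.append(hit)
    return unique

# Toy hits for illustration only: two documents from one filing, one from another.
toy_hits = [
    {"_id": "A-1:main.htm", "_source": {"adsh": "A-1", "file_type": "8-K"}},
    {"_id": "A-1:ex99.htm", "_source": {"adsh": "A-1", "file_type": "EX-99.1"}},
    {"_id": "B-2:main.htm", "_source": {"adsh": "B-2", "file_type": "8-K"}},
]

unique_hits = dedupe_by_filing(toy_hits)
print([h["_id"] for h in unique_hits])
# ['A-1:main.htm', 'B-2:main.htm']
```

Note that "first hit" means first in result order, which may be an exhibit rather than the main document. If you need the main document specifically, filter on file_type before deduping.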

Where this fits

I use this as the front door for a few projects: a script that flags new 8-K filings mentioning specific risk language, and an insider-buying alerter that cross-references full-text hits against Form 4 data. The full-text endpoint finds the filings; the structured EDGAR APIs pull the details. Pair it with the congressional trade tracker approach and you’ve got a decent picture of who’s filing what.

If you want to go deeper on parsing the filings you find, two books earned their shelf space for me. Python for Data Analysis by Wes McKinney is the reference I keep open when I’m reshaping messy filing data with pandas. And for the finance side of reading what’s actually in these documents, Financial Statement Analysis and Security Valuation is dense but it’s the one I reach for. Full disclosure: those are affiliate links — they don’t change the price, and I only link books I actually own.

The whole thing is one undocumented GET request returning clean JSON. No key, no cost. The SEC quietly shipped one of the better free financial data APIs and never put a docs page on it.

A quick plug: I run Alpha Signal, a free Telegram channel where I post market structure and data-driven trade ideas built on exactly this kind of public-filing intelligence. Worth a look if SEC data is your thing.
