I wanted a quick answer to a boring question: what was NVIDIA’s gross margin last fiscal year, straight from the filing, no scraped-together blog number I have to trust? Most people open a stock site and read whatever it shows. I wanted the figure that came out of the actual 10-K NVIDIA submitted to the SEC, because that is the number nobody can fudge.
It turns out the SEC EDGAR XBRL API gives you exactly that, as clean JSON, for free, with no API key, and almost nobody outside of fintech talks about it. Every US public company’s financial statements, tagged concept by concept, are one HTTP GET away. I built a three-ticker screener on top of it, hit one genuinely nasty gotcha that gave me a 570% gross margin, and figured out the fix. Here is the whole thing.
The endpoint nobody mentions
When developers think “SEC data” they think of downloading 10-K HTML and regex-ing their way through tables. You do not have to. Since 2009 the SEC has required companies to tag their financials in XBRL, and data.sec.gov exposes that tagged data as JSON.
There are three XBRL endpoints on data.sec.gov: companyconcept, companyfacts, and frames. Before you can call the per-company ones, you need a lookup file that maps a ticker to a CIK (the SEC’s internal company ID):
curl -s "https://www.sec.gov/files/company_tickers.json" \
-H "User-Agent: Sample Company [email protected]"
That returns a dictionary keyed by row number. Apple looks like this:
{"2":{"cik_str":320193,"ticker":"AAPL","title":"Apple Inc."}}
Note the CIK is an integer there, but the per-company endpoints want it zero-padded to 10 digits in the URL: 0000320193. That mismatch is the first thing that trips people up.
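A minimal sketch of that padding step, in case it helps. The helper names pad_cik and concept_url are mine, not part of any SEC library; the URL pattern is the companyconcept one used throughout this post.

```python
def pad_cik(cik) -> str:
    """Return a CIK as the 10-digit, zero-padded string used in data.sec.gov URLs."""
    return str(int(cik)).zfill(10)


def concept_url(cik, concept: str, taxonomy: str = "us-gaap") -> str:
    """Build a companyconcept URL for one company and one XBRL tag."""
    return (
        "https://data.sec.gov/api/xbrl/companyconcept/"
        f"CIK{pad_cik(cik)}/{taxonomy}/{concept}.json"
    )


print(pad_cik(320193))
print(concept_url(320193, "NetIncomeLoss"))
```

Passing the value through int() first means the helper accepts either the integer from company_tickers.json or a string that is already padded.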
Pulling one number: companyconcept
If you know the exact metric you want, companyconcept gives you the full history of a single XBRL tag for one company. Here is Apple’s net income:
curl -s \
"https://data.sec.gov/api/xbrl/companyconcept/CIK0000320193/us-gaap/NetIncomeLoss.json" \
-H "User-Agent: Sample Company [email protected]"
You get back every reported value for that concept, each with the period, the form it came from (10-K, 10-Q), and the filing date. Filter to annual 10-K figures and Apple’s net income history falls straight out:
FY 2021-09-25: $94.68B
FY 2022-09-24: $99.80B
FY 2023-09-30: $97.00B
FY 2024-09-28: $93.74B
FY 2025-09-27: $112.01B
That last figure, $112.01B for fiscal 2025, was filed 2025-10-31. It is the real audited number, not a consensus estimate. I checked it against the 10-K and it matches to the dollar.
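For reference, the "filter to annual 10-K figures" step can be sketched roughly like this. It assumes each fact in the response carries start, end, val, and form keys, which is what I see in companyconcept responses; the function name and the 350 to 380 day window are my own choices, and the sample payload uses invented numbers, not Apple's.

```python
from datetime import date


def annual_10k_values(concept: dict, unit: str = "USD") -> dict:
    """Map fiscal-year end date -> value for full-year facts reported on a 10-K."""
    out = {}
    for fact in concept.get("units", {}).get(unit, []):
        if fact.get("form") != "10-K" or "start" not in fact:
            continue
        days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
        # Roughly one year; 52- and 53-week fiscal years differ by a few days.
        if 350 <= days <= 380:
            out[fact["end"]] = fact["val"]
    return out


# Invented sample in the companyconcept shape (values are NOT real filings data).
sample = {"units": {"USD": [
    {"start": "2023-10-01", "end": "2024-09-28", "val": 100_000_000_000, "form": "10-K"},
    {"start": "2024-06-30", "end": "2024-09-28", "val": 25_000_000_000, "form": "10-K"},
    {"start": "2024-03-31", "end": "2024-06-29", "val": 20_000_000_000, "form": "10-Q"},
]}}

for end, val in sorted(annual_10k_values(sample).items()):
    print(f"FY {end}: ${val / 1e9:.2f}B")
```

The duration check matters because a 10-K can also carry quarter-length facts that share the same period end as the full year.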
The User-Agent rule that will 403 you
The SEC blocks any request without a descriptive User-Agent. This is not optional and it is not the usual Cloudflare bot check. Leave it off and you get a hard 403 every time:
$ curl -s "https://data.sec.gov/api/xbrl/companyconcept/CIK0000320193/us-gaap/NetIncomeLoss.json" -o /dev/null -w "%{http_code}\n"
403
$ curl -s "...same url..." -H "User-Agent: Sample Company [email protected]" -o /dev/null -w "%{http_code}\n"
200
The SEC’s fair-access policy asks you to send your app name and a contact email, and to stay under 10 requests per second. I have never been rate-limited staying well below that. Send a real contact string; they do occasionally email people who hammer the endpoint.
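Here is one way to bake both rules into a single helper: a sketch, not an official client. The USER_AGENT string is a placeholder you must replace, and Throttle, sec_get, and the 0.15 second spacing are names and values I picked to stay comfortably under 10 requests per second.

```python
import json
import time
import urllib.request

# Placeholder contact string (an assumption): use your own app name and email.
USER_AGENT = "Example App admin@example.com"
MIN_INTERVAL = 0.15  # seconds between calls, i.e. fewer than 7 requests per second


class Throttle:
    """Block just long enough to keep calls at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


_throttle = Throttle(MIN_INTERVAL)


def sec_get(url: str) -> dict:
    """GET a JSON document from sec.gov, throttled and with a User-Agent header."""
    _throttle.wait()
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

The clock and sleep parameters exist only so the throttle can be exercised without real waiting; in normal use the defaults are fine.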
A gross-margin screener in ~20 lines
The companyfacts endpoint returns every tagged concept for a company in one blob. Apple’s is 3.7 MB and holds 503 distinct us-gaap concepts. For a screener I prefer companyconcept so I only pull what I need. Here is my first attempt at a revenue-and-gross-margin screen across three names:
import json, urllib.request, time
UA = "Sample Company [email protected]"
def get(url):
    req = urllib.request.Request(url, headers={"User-Agent": UA})
    return json.load(urllib.request.urlopen(req))
tk = get("https://www.sec.gov/files/company_tickers.json")
cik = {v["ticker"]: str(v["cik_str"]).zfill(10) for v in tk.values()}
def latest_annual(concept):
    usd = concept["units"]["USD"]
    tens = [x for x in usd if x.get("form") == "10-K"]
    return max(tens, key=lambda r: r["end"])
for t in ["AAPL", "MSFT", "NVDA"]:
    c = cik[t]
    rev = get(f"https://data.sec.gov/api/xbrl/companyconcept/CIK{c}/us-gaap/RevenueFromContractWithCustomerExcludingAssessedTax.json")
    gp = get(f"https://data.sec.gov/api/xbrl/companyconcept/CIK{c}/us-gaap/GrossProfit.json")
    r, g = latest_annual(rev), latest_annual(gp)
    print(f"{t}: revenue ${r['val']/1e9:.1f}B gross margin {g['val']/r['val']*100:.1f}%")
    time.sleep(0.2)
Run it and two of the three lines are perfect. The third is nonsense:
AAPL: revenue $416.2B gross margin 46.9%
MSFT: revenue $281.7B gross margin 68.8%
NVDA: revenue $26.9B gross margin 570.2%
A 570% gross margin is impossible. And NVIDIA did not do $26.9B in revenue last year; it did over $200B. So what broke?
The gotcha: companies change their revenue tag
This is the part that will bite anyone building on XBRL, and it is why you cannot hardcode one concept name and walk away. The us-gaap taxonomy has several tags that all mean “revenue,” and companies switch between them.
NVIDIA used to tag revenue as RevenueFromContractWithCustomerExcludingAssessedTax. Then it switched to the plain Revenues tag. So the old concept is frozen in time at its last reported value, fiscal 2022’s $26.9B, while current gross profit keeps climbing. Divide today’s $153B gross profit by 2022’s stale $26.9B revenue and you get that absurd 570%.
You can see the split clearly if you ask NVIDIA’s companyfacts which revenue concepts it carries and what the latest annual value is for each:
Revenues: 2026-01-25 $215.9B
RevenueFromContractWithCustomerExcludingAssessedTax: 2022-01-30 $26.9B <- frozen
GrossProfit: 2026-01-25 $153.5B
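A rough sketch of how a listing like that can be produced, and how a frozen tag can be flagged automatically. It assumes the companyfacts layout of facts, then us-gaap, then concept name, then units; the function names and the 400-day staleness threshold are my own, and the toy payload uses invented values rather than NVIDIA's.

```python
from datetime import date


def latest_annual_by_concept(companyfacts: dict, concepts, unit: str = "USD") -> dict:
    """For each concept present, return (period end, value) of its newest 10-K fact."""
    gaap = companyfacts.get("facts", {}).get("us-gaap", {})
    latest = {}
    for name in concepts:
        rows = [r for r in gaap.get(name, {}).get("units", {}).get(unit, [])
                if r.get("form") == "10-K"]
        if rows:
            newest = max(rows, key=lambda r: r["end"])
            latest[name] = (newest["end"], newest["val"])
    return latest


def stale_concepts(latest: dict, max_lag_days: int = 400) -> list:
    """Names whose newest period end trails the freshest concept by more than max_lag_days."""
    if not latest:
        return []
    newest_end = max(date.fromisoformat(end) for end, _ in latest.values())
    return sorted(
        name for name, (end, _) in latest.items()
        if (newest_end - date.fromisoformat(end)).days > max_lag_days
    )


# Toy payload in the companyfacts shape; the values are invented.
toy = {"facts": {"us-gaap": {
    "Revenues": {"units": {"USD": [{"end": "2026-01-25", "val": 200, "form": "10-K"}]}},
    "RevenueFromContractWithCustomerExcludingAssessedTax": {
        "units": {"USD": [{"end": "2022-01-30", "val": 30, "form": "10-K"}]}},
    "GrossProfit": {"units": {"USD": [{"end": "2026-01-25", "val": 140, "form": "10-K"}]}},
}}}

names = ["Revenues", "RevenueFromContractWithCustomerExcludingAssessedTax",
         "SalesRevenueNet", "GrossProfit"]
latest = latest_annual_by_concept(toy, names)
for name, (end, val) in latest.items():
    print(f"{name}: {end} {val}")
print("stale:", stale_concepts(latest))
```

A check like this will not tell you which tag is correct, only that one of them has stopped being updated, which is the cue to look closer.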
The fix is to try a priority list of revenue concepts and, critically, match revenue and gross profit on the same period end rather than just taking the latest of each:
from datetime import date
REVENUE_CONCEPTS = [
    "RevenueFromContractWithCustomerExcludingAssessedTax",
    "Revenues",
    "SalesRevenueNet",
]
def annual_points(cik, concept):
    try:
        d = get(f"https://data.sec.gov/api/xbrl/companyconcept/CIK{cik}/us-gaap/{concept}.json")
    except Exception:
        return {}  # this company does not report the concept
    out = {}
    for x in d["units"].get("USD", []):
        if x.get("form") != "10-K" or "start" not in x:
            continue
        # keep only full-year periods (~365 days), keyed by period end;
        # comparing calendar years alone would drop January-to-December fiscal years
        days = (date.fromisoformat(x["end"]) - date.fromisoformat(x["start"])).days
        if 350 <= days <= 380:
            out[x["end"]] = x["val"]
    return out
def latest_revenue(cik):
    merged = {}
    for concept in REVENUE_CONCEPTS:
        for end, val in annual_points(cik, concept).items():
            merged.setdefault(end, val)  # first concept that reports a period wins
    end = max(merged)
    return end, merged[end]
Now pull gross profit for that exact same period end and the margin math is honest:
AAPL: FY end 2025-09-27 revenue $416.2B gross margin 46.9%
MSFT: FY end 2025-06-30 revenue $281.7B gross margin 68.8%
NVDA: FY end 2026-01-25 revenue $215.9B gross margin 71.1%
71.1% for NVIDIA. That is the real number, and it lines up with what the company reports in its own filings. Notice the fiscal years do not align, either: Apple ends in September, Microsoft in June, NVIDIA in late January. If you are comparing companies you have to respect that, which is exactly why keying on period end matters.
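The period-matching step itself is small. Below is a sketch that takes two dictionaries keyed by period end, like the ones annual_points returns, and only computes a margin for a period both series report. The function name is mine, and it picks the newest shared period rather than the newest revenue period, which is a slightly more defensive variation. The example inputs are the rounded NVIDIA figures quoted above.

```python
def gross_margin(revenue_by_end: dict, gross_profit_by_end: dict):
    """Return (period end, revenue, margin %) for the newest period both series report."""
    shared = set(revenue_by_end) & set(gross_profit_by_end)
    if not shared:
        return None
    end = max(shared)  # ISO dates sort chronologically as strings
    revenue = revenue_by_end[end]
    if not revenue:
        return None  # avoid dividing by zero or a missing value
    return end, revenue, gross_profit_by_end[end] / revenue * 100


# Rounded figures from the listing above.
revenue = {"2022-01-30": 26.9e9, "2026-01-25": 215.9e9}
gross_profit = {"2026-01-25": 153.5e9}

end, rev, margin = gross_margin(revenue, gross_profit)
print(f"FY end {end} revenue ${rev / 1e9:.1f}B gross margin {margin:.1f}%")
```

Returning None instead of raising makes it easy to skip a company in a loop when the two series never overlap.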
The frames endpoint: every company at once
The third endpoint is the one that changes what you can build. frames returns one concept for one period across every filer that reported it. Want net income for calendar year 2024 for all of corporate America?
curl -s \
"https://data.sec.gov/api/xbrl/frames/us-gaap/NetIncomeLoss/USD/CY2024.json" \
-H "User-Agent: Sample Company [email protected]"
That single call came back with 6,018 companies in one 918 KB response. No looping over tickers, no rate-limit dance. You get a flat array you can drop into pandas and sort, rank, or filter however you like. This is how you build a real screener: pull the frame once, join concepts on CIK, done.
One caveat on frames: the CY2024 style period aligns to calendar quarters, so companies with off-calendar fiscal years may land in an adjacent frame or get dropped from a given quarter. For point-in-time cross-sectional screens it is fine; for precise per-company history, go back to companyconcept.
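To make "join concepts on CIK" concrete, here is a plain-Python sketch that joins two frames responses. It assumes each response keeps its rows under a data key with cik, entityName, end, and val fields, which matches what I get back; the function names are mine and the two toy frames contain invented companies and values.

```python
def index_frame(frame: dict) -> dict:
    """Map CIK -> row for one frames response (rows live under the "data" key)."""
    return {row["cik"]: row for row in frame.get("data", [])}


def margin_screen(revenue_frame: dict, gross_profit_frame: dict, min_margin: float = 0.0):
    """Join two frames on CIK and return (name, cik, end, margin %) sorted by margin."""
    rev = index_frame(revenue_frame)
    gp = index_frame(gross_profit_frame)
    rows = []
    for cik in rev.keys() & gp.keys():
        r, g = rev[cik], gp[cik]
        # Skip non-positive revenue and rows whose period ends disagree.
        if r["val"] <= 0 or r["end"] != g["end"]:
            continue
        margin = g["val"] / r["val"] * 100
        if margin >= min_margin:
            rows.append((r["entityName"], cik, r["end"], margin))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy frames with invented companies and values.
revenue_frame = {"data": [
    {"cik": 1, "entityName": "Alpha Co", "end": "2024-12-31", "val": 200},
    {"cik": 2, "entityName": "Beta Inc", "end": "2024-12-31", "val": 100},
    {"cik": 3, "entityName": "Gamma LLC", "end": "2024-12-31", "val": 50},
]}
gross_profit_frame = {"data": [
    {"cik": 1, "entityName": "Alpha Co", "end": "2024-12-31", "val": 80},
    {"cik": 2, "entityName": "Beta Inc", "end": "2024-12-31", "val": 70},
    {"cik": 3, "entityName": "Gamma LLC", "end": "2024-09-30", "val": 30},
]}

for name, cik, end, margin in margin_screen(revenue_frame, gross_profit_frame, min_margin=30):
    print(f"{name} (CIK {cik}) {end}: {margin:.1f}%")
```

The same join is a one-line merge in pandas; the dictionary version just keeps the example dependency-free.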
Where XBRL data actually bites
After building a few things on this, here is what I would tell anyone starting out:
- Concept names drift. The NVIDIA revenue-tag switch is not rare. Always keep a priority list per metric and match on period.
- Restatements exist. The same period can appear more than once with different values across filings. The most recently filed one is usually what you want, so sort by the filed date when a period collides.
- Quarterly vs annual. A 10-Q “revenue” is three months; a 10-K is twelve. Filter on the period length (the difference between start and end) or you will add quarters to years.
- Not every company tags everything. Smaller filers skip concepts. Wrap lookups in a try/except and treat missing as null, do not crash the screen.
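The restatement point from the list above is the one most worth turning into code. This sketch collapses facts that share a period and keeps whichever was filed last. It assumes each fact has end and filed keys (and start for duration facts); the function name and the sample values are mine.

```python
def latest_filed(facts):
    """Collapse duplicate periods, keeping the most recently filed value for each."""
    best = {}
    for fact in facts:
        key = (fact.get("start"), fact["end"])
        # ISO dates compare correctly as strings, so no parsing is needed here.
        if key not in best or fact["filed"] > best[key]["filed"]:
            best[key] = fact
    return sorted(best.values(), key=lambda f: f["end"])


# Invented sample: FY2023 appears twice, the second time as restated in a later filing.
facts = [
    {"start": "2023-01-01", "end": "2023-12-31", "val": 100, "filed": "2024-02-15"},
    {"start": "2023-01-01", "end": "2023-12-31", "val": 97, "filed": "2025-02-14"},
    {"start": "2024-01-01", "end": "2024-12-31", "val": 120, "filed": "2025-02-14"},
]

for fact in latest_filed(facts):
    print(fact["end"], fact["val"], "filed", fact["filed"])
```

If you want point-in-time values for backtesting instead, flip the comparison and keep the earliest filing for each period.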
None of this is in a tidy tutorial anywhere, which is why I am writing it down. The data is genuinely good once you respect its quirks. If you want the deeper mechanics of how EDGAR’s search side works, I pulled apart its full-text search API in this earlier teardown, and if you are chasing congressional trades rather than fundamentals, here is how to pull House disclosures directly.
Books that make the numbers mean something
Pulling the data is the easy half. Knowing whether a 71% gross margin or a shrinking net-income trend actually matters is the hard half, and that is domain knowledge, not code. Three books I keep on the shelf for exactly this (full disclosure: these are Amazon affiliate links):
- Financial Statement Analysis: A Practitioner’s Guide: the one I reach for when a tag’s meaning is ambiguous. It maps line items to what they signal about the business.
- Financial Statements by Thomas Ittelson: the clearest plain-English walkthrough of how the three statements connect. Read this before you write a screener and the XBRL concepts stop looking like alphabet soup.
- Python for Finance by Yves Hilpisch: if you want to take these JSON pulls into real analysis with pandas, this is the reference.
Total cost of the data itself: zero. No Bloomberg terminal, no $2,000/month vendor feed, no scraping fragile HTML. Just the filings companies are legally required to submit, in a format built for machines.
If you found this useful, I write about market data pipelines and trading tooling regularly. Join https://t.me/alphasignal822 for free market intelligence.