Last October, I noticed three directors at a mid-cap biotech all bought shares within the same week. The stock was down 40% from its high. Two months later it doubled on a pipeline update. I didn’t trade it — I found out from a Twitter thread three days after the filings hit EDGAR.
That stung enough to make me build something. SEC Form 4 filings — the ones insiders must file within two business days of buying or selling company stock — are public data. They’re free. They hit EDGAR before any news outlet picks them up. And parsing them with Python takes about 80 lines of code.
Why Insider Buying Actually Matters
Insider selling is noise. Executives sell for a hundred reasons — taxes, divorce, a new house. But insider buying with personal money? There’s exactly one reason: they think the stock is going up.
Academic research backs this up. A study by Nejat Seyhun (University of Michigan) found that stocks with cluster insider buying (three or more insiders purchasing within a 30-day window) outperformed the market by 7-13% over the following 12 months. That’s not a small edge.
The challenge is timing. Form 4 filings hit EDGAR at unpredictable times. Some land at 4:01 PM, others close to EDGAR’s 10 PM Eastern cutoff. If you’re checking manually, you’re always late.
The EDGAR Full-Text Search API
The SEC’s EDGAR full-text search runs on a backend (EFTS) that you can query directly. No API key is required, just a User-Agent header that identifies you by name and email address. The rate limit is 10 requests per second, which is generous.
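To stay under that 10-requests-per-second ceiling, you can space calls at least 0.1 seconds apart. The sketch below is a small standard-library throttle. The class name `Throttle` and its interface are illustrative choices. The SEC does not provide them.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls (0.1 s gap ~= 10 requests/second)."""

    def __init__(self, max_per_second: float = 10.0):
        self.min_interval = 1.0 / max_per_second
        self._last = float("-inf")  # monotonic time of the previous call

    def wait(self) -> float:
        """Sleep just long enough to respect the limit; return seconds slept."""
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            delay = 0.0
        self._last = time.monotonic()
        return delay


# Usage: call throttle.wait() right before every request to sec.gov
throttle = Throttle(max_per_second=10)
```

A monotonic clock is used so that wall-clock adjustments (NTP, DST) can't shorten the gap.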
Here’s the endpoint that matters:
GET https://efts.sec.gov/LATEST/search-index?q=%224%22&dateRange=custom&startdt=2026-05-26&enddt=2026-05-27&forms=4
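Rather than hand-encoding that URL, you can build it from parameters. This sketch reproduces only the query parameters shown above (`q`, `dateRange`, `startdt`, `enddt`, `forms`). As far as I know, EFTS is the backend of the EDGAR search page and is not one of the SEC’s formally documented APIs, so treat its parameters and JSON response as subject to change.

```python
from datetime import date
from urllib.parse import urlencode

EFTS_BASE = "https://efts.sec.gov/LATEST/search-index"


def build_efts_url(query: str, start: date, end: date, forms: str = "4") -> str:
    """Build a full-text search URL with the same parameters as the example above."""
    params = {
        "q": f'"{query}"',             # quoted phrase, encoded as %22...%22
        "dateRange": "custom",
        "startdt": start.isoformat(),  # YYYY-MM-DD
        "enddt": end.isoformat(),
        "forms": forms,
    }
    return f"{EFTS_BASE}?{urlencode(params)}"


url = build_efts_url("4", date(2026, 5, 26), date(2026, 5, 27))
print(url)
```

This produces exactly the endpoint shown above. You would then fetch it with `requests.get(url, headers=HEADERS)` and inspect the JSON yourself before relying on any field names.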
But raw search is messy. The better approach: pull the recent-filings Atom feed for Form 4s, then parse each filing (an SGML wrapper around the Form 4 XML) for the transaction details.
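The feed-to-XML flow can be sketched with the standard library alone. Check these assumptions against live data before relying on them:

- Each Atom `<entry>` links to the filing’s `-index.htm` page.
- The same filing can appear twice (once for the reporting owner, once for the issuer), so entries are de-duplicated on the accession number in the link.
- The full submission text file sits beside the index page with a `.txt` suffix and wraps the Form 4 XML in `<XML>` tags.

The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

ATOM = {"a": "http://www.w3.org/2005/Atom"}
# Assumed index-page naming: <10-digit filer id>-<yy>-<sequence>-index.htm
ACCESSION_RE = re.compile(r"(\d{10}-\d{2}-\d{6})-index\.html?$")


def filing_index_links(feed) -> List[str]:
    """Unique filing index URLs from a getcurrent Atom feed (str or bytes), in feed order."""
    root = ET.fromstring(feed)
    seen, links = set(), []
    for entry in root.findall("a:entry", ATOM):
        link = entry.find("a:link", ATOM)
        href = link.get("href", "") if link is not None else ""
        m = ACCESSION_RE.search(href)
        # One filing may be listed for both owner and issuer; dedupe on accession number.
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            links.append(href)
    return links


def submission_txt_url(index_url: str) -> str:
    """Swap '<accession>-index.htm' for '<accession>.txt' (assumed EDGAR layout)."""
    return ACCESSION_RE.sub(r"\1.txt", index_url)


def extract_ownership_xml(submission_text: str) -> Optional[str]:
    """Return the XML payload between <XML> and </XML> in the SGML wrapper, or None."""
    m = re.search(r"<XML>\s*(.*?)\s*</XML>", submission_text, re.DOTALL)
    return m.group(1) if m else None
```

In a scan loop, that chain would be: feed text from `get_recent_form4s()`, then `filing_index_links`, then `submission_txt_url`, then fetch, then `extract_ownership_xml`, and finally `ET.fromstring` on the result.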
The Script: 80 Lines That Do the Work
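```python
import time
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta  # timedelta is used by detect_clusters below

import requests

# SEC asks automated clients to identify themselves: use your own name and email.
HEADERS = {
    "User-Agent": "YourName you@example.com",
    "Accept-Encoding": "gzip, deflate",
}


def get_recent_form4s(count=40):
    """Fetch the most recent Form 4 filings from EDGAR's latest-filings feed (Atom XML)."""
    url = "https://www.sec.gov/cgi-bin/browse-edgar"
    params = {
        "action": "getcurrent",
        "type": "4",
        "dateb": "",
        "owner": "include",
        "count": count,  # the newest N entries, not a time window
        "search_text": "",
        "output": "atom",
    }
    resp = requests.get(url, params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.text


def _text(node, path, default=None):
    """Stripped text of the first element matching path, or default if missing."""
    el = node.find(path)
    if el is None or el.text is None:
        return default
    return el.text.strip()


def parse_form4_xml(filing_url):
    """Return open-market purchases from one Form 4.

    filing_url must serve the raw ownershipDocument XML, not the HTML index
    page. Form 4 XML has no namespace, so plain tag paths work.
    """
    resp = requests.get(filing_url, headers=HEADERS, timeout=30)
    time.sleep(0.15)  # stay under SEC's 10 requests/second limit
    resp.raise_for_status()
    try:
        root = ET.fromstring(resp.content)  # bytes, so the declared encoding is honored
    except ET.ParseError:
        return []

    issuer = _text(root, "issuer/issuerName", "Unknown")
    ticker = _text(root, "issuer/issuerTradingSymbol", "UNKNOWN")
    insider = _text(root, "reportingOwner/reportingOwnerId/rptOwnerName", "Unknown")

    transactions = []
    for txn in root.findall(".//nonDerivativeTransaction"):
        # transactionCode holds its text directly (no <value> child); P = purchase
        if _text(txn, "transactionCoding/transactionCode") != "P":
            continue
        raw_date = _text(txn, "transactionDate/value")
        if not raw_date:
            continue
        transactions.append({
            "issuer": issuer,
            "ticker": ticker,
            "insider": insider,
            "date": datetime.strptime(raw_date[:10], "%Y-%m-%d").date(),
            "shares": float(_text(txn, "transactionAmounts/transactionShares/value") or 0),
            "price": float(_text(txn, "transactionAmounts/transactionPricePerShare/value") or 0),
        })
    return transactions
```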
The key filter: transactionCode == 'P'. That’s an open-market or private purchase. Ignore 'A' (grant/award), 'M' (option exercise or conversion), and 'S' (sale). You only want insiders spending their own cash.
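Where the code lives in the XML matters. `transactionCode` sits inside `transactionCoding` and carries its text directly, while share counts and prices are wrapped in `<value>`. A lookup such as `.//transactionCode//value` therefore never matches.

Below is a self-contained check against a trimmed, made-up fragment with that nesting. The extra test on `transactionAcquiredDisposedCode` ('A' for acquired) is an added safeguard and is not part of the script above.

```python
import xml.etree.ElementTree as ET

# Made-up fragment with the same nesting as a real Form 4 non-derivative table.
SAMPLE = """
<nonDerivativeTable>
  <nonDerivativeTransaction>
    <transactionCoding><transactionCode>P</transactionCode></transactionCoding>
    <transactionAmounts>
      <transactionShares><value>5000</value></transactionShares>
      <transactionPricePerShare><value>12.40</value></transactionPricePerShare>
      <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
    </transactionAmounts>
  </nonDerivativeTransaction>
  <nonDerivativeTransaction>
    <transactionCoding><transactionCode>A</transactionCode></transactionCoding>
    <transactionAmounts>
      <transactionShares><value>20000</value></transactionShares>
      <transactionPricePerShare><value>0</value></transactionPricePerShare>
      <transactionAcquiredDisposedCode><value>A</value></transactionAcquiredDisposedCode>
    </transactionAmounts>
  </nonDerivativeTransaction>
</nonDerivativeTable>
"""


def is_open_market_purchase(txn: ET.Element) -> bool:
    """True for code 'P' rows that add shares (acquired/disposed flag 'A')."""
    code = (txn.findtext("transactionCoding/transactionCode") or "").strip()
    acq = (txn.findtext("transactionAmounts/transactionAcquiredDisposedCode/value") or "").strip()
    return code == "P" and acq == "A"


table = ET.fromstring(SAMPLE.strip())
purchases = [t for t in table.findall("nonDerivativeTransaction") if is_open_market_purchase(t)]
print(len(purchases))  # 1: the award row (code 'A') is skipped
```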
Detecting Cluster Buying
A single insider buying $50K of stock might mean nothing. But when the CFO, a board member, and the VP of Engineering all buy within two weeks? That’s a cluster, and historically it’s the strongest signal.
```python
from collections import defaultdict
from datetime import timedelta


def detect_clusters(purchases, window_days=30, min_insiders=3):
    """Find tickers where >= min_insiders distinct insiders bought within window_days.

    Each purchase is a dict with 'ticker', 'insider', 'date' (datetime.date),
    'shares' and 'price'.
    """
    by_ticker = defaultdict(list)
    for p in purchases:
        by_ticker[p["ticker"]].append(p)

    clusters = []
    for ticker, buys in by_ticker.items():
        buys.sort(key=lambda b: b["date"])
        # Treat each purchase as a possible window start and look forward.
        for buy in buys:
            window_start = buy["date"]
            window_end = window_start + timedelta(days=window_days)
            in_window = [b for b in buys if window_start <= b["date"] <= window_end]
            insiders = {b["insider"] for b in in_window}
            if len(insiders) >= min_insiders:
                clusters.append({
                    "ticker": ticker,
                    "window_start": window_start,
                    "insiders": insiders,
                    "total_value": sum(float(b["shares"]) * float(b["price"]) for b in in_window),
                })
                break  # report only the earliest cluster per ticker
    return clusters
```
Running It on a Schedule
I run this every 4 hours via cron on my home server (a Beelink mini PC that draws 15W, perfect for always-on scripts). When a cluster is detected, the script pushes a notification through ntfy. Here’s the crontab entry:
```
# crontab -e
0 */4 * * * python3 /home/scripts/insider_scanner.py 2>&1 | logger -t insider
```
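The notification step isn’t part of the 80-line script. Here is a minimal sketch, assuming the public ntfy.sh server and a topic name you choose. `my-insider-alerts` below is hypothetical, and so is the message format. ntfy accepts a plain HTTP POST whose body becomes the message, with optional headers such as `Title` and `Tags`. The sketch uses the standard library’s `urllib.request`; `requests.post` would work the same way. Formatting is kept separate so it can be checked without network access.

```python
import urllib.request

NTFY_URL = "https://ntfy.sh/my-insider-alerts"  # hypothetical topic; choose your own


def format_cluster(cluster: dict) -> str:
    """One-line summary of a cluster dict with 'ticker', 'insiders', 'total_value'."""
    names = ", ".join(sorted(cluster["insiders"]))
    return (f"{cluster['ticker']}: {len(cluster['insiders'])} insiders bought "
            f"${cluster['total_value']:,.0f} ({names})")


def notify(cluster: dict) -> None:
    """POST the summary to ntfy; the request body becomes the notification text."""
    req = urllib.request.Request(
        NTFY_URL,
        data=format_cluster(cluster).encode("utf-8"),
        headers={"Title": "Insider cluster buy", "Tags": "chart_with_upwards_trend"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:  # raises HTTPError on 4xx/5xx
        resp.read()
```

Topics on ntfy.sh are readable by anyone who knows the name, so pick something hard to guess.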
Total infrastructure cost: $0. SEC data is free. Python is free. The mini PC was a one-time purchase I already had running my homelab.
What I’ve Learned Running This for 7 Months
Some patterns that showed up in my data:
- Cluster buys after 30%+ drawdowns are the highest-signal events. Insiders buying the dip with conviction usually know something about the recovery timeline.
- Dollar amount matters more than share count. A CEO buying $2M of stock is more meaningful than a director buying $15K. I filter for purchases over $100K per insider.
- Ignore scheduled 10b5-1 plan purchases. These are pre-programmed and carry no informational value. Check the footnotes in the filing — they’ll mention if it’s a plan purchase.
- Biotech and small-cap clusters have the highest hit rate in my experience. Large-caps move too slowly for this to give you an edge.
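Three of those rules map onto simple filters. This is a sketch with hypothetical function names, assuming:

- purchase dicts carry 'ticker', 'insider', 'shares' and 'price' (numbers or numeric strings);
- the ownershipDocument keeps its notes under `<footnotes><footnote>`;
- you supply your own list of closing prices for the drawdown check.

The 10b5-1 test is a text heuristic. It misses plans described in other words, and it also flags footnotes that mention 10b5-1 only to say a trade was not under a plan.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

PLAN_RE = re.compile(r"10b5-1", re.IGNORECASE)


def big_buyers(purchases, min_dollars=100_000):
    """Keep (ticker, insider) pairs whose summed purchase value is >= min_dollars."""
    totals = defaultdict(float)
    for p in purchases:
        totals[(p["ticker"], p["insider"])] += float(p["shares"]) * float(p["price"])
    return {key: value for key, value in totals.items() if value >= min_dollars}


def mentions_10b5_1(root: ET.Element) -> bool:
    """Heuristic: does any footnote in the filing mention a 10b5-1 plan?"""
    return any(PLAN_RE.search("".join(fn.itertext())) for fn in root.iter("footnote"))


def drawdown(prices):
    """Fraction the last price sits below the series high (0.4 means down 40%)."""
    if not prices:
        raise ValueError("need at least one price")
    return 1 - prices[-1] / max(prices)
```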
Limitations (Be Honest With Yourself)
This isn’t a magic money printer. Insider buying signals work on a 3-12 month horizon, not days. You need patience. Some clusters lead nowhere — maybe the insiders were wrong, or a macro event overwhelmed company fundamentals.
I use this as one input into a broader process, not as a standalone trading strategy. It’s particularly useful as a “where to look” filter — when I see a cluster, I dig into the company’s fundamentals, recent earnings calls, and technical setup before deciding anything.
If you want to go deeper on systematic approaches to market signals, I publish daily analysis (narrative detection, sector rotation, macro scoring) in my free Telegram channel. No fluff, just data: join Alpha Signal here.
Full Source and Next Steps
The complete script with ntfy notifications, cluster detection, and a SQLite database for historical tracking is about 200 lines. I’d recommend Python for Data Analysis by Wes McKinney if you want to extend this with pandas for backtesting the signals against price data.
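The full script isn’t reproduced here, but the historical-tracking part can be sketched with the standard `sqlite3` module. The schema is illustrative and is not the one in the full script. It stores one row per purchase, keyed by the filing’s accession number. A UNIQUE constraint plus `INSERT OR IGNORE` means re-scanning the same filing every four hours adds no duplicates. Function and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS purchases (
    accession TEXT NOT NULL,  -- EDGAR accession number of the filing
    ticker    TEXT NOT NULL,
    insider   TEXT NOT NULL,
    txn_date  TEXT NOT NULL,  -- ISO date, so string order matches date order
    shares    REAL NOT NULL,
    price     REAL NOT NULL,
    UNIQUE (accession, insider, txn_date, shares, price)
)
"""


def save_purchases(conn: sqlite3.Connection, accession: str, purchases: list) -> int:
    """Insert the purchases from one filing; return how many rows were new."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO purchases VALUES (?, ?, ?, ?, ?, ?)",
        [(accession, p["ticker"], p["insider"], str(p["date"]),
          float(p["shares"]), float(p["price"])) for p in purchases],
    )
    conn.commit()
    return conn.total_changes - before


def purchases_since(conn: sqlite3.Connection, ticker: str, since: str) -> list:
    """(insider, date, dollar value) rows for a ticker on or after an ISO date."""
    return conn.execute(
        "SELECT insider, txn_date, shares * price FROM purchases "
        "WHERE ticker = ? AND txn_date >= ? ORDER BY txn_date",
        (ticker, since),
    ).fetchall()
```

`purchases_since` gives a backtest or a pandas DataFrame a clean starting point without re-downloading old filings.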
Key improvements I’m still working on:
- Cross-referencing cluster buys with free stock APIs to auto-pull the price chart context
- Filtering out Form 4 amendments (they duplicate the original filing)
- Tracking option exercises where the insider keeps the shares instead of selling them; these are sometimes disguised conviction signals
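Two of those items can be roughed out against a parsed ownershipDocument. This sketch rests on three assumptions:

- amendments carry `documentType` `4/A`;
- option exercises show up as code `M` rows in the non-derivative table;
- "exercised with no same-day sale in this filing" counts as a hold.

That last one is a simplification, since a sale could be reported in a later filing. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_amendment(root: ET.Element) -> bool:
    """Amended Form 4s are assumed to carry documentType '4/A'."""
    return (root.findtext("documentType") or "").strip().upper() == "4/A"


def exercises_without_sale(root: ET.Element) -> list:
    """Dates with an option exercise (code M) but no sale (code S) in this filing."""
    codes_by_date = {}
    for txn in root.iter("nonDerivativeTransaction"):
        code = (txn.findtext("transactionCoding/transactionCode") or "").strip()
        day = (txn.findtext("transactionDate/value") or "")[:10]
        codes_by_date.setdefault(day, set()).add(code)
    return sorted(day for day, codes in codes_by_date.items()
                  if "M" in codes and "S" not in codes)
```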
The SEC gives you the data for free. The edge is just showing up before everyone else reads about it on Twitter.