
How I Built a Chrome Extension That Reads SEC Filings

Technical walkthrough of FinSight: Chrome MV3 architecture, content script injection, IndexedDB caching, and why I made users bring their own Claude API key.

Chrome Extension · TypeScript · React · Claude AI · FinTech

SEC filings are public. Every 10-K, 10-Q, 8-K, proxy statement — all free on EDGAR. Retail investors have access to the same raw data as hedge funds.

The problem is the data is unusable. A typical 10-K is 150-300 pages of legal and accounting language. “Deferred tax liabilities arising from temporary differences between the book and tax basis of property…” This is information, technically. It is not insight.

I built FinSight to close the gap. Here’s how.

The Architecture

A Chrome extension has a specific process model that’s worth understanding before you design anything. Manifest V3 (which I committed to from the start) has three execution contexts:

Service Worker — Runs in the background. No DOM access. Handles API calls, message routing, and caching. In MV3, service workers are non-persistent — they get killed when idle and must handle being restarted mid-operation. This forced me to think carefully about what state needs to be persisted (everything important) vs. kept in memory (nothing critical).

Content Script — Injected into the active browser tab. Has DOM access but runs in an isolated JavaScript context. This is where the page detection, term highlighting, and section extraction happen.

Side Panel / Popup — The React UI. Communicates with the service worker via Chrome’s message passing API. Shows summaries, risk analysis, term definitions.

The three contexts can’t share memory directly. Everything passes through chrome.runtime.sendMessage() and chrome.tabs.sendMessage(). Getting this communication pattern right took a few iterations — the mental model is different from normal web development.
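The message-passing pattern is easier to test if the routing logic is kept pure and only wrapped in the Chrome listener at the edge. Here's a minimal sketch of that idea; the message types and fields are illustrative, not FinSight's actual protocol:

```typescript
// Hypothetical message protocol between the three contexts.
type Msg =
  | { kind: "ANALYZE_SECTION"; filingUrl: string; section: string }
  | { kind: "GET_DEFINITION"; term: string };

type Reply =
  | { ok: true; payload: string }
  | { ok: false; error: string };

// Pure dispatcher: the service worker would wrap this in
// chrome.runtime.onMessage.addListener(...), but the routing itself
// has no Chrome dependency and can be unit-tested directly.
function handleMessage(msg: Msg): Reply {
  switch (msg.kind) {
    case "ANALYZE_SECTION":
      return { ok: true, payload: `queued analysis of ${msg.section}` };
    case "GET_DEFINITION":
      return { ok: true, payload: `definition lookup for ${msg.term}` };
  }
}
```

Keeping the dispatcher pure also helps with MV3's non-persistent service workers: a restarted worker just re-registers the same stateless handler.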

Content Script: What Actually Runs on the Page

When a user visits any URL matching *://www.sec.gov/Archives/edgar/*, my content script loads. First thing it does: verify the page actually contains a filing (some EDGAR pages are indexes, not documents). I check for specific HTML patterns and filing-type indicators in the page text.

If it’s a real filing, the content script:

  1. Scans the full text of the page for ~500 predefined financial terms
  2. Wraps each match in a <span> with a hover tooltip
  3. Injects floating “Analyze” buttons near major section headings (Risk Factors, MD&A, Financial Statements)
  4. Listens for user clicks on those buttons
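Step 2 above can be sketched as a pure string transform, assuming a small in-memory glossary. The real content script walks DOM text nodes rather than rewriting raw HTML (which would be unsafe on arbitrary markup), but the matching logic is the same shape:

```typescript
// Illustrative glossary entry; the bundled dictionary has ~500 of these.
const GLOSSARY: Record<string, string> = {
  "goodwill impairment":
    "A write-down recorded when goodwill's carrying value exceeds its fair value.",
};

// Wrap each dictionary term in a tooltip span. Case-insensitive,
// word-boundary matching so "goodwill impairments" still hits.
function highlightTerms(
  text: string,
  glossary: Record<string, string>
): string {
  let out = text;
  for (const [term, definition] of Object.entries(glossary)) {
    const re = new RegExp(`\\b${term}\\b`, "gi");
    out = out.replace(
      re,
      (m) => `<span class="finsight-term" title="${definition}">${m}</span>`
    );
  }
  return out;
}
```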

The term dictionary is bundled JSON — no API call needed. When a user hovers over “goodwill impairment,” they get an instant definition from local data. The only API calls are for summarization and AI analysis.

This is an important UX decision: instant response for common actions, API call only for expensive ones. If term tooltips required API calls, every hover would cost money and feel slow. Bundled dictionary means every hover is instantaneous.

The Caching Problem

SEC filings don’t change. A 10-K filed on November 15th looks the same today as it will six months from now. Calling Claude to analyze the same filing twice is wasteful and costs the user money.

I implemented an IndexedDB cache with a 7-day TTL. Every analysis result — summary, risk flags, key metrics, comparative analysis — gets stored keyed by filing URL and section hash.

Why IndexedDB over localStorage?

  • localStorage is synchronous and has a 5MB limit — too small for storing multiple full filing analyses
  • IndexedDB is async and supports significantly more storage
  • MV3 service workers can access IndexedDB (unlike the old MV2 persistent background page, where localStorage worked fine)

The cache check happens before any API call. If you analyze a filing’s Risk Factors section on Monday, revisit on Wednesday, and click “Analyze Risk Factors” again — you get the cached result instantly. No API call, no cost, no wait.
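The cache bookkeeping reduces to two pure pieces: a key derived from filing URL plus a hash of the section text, and a freshness check against the 7-day TTL. A minimal sketch (the djb2-style hash here stands in for whatever the extension actually uses, and the IndexedDB open/get/put wiring is omitted):

```typescript
const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL

// Cheap non-cryptographic hash of the section text, so edits to the
// extraction logic (which change the text) naturally miss the cache.
function sectionHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

function cacheKey(filingUrl: string, sectionText: string): string {
  return `${filingUrl}#${sectionHash(sectionText)}`;
}

function isFresh(storedAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - storedAtMs < TTL_MS;
}
```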

The tradeoff: the cache is local and doesn’t sync across devices. If you analyze something on your laptop and then open the same filing on a different machine, you start fresh. I accepted this for now — privacy first, cross-device sync is a later feature.

Why I Committed to Manifest V3

MV3 is more restrictive than MV2. No persistent background pages, stricter content security policies, tighter permission requirements. A lot of extension developers were still shipping MV2 when I started building FinSight.

I committed to MV3 anyway because MV2 is deprecated. The Chrome Web Store stopped accepting new MV2 extension submissions back in January 2022, and starting in June 2024 Chrome began disabling existing MV2 extensions in stable releases. Building on MV2 in 2025 means building on a foundation that’s actively being removed.

The MV3 constraints also forced better architecture. Non-persistent service workers mean you can’t lazily store state in memory — you have to be explicit about what gets persisted. This is annoying during development and correct for production. Extensions with implicit in-memory state are a mess to debug.

The one real pain point: Chrome’s MV3 extension review process was longer than expected. The “host permissions” for EDGAR URLs required a specific justification. Worth knowing if you build extensions targeting specific domains.

The Claude API Integration

FinSight uses Claude (claude-3-5-sonnet) for two tasks:

  1. Summarization: Given the full text of a filing section, produce a structured summary with key facts, metrics, and management tone assessment
  2. Risk analysis: Given the Risk Factors section, identify top 3 risks with severity and specific quoted evidence

I wrote the prompts with explicit output schemas — JSON with specific fields. This made parsing reliable and let me display results in structured UI components rather than rendering raw text.
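Reliable parsing means validating the model's output before it touches the UI. Here's a sketch of that defensive step for the risk-analysis task; the field names are assumptions, not FinSight's actual schema:

```typescript
// Illustrative output schema for the risk-analysis prompt.
interface RiskFlag {
  risk: string;
  severity: "low" | "medium" | "high";
  evidence: string; // direct quote from the filing
}

// Returns null on any malformed response so the caller can retry
// or show an error, instead of rendering garbage.
function parseRiskFlags(raw: string): RiskFlag[] | null {
  try {
    const data = JSON.parse(raw);
    if (!Array.isArray(data)) return null;
    const valid = data.every(
      (r) =>
        typeof r?.risk === "string" &&
        ["low", "medium", "high"].includes(r?.severity) &&
        typeof r?.evidence === "string"
    );
    return valid ? (data as RiskFlag[]) : null;
  } catch {
    return null; // model returned non-JSON text
  }
}
```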

One architectural decision: users must provide their own Anthropic API key. The extension stores it encrypted in Chrome’s local storage. This means:

  • No backend infrastructure required (no servers, no auth, no billing)
  • User controls their own API costs
  • No risk of key theft from my backend

Higher setup friction? Yes. But for a v1 product aimed at investors who are comfortable reading SEC filings, providing an API key is a reasonable hurdle. The users who get through setup are the users who actually want the product.
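One small thing that softens the setup friction: a sanity check on the key before saving it, so a paste error fails immediately rather than on the first API call. The `sk-ant-` prefix matches Anthropic's current key format, but treat this as a heuristic, not a guarantee:

```typescript
// Quick format check before persisting a user-supplied key.
// Heuristic only; the real test is a successful API call.
function looksLikeAnthropicKey(key: string): boolean {
  const trimmed = key.trim();
  return trimmed.startsWith("sk-ant-") && trimmed.length > 20;
}
```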

What Building This Taught Me

Chrome extension APIs are powerful but byzantine. The permission model (activeTab, storage, sidePanel, declarativeNetRequest), message passing patterns, and the distinction between content script context vs. extension context — all of this is documented but the mental model takes time to internalize. The first week was mostly reading Chrome extension docs.

Financial domain requires calibration. Generic LLM prompts produce generic output. “Summarize this filing” produces vague summaries. I needed specific prompts: “Extract specific dollar figures for revenue, operating income, and net income. Flag any year-over-year decreases greater than 10%. Identify going-concern language or auditor change disclosures.” Domain knowledge shapes prompt engineering.
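A calibrated prompt in that spirit can be built as a plain template. The wording below is illustrative, not FinSight's production prompt:

```typescript
// Hypothetical prompt builder for the summarization task, folding in
// the domain-specific instructions quoted above.
function buildSummaryPrompt(sectionText: string): string {
  return [
    "You are analyzing one section of an SEC filing.",
    "Extract specific dollar figures for revenue, operating income, and net income.",
    "Flag any year-over-year decreases greater than 10%.",
    "Identify going-concern language or auditor change disclosures.",
    'Respond with JSON only: { "metrics": [...], "flags": [...] }.',
    "",
    "Filing section:",
    sectionText,
  ].join("\n");
}
```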

Local caching is a product decision, not an implementation detail. Users who get instant results (from cache) feel like the product is fast and smart. Users who wait 10 seconds every time they analyze a section feel like it’s slow. The cache is the experience.

IndexedDB migrations are real. When I changed the cache schema (added a new field to the analysis result), existing cached data broke. I had to implement versioned migrations. This is the kind of thing you don’t think about until you’re looking at a cached result that doesn’t match your TypeScript types.
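The migration pattern that fixed this reduces to tagging each cached record with a schema version and upgrading old records on read (IndexedDB's `onupgradeneeded` handles object-store changes, but record-shape changes still need this step). Versions and field names here are hypothetical:

```typescript
// v1 records predate the riskFlags field added later.
interface CachedV1 { summary: string }
interface CachedV2 { schemaVersion: 2; summary: string; riskFlags: string[] }

// Upgrade any cached record to the current shape before it reaches
// typed application code.
function migrate(record: CachedV1 | CachedV2): CachedV2 {
  if ("schemaVersion" in record && record.schemaVersion === 2) {
    return record;
  }
  // v1 → v2: backfill the missing field with an empty list.
  return { schemaVersion: 2, summary: record.summary, riskFlags: [] };
}
```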

The extension works. Install it, navigate to any EDGAR filing, and you get instant term definitions on hover, AI summaries with a click, and risk flags pulled directly from the text. That’s what it’s supposed to do.