AI in Finance Part 2: Reading 10-Ks

AI analyzing 10-K filings to extract predictive constraints

This is Part 2 in a series on how we used LLMs to do reliable financial research. See Part 1 on earnings releases.

Compared to earnings releases, 10-Ks are really long, often 100+ pages. This made it even riskier for us to use an LLM to summarize them for stock research, but it's such a timesaver that I, and presumably many other people, did it anyway and will continue to do so.

You may have already discovered that if you simply ask ChatGPT or Gemini to summarize a 10-K, it produces an excellent-seeming summary. My closer reads, and doing this on every company in the S&P 500, confirmed what you might suspect: LLMs miss critical info I wouldn't miss, and LLMs include a bunch of useless info I wouldn't include.

(With the use case here being long-term fundamental stock forecasting. If you are interested in something other than valuing companies, you may have a different experience.)

As we did with earnings releases, I correlated our LLM-generated 10-K analyses against high-quality long-term forecasts of company fundamentals for all S&P 500 companies to see what was useful. I then ran a simple LLM pipeline to pick out what was most useful in the forecasting. And I found these patterns.

What Worked For Me

I found LLM summaries of 10-Ks gravitate toward business descriptions, strategy statements, and ESG disclosures, which rarely matter for our forecasts of revenues, margins, and payouts.
LLMs don't reliably include seemingly boring info that I would flag as important, in seven categories covered below.
I did save hours of 10-K reading by using LLMs, but only after carefully setting up a process, which I explain at the end.

What I want LLMs to never miss in 10-Ks

Category 1: Regulatory Constraints With Dates and Dollars

Vague regulatory mentions ("subject to regulation") are worthless. But constraints with dates ("rate moratorium through September 2029") can fit right into my long-term cash-flow models.

My analysis showed this was found in 12 of 50 companies.

Example: Alliant Energy's $11B capital program hits 4-year rate freeze

When Alliant Energy disclosed their Iowa subsidiary operates under a retail electric base rate moratorium from October 2025 through September 2029—during the peak of an $11 billion capital program—this transformed a vague "rate agreement" into a modelable constraint. The specific end date enabled forecasting margin pressure through 2029 followed by recovery in 2030. Without these details, analysts might project either sustained margin expansion or permanent compression, missing the temporary nature of the constraint.

Category 2: Legal Exposure With Quantified Ranges

Earnings calls so often just say "involved in litigation." 10-Ks say "$452M sought plus 40 similar pending claims with $221M insurance remaining." The dollar amounts tell me whether I should care, whereas LLMs usually pick up every litigation mention, material or not.

Found in 15 of 50 companies.

Example: Universal Health Services's $180M verdicts with 40 more claims and shrinking coverage

Universal Health Services disclosed two massive jury verdicts totaling $180M with approximately 40 additional similar claims pending, while only $221M in insurance coverage remained for the relevant period. New policies beginning March 2025 exclude coverage for sexual abuse claims. This pattern—one large adverse verdict plus many similar pending claims—signals a systemic problem likely to recur. The specific quantification enabled modeling 50-150 basis points of direct margin impact plus elevated risk premiums.

Category 3: Customer/Supplier Concentration With Trends

I want trends over time, not just percentages. "69% from Apple, up from 58%" tells me a different story than "significant customer concentration."

Found in 18 of 50 companies.

Example: For Skyworks Solutions, Apple concentration grows to 69% despite diversification strategy

Skyworks Solutions disclosed Apple revenue concentration increased from 58% (FY2022) to 69% (FY2024)—contradicting their stated strategic goal of diversification. This multi-year trend, only visible in the 10-K's three-year comparison tables, revealed the strategy was failing. Similarly, Supermicro showed one customer represented 20% of sales but 44.8% of accounts receivable, signaling potential payment risk beyond the revenue concentration.
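The two checks in this example are simple arithmetic, and a minimal sketch may make them concrete. The percentages are the ones quoted above; the function names and the 50% threshold (taken from the concentration prompt later in this post) are illustrative assumptions, not a fixed method.

```python
def point_change(share_by_year: dict[str, float]) -> float:
    """Change in a customer's revenue share, in percentage points,
    from the earliest to the latest fiscal year provided."""
    years = sorted(share_by_year)
    return share_by_year[years[-1]] - share_by_year[years[0]]


def flag_concentration(share_by_year: dict[str, float],
                       claims_diversification: bool,
                       threshold: float = 50.0) -> bool:
    """Flag when the latest share exceeds the threshold, or when the share
    is rising while management says it is diversifying."""
    latest = share_by_year[max(share_by_year)]
    rising = point_change(share_by_year) > 0
    return latest > threshold or (rising and claims_diversification)


# Figures quoted above for Skyworks (Apple share of revenue, in %).
skyworks_apple = {"FY2022": 58.0, "FY2024": 69.0}
change = point_change(skyworks_apple)
flagged = flag_concentration(skyworks_apple, claims_diversification=True)

# Supermicro: one customer's share of receivables relative to its share of sales.
receivables_vs_sales = 44.8 / 20.0

print(f"Apple share change: {change:+.0f} points, flagged: {flagged}")
print(f"Receivables share is {receivables_vs_sales:.2f}x the sales share")
```

A receivables share more than twice the sales share is the kind of gap I would want surfaced next to the headline concentration number.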

Category 4: Segments, Not Averages

I don't find company-wide averages helpful, but LLMs always report them. LLMs sometimes skip the segment detail, yet that always turns out to be what I end up using in the final forecast, e.g. when low-margin segments grow faster than high-margin segments.

Found in 22 of 50 companies.

Example: Qualcomm's 71% margin licensing shrinks as 30% margin chips grow

Qualcomm's 10-K revealed the QTL licensing segment has 71% operating margins versus 30% for QCT chips. As the chip business grows faster, blended margins compress despite operational improvements in both segments. TKO Group showed the IMG acquisition delivers 26% EBITDA margins versus 39% for core UFC/WWE. These segment-level details—especially multi-year trends—enabled forecasting margin trajectories as business mix evolves, typically driving 100-300 basis points difference in long-term margin forecasts.
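To see why mix matters more than the average, here is a rough sketch of the blended-margin arithmetic. The two segment margins are the ones quoted above; the revenue figures are hypothetical round numbers chosen only to show the effect, and the function name is my own.

```python
def blended_margin(revenue: dict[str, float], margin: dict[str, float]) -> float:
    """Revenue-weighted average operating margin across segments."""
    total = sum(revenue.values())
    return sum(revenue[seg] * margin[seg] for seg in revenue) / total


# Segment margins quoted above for Qualcomm.
SEGMENT_MARGIN = {"QTL": 0.71, "QCT": 0.30}

# Hypothetical revenue mix (arbitrary units): chips grow 25%, licensing is flat.
year_1 = {"QTL": 20.0, "QCT": 80.0}
year_2 = {"QTL": 20.0, "QCT": 100.0}

m1 = blended_margin(year_1, SEGMENT_MARGIN)
m2 = blended_margin(year_2, SEGMENT_MARGIN)
compression_bps = (m1 - m2) * 10_000

print(f"Blended margin: {m1:.1%} -> {m2:.1%} ({compression_bps:.0f} bps lower)")
```

Neither segment's margin changed in this toy case, yet the blended margin fell by more than 100 basis points. A summary that reports only the average cannot show that.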

Category 5: Cash Conversion

Revenue growth without cash generation doesn't mean anything. I need LLMs to spot 10-K cash flow details.

Found in 14 of 50 companies.

Example: Supermicro's 110% revenue growth consumes $2.5B cash instead of generating it

Supermicro's 10-K revealed that despite 110% revenue growth, the company consumed $2.5 billion in operating cash flow (versus generating $664 million the prior year). The detailed cash flow statement showed $3.0 billion in inventory buildup and $1.3 billion in accounts receivable increases. This signaled either unsustainable growth requiring continuous external funding or quality of revenue issues. These patterns enabled forecasting significant growth moderation as working capital constraints emerge, while also influencing payout ratio forecasts by 10-30 percentage points.
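A minimal sketch of the check I want an LLM to run here, using the Supermicro figures quoted above (in $ billions). The class and function names are hypothetical, and the drag calculation deliberately looks at only the two working-capital items mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class CashFlowYear:
    revenue_growth: float        # 1.10 means 110% growth
    operating_cash_flow: float   # $ billions, negative means cash consumed
    inventory_increase: float    # $ billions
    receivables_increase: float  # $ billions


def cash_consumed_despite_growth(year: CashFlowYear) -> bool:
    """True when revenue grew but operations still burned cash."""
    return year.revenue_growth > 0 and year.operating_cash_flow < 0


def working_capital_drag(year: CashFlowYear) -> float:
    """Cash absorbed by inventory and receivables (other items ignored)."""
    return year.inventory_increase + year.receivables_increase


supermicro = CashFlowYear(
    revenue_growth=1.10,
    operating_cash_flow=-2.5,
    inventory_increase=3.0,
    receivables_increase=1.3,
)

print("Flag:", cash_consumed_despite_growth(supermicro))
print(f"Working capital drag: ${working_capital_drag(supermicro):.1f}B")
```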

Category 6: Reserves & Insurance

I find that insurance markets and reserve trends predict problems before they hit the P&L. Is the company doing more self-insurance or coverage exclusions? Looks like random boilerplate to LLMs, looks like signal to me.

Found in 11 of 50 companies.

Example: Universal Health Services's self-insurance reserves triple to $79M as coverage excluded

Universal Health Services' self-insurance reserves increased from $25M to $79M, while new policies excluded sexual abuse coverage. RTX showed a $674M increase in asset retirement obligations from revised EPA rules. These are leading indicators of future margin pressure—companies and their insurers know about problems before they hit the income statement. Reserve trends typically signal 50-150 basis point margin changes and reveal operational quality issues.
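As a rough illustration of what "reserves building faster than the business" means, the sketch below rolls a reserve forward and compares its growth to revenue growth. Only the $25M and $79M balances come from the filing quoted above; the additions, releases, and revenue growth rate are hypothetical placeholders, and the function names are my own.

```python
def rollforward_ending(beginning: float, additions: float, releases: float) -> float:
    """Ending reserve balance implied by a simple rollforward."""
    return beginning + additions - releases


def reserve_outpaces_business(beginning: float, ending: float,
                              revenue_growth: float) -> bool:
    """True when the reserve grew faster than revenue over the same period."""
    reserve_growth = ending / beginning - 1.0
    return reserve_growth > revenue_growth


beginning, ending = 25.0, 79.0     # $M, balances quoted above
additions, releases = 60.0, 6.0    # $M, hypothetical split that reconciles them
revenue_growth = 0.10              # hypothetical

reconciles = rollforward_ending(beginning, additions, releases) == ending
print("Rollforward reconciles:", reconciles)
print(f"Reserve is {ending / beginning:.2f}x its starting balance")
print("Flag:", reserve_outpaces_business(beginning, ending, revenue_growth))
```

Checking that the extracted rollforward reconciles is also a cheap way to catch an LLM that has pulled numbers from the wrong table.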

Category 7: Targets vs. Reality

I definitely don't want to miss multi-year comparisons of management targets vs. actual results, which reveal how reliable management is. (Actually, I usually compare 10-Ks against past 10-Ks, but sometimes a single 10-K has what I need.)

Found in 19 of 50 companies.

Example: Truist Financial targets 18-20% return, delivers 12.3%

Truist Financial targeted 18-20% return on tangible common equity but delivered 12.3% in 2024. Skyworks aimed for diversification but saw customer concentration increase. These multi-year gaps between stated targets and actual results—only visible by comparing the current 10-K's MD&A against prior year targets—reveal management capability and strategic execution. This assessment typically results in 5-15 percentile point reductions in reliability scores and wider forecast ranges.
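One way to turn a target miss into a number is to measure the shortfall against the low end of the stated range. This is a sketch under that assumption; the function name is hypothetical, and the 20% threshold is borrowed from the targets prompt later in this post.

```python
def shortfall_vs_target(actual: float, target_low: float) -> float:
    """Fraction by which the actual result falls short of the low end of
    the target range; 0.0 when the target was met or exceeded."""
    return max(0.0, (target_low - actual) / target_low)


# Truist figures quoted above: 18-20% target, 12.3% delivered.
truist_shortfall = shortfall_vs_target(actual=12.3, target_low=18.0)
missed_by_more_than_20pct = truist_shortfall > 0.20

print(f"Shortfall vs. low end of target: {truist_shortfall:.1%}")
print("Flag:", missed_by_more_than_20pct)
```

Measuring against the low end is the generous reading. Against the midpoint of the range the miss would look larger.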

What I want LLMs to ignore

I don't even read information in these categories anymore. Instructing the LLM to ignore them means it has more context window & tokens to focus on what I do care about.

Boilerplate Risk Factors. "We face competition" appears in every 10-K. Exception: New risk factors or substantially expanded discussion of previous boilerplate warrant attention.
Business Descriptions and Product Overviews. The information is available elsewhere and rarely changes.
Generic ESG Disclosures. There's a lot of this, and it never ends up mattering. Exception: ESG creating real costs (carbon taxes) or benefits (tax credits).
Management & Board Bios. Why is this even included? Prompts about following the key actors can pick up a bunch of boilerplate stuff. I only care what managers do, not who they are.
Historical Summaries Beyond 3 Years. Anything beyond that is too broad to have anything new.
Forward-Looking Statement Safe Harbors. LLMs don't always know this is pure legal protection with no information value.
Competitive Landscapes. "We compete on quality, price, and service" tells me nothing.
Accounting Policy Descriptions. Technical details rarely change year-over-year unless there's been a policy change or new standard adoption. I might be missing stuff here, but I don't want LLMs to bring this to me.

My recipe

In the previous post on earnings releases, I recommended a single high quality prompt with the right documents as context.

I am less sure this is optimal for 10-Ks. 10-Ks are so long that doing multiple passes through, looking for different types of information, may outperform a single prompt.

As for passing context: unlike with earnings releases, 10-Ks often have enough historical data that adding more filings, such as the previous year's 10-K, as context may not be as valuable. In our research, we sometimes provided more, and sometimes not, with unclear results.

If you do run a series of prompts on a 10-K, here's what I recommend.

Prompts

10-Ks are often 100+ pages of dense technical information. General summaries lose critical details. Instead, use focused extraction prompts for each category.

Regulatory Constraints

Extract all regulatory constraints, rate agreements, export controls,
licensing requirements, and consent decrees that have dates and/or dollar
amounts. For each: (1) specific constraint, (2) dollar impact if
quantified, (3) start/end dates, (4) what triggers changes. Flag
constraints >2 years or >$500M.

Legal Exposure

Extract legal proceedings you deem material that have numbers. Ignore
'ordinary course' language. For each material case: plaintiff claims,
dollar amounts sought, number of similar pending cases, insurance
coverage amounts, reserve amounts. Flag multiple similar claims or
insurance gaps.

Concentration Trends

Extract customer/supplier/geographic concentration tables for current
and prior 2 years. Calculate year-over-year changes. Flag concentration
>50% or increasing when strategy claims diversification.

Segment Economics

Extract segment revenue, operating income, margins for each segment over
3 years. Calculate margin by segment and growth rate by segment. Flag
low-margin segments growing faster than high-margin segments.

Cash Flow Quality

Extract operating cash flow, changes in working capital components
(AR, inventory, AP) over 3 years. Calculate cash flow/net income ratio
and working capital as % of revenue. Flag cash consumed despite
earnings growth.

Reserve Trends

Extract all reserve rollforwards (legal, warranty, environmental,
self-insurance, loan loss). For each: beginning balance, additions,
releases, ending balance. Extract insurance policy changes. Flag reserves
building faster than business growth or new coverage exclusions.

Accounting Changes

Extract significant accounting policies section. Identify changes in
useful lives, capitalization policies, reserve methodologies. Calculate
dollar impact on earnings. Flag any changes that increased earnings.

Targets vs. Actuals

Extract forward-looking targets stated in MD&A. Compare stated targets
vs. actual results. Calculate gap for each metric. Flag targets missed
by >20% or consistently over multiple years.
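The multi-pass idea above can be sketched as a small loop that sends one focused prompt per category, each prefixed with the ignore list from earlier in this post. Everything here is an assumption for illustration: the prompt strings are abbreviated stand-ins for the full prompts above, and `call_llm` is a placeholder you would replace with whatever client function you actually use.

```python
from typing import Callable

# Abbreviated stand-ins for the prompts above; use the full text in practice.
EXTRACTION_PROMPTS: dict[str, str] = {
    "regulatory_constraints": "Extract regulatory constraints that have dates and/or dollar amounts.",
    "legal_exposure": "Extract material legal proceedings that have numbers.",
    "concentration_trends": "Extract customer/supplier/geographic concentration for 3 years.",
    "segment_economics": "Extract segment revenue, operating income, and margins over 3 years.",
    "cash_flow_quality": "Extract operating cash flow and working capital changes over 3 years.",
    "reserve_trends": "Extract all reserve rollforwards and insurance policy changes.",
    "accounting_changes": "Identify changes in accounting policies and their earnings impact.",
    "targets_vs_actuals": "Extract stated targets from MD&A and compare with actual results.",
}

IGNORE_PREAMBLE = (
    "Ignore boilerplate risk factors, business descriptions, generic ESG "
    "disclosures, management and board bios, safe-harbor language, and "
    "generic competitive-landscape statements."
)


def run_passes(filing_text: str,
               call_llm: Callable[[str], str],
               prompts: dict[str, str] = EXTRACTION_PROMPTS) -> dict[str, str]:
    """Run one focused extraction pass per category over the same filing."""
    results: dict[str, str] = {}
    for name, instruction in prompts.items():
        full_prompt = f"{IGNORE_PREAMBLE}\n\n{instruction}\n\n10-K TEXT:\n{filing_text}"
        results[name] = call_llm(full_prompt)
    return results


def fake_llm(prompt: str) -> str:
    """Stub used only to show the call shape; not a real model call."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


findings = run_passes("Item 7. Management's Discussion and Analysis", fake_llm)
for category, reply in findings.items():
    print(category, "->", reply)
```

Keeping each pass separate makes it easy to inspect, rerun, or drop a single category without touching the others.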

Coming next: other documents.

A lot of key information is in multiple filings. Here's what I've found is uniquely in 10-Ks, as judged by asking an LLM to read the final forecast I produce, then trace back the facts that uniquely appeared in 10-Ks.

More than anything, I find 10-Ks useful for identifying binding constraints with specific dates (rate moratoriums, regulatory deadlines, contractual obligations). Also, I do management credibility assessments entirely by comparing 10-Ks across time. Earnings calls discuss what management wants to emphasize, quarterly reports provide updates, but only the 10-K forces managers to disclose the complete picture.

So it's worth reading these 10-Ks, as Buffett always says. And if you want to do this faster using LLMs, it does work, but be very careful. It's easy to get fooled by what looks like an amazing summary.