What is a token - and why is it suddenly showing up on your P&L?

AI costs were invisible when everything ran on a flat subscription. They're not invisible anymore. Here's what tokens actually are, what your common advisory tasks consume, and how to model the real cost before it surprises you mid-year.
JUN 22, 2026

The short version

A token is roughly three-quarters of a word. The sentence "the fund's expense ratio is 0.65 percent" contains seven words and, by that rule of thumb, roughly nine or ten tokens. Every time you or a staff member sends a message to an AI, and every time the model responds, both sides of that exchange are counted and billed.
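To make the rule of thumb concrete, here is a minimal sketch that converts a word count into an estimated token count. The function names and the 0.75 words-per-token default are assumptions for illustration; real tokenizers differ by model and by the text itself, so treat the result as a planning figure rather than a billing figure.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Estimate tokens from a word count using the three-quarters rule of thumb."""
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    return round(word_count / words_per_token)


def estimate_tokens_in_text(text: str) -> int:
    """Count whitespace-separated words, then apply the same rule of thumb."""
    return estimate_tokens(len(text.split()))


# A hypothetical 3,000-word client letter comes out at about 4,000 tokens.
print(estimate_tokens(3000))
```

For anything that feeds a budget, the provider's own token counter will be more accurate than a word-based estimate.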

For the past three years, that didn't matter much because most platforms charged a flat monthly subscription. An advisor could run a 60-page financial plan through ChatGPT a hundred times and pay the same $20 a month as a colleague who used it twice. That model is ending. Anthropic, OpenAI and others have been shifting enterprise customers to usage-based billing — charging for every prompt, every document analysis, every automated agent task. The token count is no longer a technical footnote. It's a budget line.

What your common tasks actually cost

Here are rough token counts for work an advisory practice would recognize. The ranges reflect variation in document length, prompt complexity and output detail.

Task                                                             Approximate tokens
Short client email (in + out)                                    300 – 500
Summarizing a fund fact sheet                                    800 – 1,500
Drafting a client review letter                                  1,000 – 2,000
Analyzing a 10-page prospectus section                           8,000 – 15,000
Full simplified prospectus (input only)                          40,000 – 100,000
Comprehensive financial plan: portfolio + IPS + meeting notes    150,000 – 350,000
AI agent running autonomously for one hour                       500,000 – 1,000,000+


Indicative estimates. Actual consumption varies by model, prompt structure and output complexity.
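One way to turn the ranges in the table above into a monthly volume estimate is to multiply each task's range by how often it runs. The sketch below does that for a hypothetical workload; the task labels and the monthly counts are assumptions, and only the token ranges come from the table.

```python
# Token ranges (low, high) per task, taken from the table above.
TASK_TOKENS = {
    "short_client_email": (300, 500),
    "fund_fact_sheet_summary": (800, 1_500),
    "client_review_letter": (1_000, 2_000),
    "prospectus_section_analysis": (8_000, 15_000),
    "comprehensive_financial_plan": (150_000, 350_000),
}


def monthly_token_range(task_counts: dict[str, int]) -> tuple[int, int]:
    """Return (low, high) total tokens for a month of the given task counts."""
    low = sum(TASK_TOKENS[task][0] * n for task, n in task_counts.items())
    high = sum(TASK_TOKENS[task][1] * n for task, n in task_counts.items())
    return low, high


# Hypothetical practice: 400 emails, 50 review letters, 20 full plans a month.
workload = {
    "short_client_email": 400,
    "client_review_letter": 50,
    "comprehensive_financial_plan": 20,
}
print(monthly_token_range(workload))  # (3170000, 7300000)
```

In this example the 20 financial plans account for almost all of the volume, which is typical: a handful of document-heavy tasks usually dominates the total.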

The agent row is where firms got caught off guard. When AI agents — software that executes a task sequence autonomously without a human prompting each step — started running in the background, the token meter ran with them. Uber burned through its entire 2026 AI budget by April. Software firm Workato saw its bill jump sevenfold the day Anthropic switched it to usage-based pricing. Neither was doing anything unusual — just agents running at volume, around the clock.

For an RIA deploying automated onboarding workflows, client reporting pipelines or CRM update agents, the same dynamic applies. Agentic AI tools are proliferating fast across the advisory space — Salesforce, Zocks, Wealthbox and others all rolled out autonomous workflow capabilities in 2026 — and the token cost of an agent running for an hour is the same order of magnitude as a senior associate's billable time. That's worth knowing before the workflow goes live.

What this costs in dollars

Token prices vary by model tier. Based on current API pricing:

  • Budget models (older or lighter models): around $0.50 – $2 per million tokens
  • Mid-range models (GPT-4 class, Claude Sonnet): around $3 – $10 per million tokens
  • Frontier models (Claude Opus, GPT-5.5): around $15 – $30 per million tokens

Running a comprehensive financial plan through a frontier model, say 250,000 tokens round-trip, costs roughly $3.75 to $7.50 per client file at current API rates. Run that for 50 clients a month and you're spending $187.50 to $375 on that task alone.
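The arithmetic behind those figures is simple enough to sketch. The example below assumes a single blended price per million tokens, as this article does; in practice providers usually price input and output tokens separately, so the function name and the blended-rate simplification are illustrative assumptions.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of a given token count at a blended per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


FRONTIER_PRICE_RANGE = (15.0, 30.0)  # USD per million tokens, from the list above
tokens_per_plan = 250_000            # one comprehensive plan, round-trip
clients_per_month = 50

per_file = tuple(cost_usd(tokens_per_plan, p) for p in FRONTIER_PRICE_RANGE)
per_month = tuple(c * clients_per_month for c in per_file)

print(per_file)   # (3.75, 7.5)
print(per_month)  # (187.5, 375.0)
```

Swapping in a budget or mid-range price shows how much of that monthly figure depends on the model tier rather than the task.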

Under a flat $20 subscription, the same work carried no additional cost. That's the shift.

For practices on subscription plans rather than direct API access, usage-based billing shows up indirectly — as throttled access during peak hours, hard usage caps that trigger mid-month, or prompts to upgrade to a higher tier. The platforms are no longer absorbing heavy document usage on your behalf. They never were, really — they were just subsidizing early adoption.

Context windows: the number that matters more than speed

The context window is how much text a model can hold in its working memory at once, measured in tokens. For advisory work, it's arguably a more important spec than raw model speed or benchmark scores.

If you're asking a model to analyze a client's existing portfolio, cross-reference their IPS, compare two fund options and flag any suitability issues — all simultaneously — everything needs to fit inside the context window at once. Material that doesn't fit gets dropped. The model works from an incomplete picture, and it won't necessarily tell you that's what's happening.

Both Claude Opus 4.8 and GPT-5.5 now offer one-million-token context windows via their APIs — enough for most complex single-client files. A simplified prospectus runs 40,000 to 100,000 tokens; a full client file including portfolio statements, IPS, meeting notes and correspondence history might run 200,000 to 350,000. For nearly all single-client work, a million tokens is sufficient. For firm-wide data analysis or multi-client batch processing, it's worth stress-testing the limits before building workflows around them.
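A quick pre-flight check can catch the case where a client file will not fit. The sketch below adds up estimated token counts per document, reserves room for the model's answer, and compares the total with the context window. The per-document figures, the 8,000-token output reserve and the function name are all hypothetical.

```python
def fits_in_context(
    document_tokens: dict[str, int],
    context_window: int = 1_000_000,
    reserved_for_output: int = 8_000,
) -> tuple[bool, int]:
    """Return (fits, headroom); headroom is negative when the file overflows."""
    needed = sum(document_tokens.values()) + reserved_for_output
    return needed <= context_window, context_window - needed


# Hypothetical single-client file, in estimated tokens per document.
client_file = {
    "portfolio_statements": 60_000,
    "ips": 15_000,
    "meeting_notes": 40_000,
    "correspondence_history": 120_000,
}

print(fits_in_context(client_file))                          # (True, 757000)
print(fits_in_context(client_file, context_window=200_000))  # (False, -43000)
```

Running the check before sending the request means an overflow is flagged up front, instead of material being dropped without warning.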

The tradeoff: larger contexts cost more per query. Pulling 300,000 tokens into a frontier model is meaningfully more expensive than pulling 10,000. For routine tasks — a quick summary, a standard follow-up email, a research question — a smaller, cheaper model does the job and costs a fraction of the price. Knowing which tool to reach for is becoming a practice management skill, not just a tech preference. Some of the larger RIAs are already building in-house AI infrastructure precisely to gain that kind of routing control — deciding at the firm level which tasks go to which model tier, rather than leaving it to individual advisors.

Building a usage policy that actually works

The firms managing AI costs well in 2026 are not the ones with the most restrictive policies. They're the ones with the clearest ones.

Match the model to the task. Frontier models are for complex multi-document analysis, compliance-sensitive drafting and autonomous workflows where accuracy is non-negotiable. They're not for first-pass research, simple summaries or quick question-answering. A smaller mid-range model handles those tasks well and costs 70 to 90 percent less per token.
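A routing rule like this can be written down in a few lines. The sketch below is one possible shape, not a recommended implementation: the task labels are hypothetical, and the two prices are illustrative picks from the low end of the ranges quoted earlier.

```python
# Task categories reserved for frontier models (labels are hypothetical).
FRONTIER_TASKS = {
    "multi_document_analysis",
    "compliance_sensitive_drafting",
    "autonomous_workflow",
}

# Illustrative per-million-token prices picked from the ranges quoted earlier.
PRICE_PER_MILLION_USD = {"mid_range": 3.0, "frontier": 15.0}


def choose_tier(task_type: str) -> str:
    """Route a task to the frontier tier only when the policy calls for it."""
    return "frontier" if task_type in FRONTIER_TASKS else "mid_range"


def routed_cost_usd(task_type: str, tokens: int) -> float:
    """Cost of one task at the price of the tier it is routed to."""
    tier = choose_tier(task_type)
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[tier]


print(choose_tier("simple_summary"))           # mid_range
print(choose_tier("multi_document_analysis"))  # frontier
print(routed_cost_usd("multi_document_analysis", 250_000))  # 3.75
```

Defaulting unknown task types to the cheaper tier is a design choice; a firm that worries more about accuracy than cost might default the other way.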

Set cost awareness before you set caps. Hard limits frustrate staff and get routed around. Helping advisors and operations staff understand roughly what different tasks cost — so they make informed choices rather than just hitting a wall — produces better behavior and fewer workarounds.

Monitor agent usage from day one. An autonomous agent running a multi-step onboarding workflow for several hours will consume more tokens than a human advisor would in a week. Build monitoring and spending alerts into agentic deployments before they go live, not as an afterthought when the invoice arrives.
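As a rough illustration of what a spending alert could look like, the sketch below tracks an agent's cumulative token use and reports when spend crosses 50, 80 and 100 percent of a budget. The class name, the $500 budget, the $30 blended price and the 750,000 tokens per hour are all assumptions; a real deployment would read usage from the provider's billing or usage reports.

```python
class AgentSpendMonitor:
    """Track cumulative token spend for one agent and flag budget thresholds."""

    def __init__(self, budget_usd: float, price_per_million_usd: float,
                 alert_fractions: tuple[float, ...] = (0.5, 0.8, 1.0)):
        self.budget_usd = budget_usd
        self.price_per_million_usd = price_per_million_usd
        self.tokens_used = 0
        self._pending = sorted(alert_fractions)  # thresholds not yet crossed

    @property
    def spent_usd(self) -> float:
        return self.tokens_used / 1_000_000 * self.price_per_million_usd

    def record(self, tokens: int) -> list[str]:
        """Add tokens from one agent step; return any newly crossed alerts."""
        self.tokens_used += tokens
        alerts = []
        while self._pending and self.spent_usd >= self._pending[0] * self.budget_usd:
            fraction = self._pending.pop(0)
            alerts.append(f"{fraction:.0%} of ${self.budget_usd:,.2f} budget reached")
        return alerts


# Hypothetical agent using 750,000 tokens an hour at a $30 blended rate.
monitor = AgentSpendMonitor(budget_usd=500, price_per_million_usd=30)
for hour in range(1, 25):
    for alert in monitor.record(750_000):
        print(f"hour {hour}: {alert}")
```

Under these assumptions the agent crosses half the budget in hour 12 and exhausts it in hour 23, which is the kind of signal worth having before the invoice arrives.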

Treat keeping client data out of public interfaces as non-negotiable. The token cost is manageable. An SEC inquiry into how your firm handles client data in AI systems is not, and regulators have made clear that existing recordkeeping and supervision obligations apply to every AI tool in your stack.

The bottom line

Tokens are the unit of work for AI — the same relationship as kilowatt-hours to electricity. Most advisory firms spent the first few years of AI adoption with the meter covered up. The subsidy period is over. For RIAs running document-intensive workflows with client financial data, understanding what your common tasks consume — and building a model-routing policy that matches cost to task — is now a practice management decision, not an IT one.

For a full breakdown of which AI platforms suit RIAs and what each costs, see our companion piece: Which AI platform should your RIA actually be using?
