The IG model stack: tier-aware AI in production.
What the IG bench actually runs inside a Fractional CMO + AI Solutions client engagement, from model selection through observability. The architecture we deploy is the same one running today at OneBenefits.ai, AllVoiceAI.com, and inside Next Best Action.
Most agencies that talk about AI execution stop at "we use ChatGPT for content." IG’s AI execution layer is a production architecture with named components, named cost lines, named latency budgets, and named accountability. Here is the stack we deploy inside an active Fractional CMO engagement.
The default IG model stack (2026)
Every IG engagement that includes the AI execution layer ships with the same baseline stack:
Primary reasoning model: Claude Fable 5 (agentic workflows, customer intelligence, campaign operations)
High-volume workhorse: Claude Haiku 4.5 (classification, extraction, real-time scoring, summary generation)
Sensitive-domain fallback: Claude Opus 4.8 (automatic via Fable 5 routing, explicit for regulated content)
Embeddings: Anthropic embedding models for retrieval against client knowledge bases
Orchestration: Custom orchestration layer with workflow state, tool integration, observability
Observability: Structured logging, sampled traces, latency and cost dashboards per workflow
Each component exists for a specific reason and produces a measurable contribution to the engagement’s outcomes.
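To make the observability line concrete, here is a minimal sketch of per-call structured logging in Python. It is an illustration under stated assumptions, not the IG orchestration layer: the `traced_call` helper, the field names, and the `sink` parameter are all hypothetical, and trace sampling and dashboards are left out.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("ai_calls")  # hypothetical logger name


@contextmanager
def traced_call(workflow, model, sink=logger.info):
    """Emit one JSON log line per model call with status and latency.

    `workflow` and `model` are free-form labels chosen by the caller.
    `sink` is any callable that accepts a string (assumption: defaults
    to the standard-library logger).
    """
    record = {"workflow": workflow, "model": model}
    start = time.perf_counter()
    try:
        # The caller can attach extra fields, for example token counts or cost.
        yield record
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = type(exc).__name__
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        sink(json.dumps(record, sort_keys=True))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with traced_call("content_engine", "haiku-4.5") as rec:
        rec["cost_usd"] = 0.0004  # made-up value, for illustration only
```

Because every record carries the workflow and model labels, latency and cost can be aggregated per workflow downstream, which is what a per-workflow dashboard needs.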
How we route calls across the stack
Inside a typical IG engagement, the model routing logic answers four questions for each AI call.
One. Is the call agentic or single-prompt? Agentic calls default to Fable 5 because the long-horizon capability is the leading reason to be on this model class. Single-prompt calls continue to question two.
Two. Is the call high-volume? Anything running more than 100 times per day on the same pattern routes to Haiku 4.5 unless quality testing shows Haiku 4.5 fails on that pattern. At that volume, the cost-per-call difference is too large to ignore.
Three. Does the call touch a sensitive domain, such as healthcare, finance, defense, or regulated marketing content? These calls route to Opus 4.8 directly rather than relying on Fable 5’s automatic fallback, because predictable behavior matters more than capability lift in these workflows.
Four. What is the latency budget? Real-time interaction (under 500ms) usually rules out Fable 5 in favor of Haiku 4.5 or a cached response. Non-real-time workflows can use Fable 5 freely.
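The four questions above can be sketched as a single routing function. This is a hedged illustration, not IG's production router. The model strings are placeholder labels rather than real API identifiers, and the `Call` fields are hypothetical. Three choices are assumptions the text does not settle: the sensitive-domain check takes precedence over everything else, single-prompt calls that are not high-volume fall back to Fable 5, and the cached-response option for real-time calls is not modeled.

```python
from dataclasses import dataclass

# Placeholder labels for the models named in the article (not API model IDs).
FABLE = "fable-5"
HAIKU = "haiku-4.5"
OPUS = "opus-4.8"

HIGH_VOLUME_CALLS_PER_DAY = 100  # "more than 100 times per day"
REALTIME_BUDGET_MS = 500         # "under 500ms"


@dataclass(frozen=True)
class Call:
    agentic: bool
    calls_per_day: int
    sensitive: bool
    latency_budget_ms: int
    haiku_passes_quality: bool = True  # result of offline quality testing


def route(call: Call) -> str:
    # Question three, checked first (assumption): sensitive domains
    # route to Opus explicitly instead of relying on automatic fallback.
    if call.sensitive:
        return OPUS

    if call.agentic:
        # Question one: agentic calls default to Fable 5.
        choice = FABLE
    elif (call.calls_per_day > HIGH_VOLUME_CALLS_PER_DAY
          and call.haiku_passes_quality):
        # Question two: high-volume patterns go to Haiku 4.5
        # unless quality testing says otherwise.
        choice = HAIKU
    else:
        # Assumption: remaining single-prompt calls use the primary model.
        choice = FABLE

    # Question four: a real-time latency budget rules out Fable 5.
    if call.latency_budget_ms < REALTIME_BUDGET_MS and choice == FABLE:
        choice = HAIKU
    return choice


if __name__ == "__main__":
    print(route(Call(agentic=True, calls_per_day=5,
                     sensitive=False, latency_budget_ms=60_000)))
```

Keeping the thresholds as named constants means the 100-calls-per-day and 500ms cutoffs can be tuned per engagement without touching the decision logic.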
The five workflows we deploy by default
Every engagement gets the same five baseline workflows wired into the stack. Engagement-specific workflows get added on top.
1. Campaign brief to launch automation
An agentic Fable 5 workflow that reads a campaign brief and produces the full launch package: audience segments, creative variants per segment, channel sequencing, lifecycle journeys, and a measurement plan. A human checkpoint sits before launch. Time-to-launch typically drops by 60-75% versus the manual process. The OneBenefits Customer Zero deployment runs this in production.
2. Customer intelligence synthesis
Fable 5 runs against the client’s full customer data: CRM, support transcripts, sales calls, behavioral signals. It produces ICP refinements, persona updates, and segment-by-segment insight every two weeks. The 1M-token context window means we can hold the whole picture without retrieval gymnastics.
3. Real-time personalization scoring
Haiku 4.5 ranks the next best action for each customer based on streaming signals. Fable 5 handles the offline strategy update that retrains the scoring logic monthly. Behind this workflow is the same architecture that powers Next Best Action.
4. Content engine
End-to-end content production from topic identification through publish. Fable 5 handles strategic decisions (which topics to write on, how each fits the content pillar). Haiku 4.5 handles the high-volume tasks (SEO optimization, schema markup, internal link suggestions). The content engine that just went live at All Voice AI uses this pattern with a four-post-per-week cadence.
5. Performance read and reallocation
Fable 5 reasons across the client’s active campaigns and channels, identifies which are performing, and proposes reallocation. Work that used to take a senior analyst two days now arrives as a Monday-morning brief that the Fractional CMO reviews and acts on inside an hour.
What this costs (real numbers)
Most clients want to know the budget impact. The default IG AI execution layer typically runs $400 to $1,800 per month in model costs depending on workflow volume and client scale. That cost is included inside the Fractional CMO engagement on the Embedded ($7,500/mo) and Operating Partner ($15,000+/mo) tiers, and quoted separately on the Advisory ($2,500/mo) tier.
This is meaningfully less than a single-vendor marketing platform license (typically $4,000-$15,000 per month) and produces a broader workflow surface. The cost math improved further on June 9, 2026, when Fable 5 GA pricing came in at less than half of Mythos Preview rates.
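For readers who want to sanity-check a monthly figure against their own volumes, the arithmetic can be sketched as follows. This is a rough estimator under stated assumptions: the `WorkflowUsage` fields are hypothetical, a month is taken as 30 days, and per-token prices must be supplied by the caller from the vendor's current price list. The numbers in the usage example are made up for illustration and are not actual Anthropic pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowUsage:
    name: str
    calls_per_day: float
    input_tokens: int            # average input tokens per call
    output_tokens: int           # average output tokens per call
    input_usd_per_mtok: float    # caller-supplied price per million input tokens
    output_usd_per_mtok: float   # caller-supplied price per million output tokens


def monthly_cost_usd(w: WorkflowUsage, days: int = 30) -> float:
    """Estimated model spend for one workflow over `days` days."""
    per_call = (w.input_tokens * w.input_usd_per_mtok
                + w.output_tokens * w.output_usd_per_mtok) / 1_000_000
    return per_call * w.calls_per_day * days


# Tiers where the article says model costs are included in the engagement fee.
TIERS_WITH_MODEL_COSTS_INCLUDED = {"Embedded", "Operating Partner"}


def quoted_separately(tier: str) -> bool:
    """True when model costs are quoted on top of the tier fee."""
    return tier not in TIERS_WITH_MODEL_COSTS_INCLUDED


if __name__ == "__main__":
    # Illustrative inputs only; prices here are placeholders.
    scoring = WorkflowUsage("personalization_scoring", 1000, 2000, 500, 1.0, 5.0)
    print(f"{scoring.name}: ${monthly_cost_usd(scoring):,.2f}/month")
    print("Advisory quoted separately:", quoted_separately("Advisory"))
```

Summing `monthly_cost_usd` across all active workflows gives the figure to compare against the $400 to $1,800 range.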
What stays out of the stack
We deliberately leave three things out of the default stack.
No proprietary "secret sauce" model. We use the public model APIs from Anthropic. The differentiation is the architecture, the routing, and the operator who composes it. Anything else is marketing language hiding either underperformance or vendor lock-in risk.
No always-on autonomy. Every workflow has explicit checkpoints at irreversible-action boundaries. The Fractional CMO or the client’s designated approver is in the loop on anything that touches external communication, regulated content, or significant budget allocation.
No black-box pricing. We share the model costs as a line item, not as a marked-up integrated fee. Clients see what their workflows actually cost and can make informed scaling decisions.
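The checkpoint rule described under "No always-on autonomy" can be sketched as an approval gate in front of any irreversible action. This is a minimal illustration, not the IG orchestration layer. The `Action` type, the category names, and the `approved_by` parameter are hypothetical, and the sketch gates all budget allocation rather than trying to define what counts as "significant".

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical category names, mirroring the three examples in the text.
GATED_KINDS = {
    "external_communication",
    "regulated_content",
    "budget_allocation",
}


class ApprovalRequired(Exception):
    """Raised when a gated action is attempted without a named approver."""


@dataclass
class Action:
    kind: str
    description: str
    run: Callable[[], str]  # the side-effecting step itself


def execute(action: Action, approved_by: Optional[str] = None) -> str:
    # Gated actions never run unless a human approver is recorded.
    if action.kind in GATED_KINDS and approved_by is None:
        raise ApprovalRequired(
            f"'{action.description}' needs sign-off before it can run"
        )
    return action.run()


if __name__ == "__main__":
    email = Action("external_communication", "send launch email",
                   lambda: "sent")
    try:
        execute(email)
    except ApprovalRequired as exc:
        print("blocked:", exc)
    print(execute(email, approved_by="fractional_cmo"))
```

Raising an exception rather than silently skipping the step forces the surrounding workflow to handle the pause explicitly, so an unapproved action cannot slip through as a no-op.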
Frequently asked questions
Can we use the IG model stack without engaging IG?
The architectural patterns in this article are publicly documented. The compounding value is in the operator who composes the stack against your specific workflows. Teams running this independently can absolutely succeed. Many of them eventually engage IG anyway because operator time is the bottleneck, not architecture knowledge.
Why Anthropic models specifically?
Anthropic’s public model lineup currently produces the best cost-to-capability tradeoff for the workflows we deploy most often. We continue to evaluate other vendors and switch when the math changes. The architectural patterns are vendor-agnostic.
How long does the model stack take to deploy?
For Embedded tier engagements, the baseline stack is operational within 14 days of kickoff. The first production agentic workflow ships in 30-45 days. Engagement-specific workflows ship every 2-4 weeks after that.
What if our team already has an AI stack?
We architect around it. The IG model stack is the default. The actual deliverable is whatever architecture produces the best outcomes for your specific engagement. Many engagements involve refactoring an existing stack rather than greenfield work.
How does this work for regulated industries?
For regulated industries, we increase the explicit Opus 4.8 routing and the human-in-the-loop checkpoint density. We have shipped the architecture inside healthcare and benefits administration use cases. The OneBenefits Customer Zero deployment is the production reference.