AI Token Costs Are a Data Architecture Problem

Treat AI token budgets as a data architecture constraint, not a finance problem — leaner pipelines cut costs without cutting capability.

[Illustration: a figure managing oversized AI infrastructure pipes connected to a wallet. Illustrated by Mikael Venne.]

Somewhere between the AI pilot deck and the CFO’s spreadsheet, a reckoning is arriving. The question isn’t whether your brand should be running AI-powered workflows — it’s whether you can afford to run them badly.

Stephanie Kirmer’s analysis in Towards Data Science puts it plainly: AI token budgets cannot scale infinitely, no matter how confidently hyperscalers price their roadmaps. For marketing and data teams in Southeast Asia building on top of LLM APIs, this isn’t a future problem. It’s a present one — and the root cause is almost always upstream of the model itself.

The Real Cost Driver Isn’t the Model — It’s the Data You Feed It

Token costs are a symptom. The underlying condition is bloated, unstructured, or redundant data being passed into inference pipelines without discipline. If your customer data platform is pushing raw, unfiltered event streams into an AI layer — every session ping, every null field, every duplicated identity record — you’re paying for noise at LLM prices.
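As a rough illustration of that point, the sketch below prunes an event stream before it reaches a model and compares the estimated token footprint before and after. Everything in it is an assumption made for the example: the event shapes, the `NOISE_EVENT_TYPES` set, and the four-characters-per-token heuristic, which only approximates how a real tokenizer would count.

```python
import json

# Hypothetical event types treated as noise for this sketch.
NOISE_EVENT_TYPES = {"session_ping", "heartbeat"}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def prune_events(events: list[dict]) -> list[dict]:
    """Drop noise events, null or empty fields, and exact duplicate records."""
    seen = set()
    pruned = []
    for event in events:
        if event.get("type") in NOISE_EVENT_TYPES:
            continue
        # Remove fields that carry no information.
        compact = {k: v for k, v in event.items() if v not in (None, "", [], {})}
        # A canonical JSON string makes duplicates comparable regardless of key order.
        key = json.dumps(compact, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        pruned.append(compact)
    return pruned


raw_events = [
    {"type": "session_ping", "user_id": "u1", "ts": 1},
    {"type": "purchase", "user_id": "u1", "sku": "A12", "coupon": None},
    {"type": "purchase", "user_id": "u1", "sku": "A12", "coupon": None},
    {"type": "add_to_cart", "user_id": "u2", "sku": "B07", "referrer": ""},
]

clean_events = prune_events(raw_events)
tokens_before = estimate_tokens(json.dumps(raw_events))
tokens_after = estimate_tokens(json.dumps(clean_events))
print(f"events: {len(raw_events)} -> {len(clean_events)}")
print(f"estimated tokens: {tokens_before} -> {tokens_after}")
```

In practice this filtering would more likely live in the transformation layer than in application code, but the principle is the same: the model should only see records that carry signal.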

This is where first-party data architecture earns its keep. A well-governed data programme doesn’t just serve consent compliance; it creates a leaner signal layer. Brands that have invested in proper identity resolution and consent-scoped segmentation are finding that their AI inputs are cleaner, smaller, and more predictive — which translates directly into lower inference costs and better outputs. The discipline required to build trustworthy data turns out to be the same discipline that makes AI economically viable.
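A minimal sketch of what consent-scoped input building might look like, assuming an identity map and a consent registry already exist. The names `id_map`, `consent`, the `ai_personalisation` purpose, and the record layout are all hypothetical; a real programme would read these from its CDP or consent management platform.

```python
def build_ai_inputs(
    records: list[dict],
    id_map: dict[str, str],
    consent: dict[str, set[str]],
    purpose: str,
) -> dict[str, dict]:
    """Return one compact record per resolved person who consented to `purpose`."""
    merged: dict[str, dict] = {}
    for record in records:
        # Resolve the channel-specific ID to a person; fall back to the source ID.
        person_id = id_map.get(record["source_id"], record["source_id"])
        if purpose not in consent.get(person_id, set()):
            continue  # no consent for this purpose: the record never reaches the model
        merged.setdefault(person_id, {}).update(record["attributes"])
    return merged


records = [
    {"source_id": "line:123", "attributes": {"tier": "gold"}},
    {"source_id": "shop:987", "attributes": {"last_category": "skincare"}},
    {"source_id": "shop:555", "attributes": {"last_category": "toys"}},
]
id_map = {"line:123": "p1", "shop:987": "p1", "shop:555": "p2"}
consent = {"p1": {"ai_personalisation", "email"}, "p2": {"email"}}

ai_inputs = build_ai_inputs(records, id_map, consent, "ai_personalisation")
print(ai_inputs)
```

Three source records collapse into a single profile here: two belong to the same person, and the third belongs to someone who has not consented to this purpose. The same step that enforces consent also shrinks the payload.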

For teams running on Shopee or Lazada merchant data, LINE CRM exports, or Grab audience signals, the temptation is to pipe everything into the model and let it sort out what matters. That approach will bankrupt your AI budget before it delivers meaningful lift.

Migration Moments Are Architecture Moments — Don’t Waste Them

The dbt team’s migration guide makes an argument that applies far beyond SQL transformations: when you move to a new tool, rebuilding your legacy patterns inside it is the worst possible outcome. You inherit the technical debt without the familiarity.

The same logic holds when teams migrate to AI-augmented pipelines. The organisations making the most expensive mistakes are those lifting their existing data flows — built for batch reporting, not real-time inference — directly into LLM workflows. The latency is wrong, the granularity is wrong, the cost model is completely wrong.

A migration is an editorial moment. It forces you to ask which data actually drives decisions, and which data you’ve been carrying out of habit. For Southeast Asian brands operating across multiple markets with fragmented data stacks — different CDPs in Thailand, Indonesia, and the Philippines, different consent frameworks in each — this is the pressure that finally justifies consolidation. The cost of AI makes the cost of data sprawl visible in a way that quarterly reporting never did.


Observability Isn’t Optional When Agents Are Spending Your Budget

Monte Carlo’s launch of native observability for Databricks Agent Bricks points to something the industry is quietly grappling with: AI agents are making consequential decisions autonomously, and most teams have limited visibility into what those agents are actually doing between inputs and outputs.

For data teams, this matters because an unmonitored agent workflow is an uncapped cost centre. If an agent is calling an enrichment API on every inference loop when it only needs to do so once per session, that’s a billing problem masquerading as a design decision. Monte Carlo’s approach — reading MLflow trace data directly from Unity Catalog Delta tables with zero additional instrumentation — is a useful model for what good observability looks like: it should add clarity without adding complexity.
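The once-per-session fix can be sketched as a small cache in front of the enrichment call. This is an illustrative pattern, not a description of how Monte Carlo or Databricks handle it; `SessionEnrichmentCache` and `fake_enrichment_api` are hypothetical names, and the fake API stands in for whatever paid service an agent would call.

```python
class SessionEnrichmentCache:
    """Calls the enrichment function at most once per session ID."""

    def __init__(self, enrich_fn):
        self._enrich_fn = enrich_fn
        self._cache = {}
        self.api_calls = 0  # counter exposed so the spend is observable

    def get(self, session_id: str) -> dict:
        if session_id not in self._cache:
            self.api_calls += 1
            self._cache[session_id] = self._enrich_fn(session_id)
        return self._cache[session_id]


def fake_enrichment_api(session_id: str) -> dict:
    """Stand-in for a paid enrichment call (hypothetical)."""
    return {"session_id": session_id, "segment": "returning_buyer"}


enrichment = SessionEnrichmentCache(fake_enrichment_api)

# Simulate an agent that runs five inference loops across two sessions.
loops = ["s1", "s1", "s1", "s2", "s2"]
for session_id in loops:
    context = enrichment.get(session_id)

print(f"inference loops: {len(loops)}, enrichment calls: {enrichment.api_calls}")
```

The counter matters as much as the cache: without a number to look at, nobody notices that the loop and the billable call have drifted apart.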

The practical implication for teams building AI activation workflows: instrument your agents before you scale them, not after. Define cost thresholds per workflow the same way you’d define budget caps on paid media campaigns. Treat token spend as a media metric — impressions have a cost, and so does every inference call.
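One way to express a per-workflow cost threshold is a small guard that records token spend, raises an alert as the cap approaches, and refuses calls that would exceed it. The class below is a sketch under assumptions: the workflow name, the 10,000-token cap, and the 80% alert ratio are placeholder values, and a production version would persist spend and reset it per budget period.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push a workflow past its token cap."""


@dataclass
class WorkflowBudget:
    name: str
    token_cap: int            # hard cap for the budget period
    alert_ratio: float = 0.8  # warn at 80% of the cap (assumed threshold)
    tokens_used: int = 0

    def record(self, tokens: int) -> str:
        """Record spend and return 'ok' or 'alert'; raise if the cap would be exceeded."""
        if self.tokens_used + tokens > self.token_cap:
            raise BudgetExceeded(f"{self.name}: cap of {self.token_cap} tokens reached")
        self.tokens_used += tokens
        if self.tokens_used >= self.alert_ratio * self.token_cap:
            return "alert"
        return "ok"


budget = WorkflowBudget(name="crm_reactivation", token_cap=10_000)
statuses = [budget.record(4_000), budget.record(4_500)]

try:
    budget.record(2_000)  # would take spend to 10,500 tokens
    blocked = False
except BudgetExceeded:
    blocked = True

print(statuses, blocked, budget.tokens_used)
```

The check happens before the spend is recorded, so a blocked call leaves the running total untouched, much as a paused campaign stops accruing media cost.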

Dynamic Activation Raises the Stakes for Clean Data

Big Happy’s dynamic creative optimisation for 3D DOOH — automatically adapting campaigns based on weather, location, and live environmental signals — is a useful illustration of where activation is heading. Context-aware, signal-driven, real-time. The creative layer is becoming reactive infrastructure.

But reactive infrastructure is only as good as the signals feeding it. In Southeast Asia, where outdoor media audiences are dense and diverse, the value of DOOH personalisation lives or dies on the quality of the contextual data layer underneath. A campaign that serves the right creative at the wrong moment — because the weather API data was stale or the location signal was miscategorised — doesn’t just underperform. It trains the optimisation model on bad feedback.
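A freshness gate is one simple defence against that failure mode. The sketch below checks signal age before choosing a creative, and skips feedback logging when a signal cannot be trusted, so stale inputs do not end up as training data. The freshness limits, signal layout, and creative names are assumptions for illustration, not details of any DOOH platform.

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness limits per signal type; tune per market and data source.
MAX_AGE = {
    "weather": timedelta(minutes=15),
    "location": timedelta(minutes=5),
}


def is_usable(signal: dict, now: datetime) -> bool:
    """A signal is usable if its type is known and it is recent enough."""
    max_age = MAX_AGE.get(signal["kind"])
    if max_age is None:
        return False  # unknown signal type: treat as untrusted
    return now - signal["observed_at"] <= max_age


def select_creative(signals: list[dict], now: datetime) -> tuple[str, bool]:
    """Return (creative, log_for_training).

    Falls back to a default creative and skips feedback logging when any
    signal fails the check or no weather signal is present.
    """
    if not all(is_usable(s, now) for s in signals):
        return "default_creative", False
    weather = next((s["value"] for s in signals if s["kind"] == "weather"), None)
    if weather is None:
        return "default_creative", False
    creative = "rainy_day_creative" if weather == "rain" else "sunny_day_creative"
    return creative, True


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
location = {"kind": "location", "value": "bkk_siam", "observed_at": now - timedelta(minutes=1)}
fresh = [{"kind": "weather", "value": "rain", "observed_at": now - timedelta(minutes=5)}, location]
stale = [{"kind": "weather", "value": "rain", "observed_at": now - timedelta(hours=2)}, location]

fresh_result = select_creative(fresh, now)
stale_result = select_creative(stale, now)
print(fresh_result, stale_result)
```

The second return value is the important design choice: an impression served on a fallback still runs, but it is kept out of the optimisation loop.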

This is the loop that clean first-party data closes. When your owned signals are well-structured and reliable, they become the calibration layer for every downstream activation, whether that’s an LLM-powered CRM workflow or a dynamic outdoor placement. The brands winning on activation in 2026 aren’t the ones with the most signals. They’re the ones that can trust the signals they have.

Key Takeaways

  • Treat AI token budgets as a data quality metric — bloated, unfiltered inputs are the primary driver of runaway inference costs.
  • Use migration projects as architectural audits: identify which data actually drives decisions and leave legacy noise behind.
  • Instrument AI agent workflows with cost thresholds before scaling, applying the same budget discipline you’d apply to any paid channel.

The financial pressure AI is now placing on data infrastructure may be the forcing function that finally drives the data quality investment many organisations have been deferring. When every token costs money, clean data stops being a compliance aspiration and becomes a P&L line. The more interesting question: which brands in Southeast Asia will use that pressure to build something genuinely durable — and which will just find cheaper models to pipe their mess into?


At grzzly, we help brands across Southeast Asia build first-party data programmes that are lean by design: structured to feed AI workflows, activation platforms, and dynamic creative without generating the kind of costly signal bloat that’s quietly inflating AI budgets across the region. If your data architecture was built for reporting and you’re now asking it to power real-time AI, it’s worth a conversation.

Written by

Lavender Grizzly

Turning privacy constraints into competitive advantage. Builds first-party data programmes that are compliant by design, valuable by intent, and trusted by the people whose data they hold.
