First-Party Data Stacks: Why the Middle Layer Still Matters

The data cloud consolidation story just got a significant plot point. Databricks’ CustomerLake launch — positioning the lakehouse as a native home for customer identity, governance, and AI — is being read in some quarters as a signal that CDPs are on borrowed time. That reading is wrong, but it contains a useful provocation.

What CustomerLake actually validates is that the destination for governed customer data and AI workloads is settling. What it doesn’t resolve is the harder, messier upstream problem: how you collect that data with consent intact, keep it interoperable across your channel stack, and activate it fast enough to matter.

The Consolidation Is Real — and Incomplete

Tealium’s Sav Khetan frames the CustomerLake announcement clearly: data clouds like Snowflake and Databricks are winning the governed storage and AI layer. That’s not a threat to the operational layer — it’s a clarification of what the operational layer actually needs to do.

The distinction matters for Southeast Asian brands building out their stacks right now. A lakehouse gives you a powerful place to run identity resolution and predictive models. On its own, it does not give you a real-time event stream from your Shopee storefront, a consent flag synced from your LINE OA, or an activation path back into Grab’s ad network, let alone all three working together. Those require a collection and orchestration layer that sits between your customer touchpoints and your data cloud, not inside either one.

Brands that conflate governed storage with operational activation tend to discover the gap the hard way: clean data in the warehouse, stale segments in the channel.

Consent Isn’t a Feature: It’s the Foundation

Here’s where the architectural conversation intersects with something most data teams underweight: the consent provenance of every record flowing into that beautifully governed lakehouse.

Southeast Asia’s regulatory landscape is fragmenting in instructive ways. Thailand’s PDPA, Indonesia’s PDP Law, and the Philippines’ Data Privacy Act all have distinct consent requirements — and regional enforcement is becoming less theoretical. The brands that will win the next three years are not the ones with the most data; they’re the ones whose data carries auditable proof of how and when consent was given.

This is why the independent operational layer Tealium describes isn’t just an integration convenience — it’s the mechanism by which consent state travels with the data from collection through to activation. Strip that layer out, and you’re running governed AI on data whose lineage you can only approximate. That’s a compliance liability dressed up as a cost saving.
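
One way to picture consent state travelling with the data is an event envelope that cannot exist without its consent snapshot. The sketch below is illustrative only: the class names, field names, purpose labels, and the rule that consent must predate the event are all assumptions made for the example, not Tealium's or Databricks' data model, and not a statement of what any regulation requires.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical consent snapshot captured at collection time."""
    purposes: FrozenSet[str]   # illustrative purpose labels, e.g. {"analytics", "ads"}
    granted_at: datetime       # when consent was given
    source: str                # where it was captured, e.g. "line_oa"
    jurisdiction: str          # kept for audit; not interpreted in this sketch


@dataclass(frozen=True)
class CustomerEvent:
    """Event envelope that always carries the consent it was collected under."""
    customer_id: str
    name: str
    occurred_at: datetime
    consent: ConsentRecord


def can_activate(event: CustomerEvent, purpose: str) -> bool:
    """Assumed rule: the purpose was consented to, and consent predates the event."""
    return (
        purpose in event.consent.purposes
        and event.consent.granted_at <= event.occurred_at
    )


def route_for_activation(
    events: Iterable[CustomerEvent], purpose: str
) -> Tuple[List[CustomerEvent], List[CustomerEvent]]:
    """Split events into (allowed, blocked) for one activation purpose."""
    allowed: List[CustomerEvent] = []
    blocked: List[CustomerEvent] = []
    for event in events:
        (allowed if can_activate(event, purpose) else blocked).append(event)
    return allowed, blocked
```

Because the consent record sits inside the event rather than in a separate lookup table, any downstream system, including the lakehouse, can answer "under what consent was this collected?" from the record itself.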

The dbt migration guide published by Daniel Poppy this week makes an adjacent point about data infrastructure more broadly: teams that lift and shift legacy patterns into new tooling don’t get new outcomes, they get old problems in new clothing. The same principle applies to consent architecture. Migrating to a lakehouse without rethinking your consent collection layer doesn’t make you compliant — it makes you faster at the wrong thing.

The Token Budget Problem Has a First-Party Analogue

Towards Data Science’s Stephanie Kirmer raises an underappreciated constraint in AI infrastructure: token budgets are finite, and the economics of running large models against large customer datasets are genuinely non-trivial. This is not a future concern — it’s shaping architecture decisions right now.

For first-party data programmes, the implication is this: not all customer data deserves to be fed into expensive AI pipelines. The value of a consented, well-structured first-party dataset isn’t just its breadth — it’s its precision. A brand that has collected explicit interest signals, purchase intent markers, and channel preferences through a properly designed consent programme can run leaner, cheaper AI workloads against a smaller, higher-quality dataset and outperform a competitor running noisy third-party data through a much larger model.
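
The budgeting logic can be made concrete with a small sketch. Everything here is hypothetical: the `signal_score` field, the per-profile token estimates, and the price used in the usage note are placeholders chosen for illustration, not figures from Kirmer's article or from any vendor's pricing. The selection is a simple greedy pass, not an optimal packing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    customer_id: str
    consented: bool
    signal_score: float  # hypothetical 0..1 measure of first-party signal quality
    est_tokens: int      # estimated tokens needed to process this profile


def select_within_budget(profiles: List[Profile], token_budget: int) -> List[Profile]:
    """Greedily keep the highest-signal consented profiles that fit the budget."""
    eligible = sorted(
        (p for p in profiles if p.consented),
        key=lambda p: p.signal_score,
        reverse=True,
    )
    chosen: List[Profile] = []
    used = 0
    for profile in eligible:
        if used + profile.est_tokens > token_budget:
            continue  # this profile would exceed the budget; try the next one
        chosen.append(profile)
        used += profile.est_tokens
    return chosen


def estimated_cost(profiles: List[Profile], price_per_1k_tokens: float) -> float:
    """Token count times an assumed price per 1,000 tokens."""
    return sum(p.est_tokens for p in profiles) / 1000 * price_per_1k_tokens
```

With four profiles of 400, 300, 700, and 500 estimated tokens, where the 300-token profile lacks consent, a budget of 1,000 tokens keeps the two highest-signal consented profiles that fit; at a placeholder price of 0.50 per 1,000 tokens, processing those 900 tokens costs 0.45.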

This is the competitive advantage that privacy-led data programmes are quietly building. The constraint — consent — forces the discipline. Brands that couldn’t rely on cookie-based mass collection had to get specific about what they were asking for and why. That specificity is now an asset in a world where AI inference costs real money.

For marketing teams in markets like Vietnam and the Philippines, where mobile-first audiences are highly app-native and relatively generous with first-party signals when the value exchange is clear, this is an opportunity that hasn’t fully been priced in.

Building the Stack That Ages Well

The architecture that holds up over the next several years looks something like this: a data cloud (Databricks, Snowflake) handling governed storage, identity resolution at scale, and AI model execution; an independent operational layer managing consent-first collection, real-time event streaming, and channel activation; and a transformation layer (dbt or equivalent) that enforces clean modelling discipline rather than replicating whatever mess the previous stack accumulated.

What makes this resilient isn’t any single vendor — it’s the separation of concerns. The consent layer stays independent so it can adapt to regulatory changes without requiring a data cloud migration. The activation layer stays real-time so that governed insights can actually reach customers at the moment they matter. And the whole system is built on data whose provenance you can explain to a regulator, a brand safety team, or a customer who asks.
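
The separation of concerns can be sketched as three pluggable functions, so that changing the consent policy does not touch storage or activation. This is a toy model under stated assumptions: the function names, the event dictionary keys, and both example policies are invented for illustration and do not correspond to any vendor's API.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]

# Each layer is modelled as a plain function so it can be replaced on its own.
ConsentPolicy = Callable[[Event], bool]               # consent-first collection
SegmentBuilder = Callable[[List[Event]], List[str]]   # data cloud: storage and modelling
Activator = Callable[[List[str]], int]                # channel activation


def make_pipeline(
    consent_ok: ConsentPolicy,
    build_segment: SegmentBuilder,
    activate: Activator,
) -> Callable[[List[Event]], int]:
    """Wire the three layers together without any layer knowing the others' internals."""
    def run(events: List[Event]) -> int:
        collected = [e for e in events if consent_ok(e)]
        return activate(build_segment(collected))
    return run


def policy_v1(event: Event) -> bool:
    """Hypothetical original policy: any granted consent is enough."""
    return event.get("consent") == "granted"


def policy_v2(event: Event) -> bool:
    """Hypothetical stricter policy: consent must name the 'ads' purpose."""
    return event.get("consent") == "granted" and event.get("purpose") == "ads"


def buyers_segment(events: List[Event]) -> List[str]:
    """Stand-in for warehouse modelling: distinct customers with a purchase event."""
    return sorted({e["customer_id"] for e in events if e.get("name") == "purchase"})


def count_sends(customer_ids: List[str]) -> int:
    """Stand-in for a channel push; returns how many customers would be reached."""
    return len(customer_ids)
```

Swapping `policy_v1` for `policy_v2` in `make_pipeline` changes who is reachable while `buyers_segment` and `count_sends` stay untouched, which is the property the independent consent layer is meant to provide.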

If, like most brands in Southeast Asia, you have at least two of these three layers in place, the question worth sitting with is which one is the weak or missing link, and what its absence costs you each month in compliance exposure, activation latency, or AI spend on low-quality inputs.

Key Takeaways

Data cloud consolidation clarifies where governed AI lives — it does not replace the need for a consent-first collection and activation layer between your touchpoints and your lakehouse.
Consent provenance isn’t a compliance checkbox; it’s the mechanism that makes your first-party data usable, auditable, and defensible across Southeast Asia’s fragmented regulatory landscape.
Leaner, high-quality consented datasets can outperform noisy large ones once AI token costs are counted; precision in data collection is becoming a direct P&L advantage.

The brands winning the first-party data race in Southeast Asia aren’t necessarily the ones with the largest data clouds or the most sophisticated AI. They’re the ones that built the unglamorous middle layer correctly — consent collection that travels with the data, activation that works in real time, and modelling discipline that doesn’t replicate legacy mistakes in new infrastructure. The technology choices are increasingly obvious. The architectural discipline is still rare.

At grzzly, we help brands across Southeast Asia design first-party data programmes that are compliant by architecture, not by accident, connecting consent infrastructure to activation layers that actually reach your customers on Shopee, LINE, Grab, and beyond. If you’re rethinking your stack or trying to make sense of where CustomerLake, your CDP, and your consent obligations intersect, we’d be glad to map it out together.
