Built with BigQuery: Zeotap uses Google BigQuery to build highly customized audiences at scale

November 30, 2022
Dr. Ali Arsanjani

Director, AI/ML Partner Engineering, Head of AI Center of Excellence, Google Cloud

Sathish K S

CTO, Zeotap

Zeotap’s mission is to help brands monetise customer data in a privacy-first Europe. Today, Zeotap offers three data solutions. Zeotap CDP is a next-generation Customer Data Platform that empowers brands to collect, unify, segment and activate customer data. It puts privacy and security first while enabling marketers to unlock and derive business value from their customer data through a powerful and marketer-friendly user interface. Zeotap Data delivers quality targeting at scale by enabling the activation of 2,500 tried-and-tested Champion Segments across 100+ programmatic advertising and social platforms. ID+ is a universal marketing ID initiative that paves the way for addressability in a cookieless future. Zeotap CDP is a SaaS application hosted on Google Cloud: clients use its product suite to onboard their first-party data, create audiences with the provided tools, and activate those audiences on marketing channels and advertising platforms.

Zeotap partnered with Google Cloud to provide a customer data platform that is differentiated in the market by its focus on privacy, security and compliance. Zeotap CDP, built with BigQuery, provides tools and capabilities that democratize AI/ML models to predict customer behavior and personalize the customer experience, enabling the next generation of digital marketing experts to drive higher conversion rates, improve return on advertising spend and reduce customer acquisition costs.

The capability to create actionable, highly customized audiences right the first time, improve speed to market to capture demand, and drive customer loyalty is a key differentiator. However, as audiences get more specific, it becomes harder to estimate and tune the size of the audience segment. Being able to identify the right customer attributes is critical for building audiences at scale.

Consider the following example: a fast fashion retailer has a broken size run and is at risk of taking a large markdown because of an excess of XXS and XS sizes. What if you could instantly build an audience of customers who have a high propensity for this brand or style, tend to purchase at full price, and match the size profile of the remaining inventory, driving full-price sales and avoiding costly markdowns?

Most CDPs provide size information only after a segment is created and its data processed. If the segment sizes turn out not to be relevant and quantifiable, the target audience list has to be recreated, hurting speed to market and the ability to capture customer demand. Estimating and tuning the size of an audience segment is often referred to as the segment size estimation problem. The segment size needs to be estimated, and segments need to be available for exploration and processing, with sub-second latency to provide a near real-time user experience.
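
To make the problem concrete, computing an exact segment size directly against the raw first-party data requires a full scan for every candidate segment. A minimal sketch of such a query, using hypothetical table and column names rather than Zeotap’s actual schema, might look like this:

    -- Exact segment size, recomputed from scratch for every tweak of the
    -- segment definition. Table and column names are illustrative only.
    SELECT
      COUNT(DISTINCT user_id) AS segment_size
    FROM `client_project.first_party.events`
    WHERE gender = 'Female'
      AND age_bucket = '25-34';

Re-running a scan like this for every adjustment of the segment definition is what makes interactive segment tuning slow and expensive; the pre-aggregation approach described below avoids it.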

Traditional approaches to this problem rely on pre-aggregation database models that involve sophisticated data ingestion and failure management, wasting compute hours and requiring extensive pipeline orchestration. This traditional approach has a number of disadvantages:

  1. Higher cost and maintenance as multiple Extract, Transform and Load (ETL) processes are involved

  2. Higher failure rates, with reprocessing from scratch required when failures occur

  3. Hours or days needed to ingest data at large scale

Zeotap CDP relies on the power of Google Cloud to tackle this segment size estimation problem, using BigQuery for processing and estimation, BI Engine to provide the sub-second latency required for online predictions, and the Vertex AI ecosystem with BigQuery ML to provide no-code AI segmentation and lookalike audiences. Zeotap CDP’s strength is to offer this estimate at the beginning of segment creation, before any data processing, using pre-calculated metrics. Any correction to the segment parameters can be made in near real time, saving users a lot of time.

The data cloud, with BigQuery at its core, functions as a data lake at scale and as the analytical compute engine that calculates the pre-aggregated metrics. BI Engine serves as a caching and acceleration layer to make these metrics available with near sub-second latency. Compared to the traditional approach, this setup does not require a heavy data processing framework like Spark or Hadoop, nor sophisticated pipeline management. Microservices deployed on GKE handle orchestration using BigQuery’s SQL ETL capabilities. No separate data ingestion into the caching layer is required, as BI Engine works seamlessly in tandem with BigQuery and is enabled with a single setting.

The diagram below depicts how Zeotap manages first-party data and solves the segment size estimation problem.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1.MainArchDiagram.max-1900x1900.jpg

The API layer, powered by Apigee, provides secure client access to Zeotap’s API infrastructure to read and ingest first-party data in real time. The UI services layer, backed by GKE and Firebase, provides access to Zeotap’s platform, front-ending audience segmentation, real-time workflow orchestration and management, analytics, and dashboards. The stream and batch processing layer manages the core data ingestion using Pub/Sub, Dataflow and Cloud Run. BigQuery, Cloud SQL, Bigtable and Cloud Storage make up the storage layer.

The destination platform allows clients to activate their data across various marketing channels, data management and ad management platforms such as Google DDP, TapTap and The Trade Desk (more than 150 such integrations). BigQuery is at the heart of the audience platform, allowing clients to slice and dice their first-party assets, enhance them with Zeotap’s universal ID graph or its third-party data assets, and push them to downstream destinations for activation and funnel analysis. The predictive analytics layer allows clients to create and activate machine-learning-based segments (for example, CLV and RFM models) with just a few clicks. Cloud IAM, the Cloud Operations suite and collaboration tools cover the cross-cutting needs of security, logging and collaboration.
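
One way a segment like this could be produced behind the scenes is with BigQuery ML. The following is a minimal sketch of an RFM-style clustering model; the transactions table, column names and cluster count are illustrative assumptions, not Zeotap’s actual implementation:

    -- Cluster users on recency, frequency and monetary value (RFM).
    -- Table and column names are hypothetical.
    CREATE OR REPLACE MODEL `client_project.analytics.rfm_segments`
    OPTIONS (model_type = 'kmeans', num_clusters = 5) AS
    SELECT
      DATE_DIFF(CURRENT_DATE(), MAX(purchase_date), DAY) AS recency_days,
      COUNT(*) AS frequency,
      SUM(order_value) AS monetary
    FROM `client_project.first_party.transactions`
    GROUP BY user_id;

    -- Assign each user to a cluster; the resulting groups can then be
    -- activated as segments downstream.
    SELECT user_id, CENTROID_ID
    FROM ML.PREDICT(
      MODEL `client_project.analytics.rfm_segments`,
      (SELECT
         user_id,
         DATE_DIFF(CURRENT_DATE(), MAX(purchase_date), DAY) AS recency_days,
         COUNT(*) AS frequency,
         SUM(order_value) AS monetary
       FROM `client_project.first_party.transactions`
       GROUP BY user_id));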

For segment/audience size estimation, the core data, which is the client’s first-party data, resides in its own Google Cloud project. The first step is to identify low-cardinality columns using BigQuery’s APPROX_COUNT_DISTINCT capabilities. At this time, Zeotap supports sub-second estimation only on low-cardinality dimensions (cardinality being the number of unique values), like Gender with Male/Female/M/N values or Age with a limited set of age buckets. A sample query looks like this:

https://storage.googleapis.com/gweb-cloudblog-publish/images/2.QueryPivot.max-600x600.jpg

Once pivoted by columns, the results look like this

https://storage.googleapis.com/gweb-cloudblog-publish/images/3.QueryPivot_Results.max-900x900.jpg
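
In text form, a hedged sketch of a query along these lines, producing one row per column with its approximate cardinality, might look like the following; the table and column names are illustrative assumptions, and the actual query pictured above may structure the pivot differently:

    -- Approximate distinct-value counts, unpivoted to one row per column so
    -- they can be compared against a cardinality threshold. Names are
    -- illustrative only.
    SELECT column_name, cardinality
    FROM (
      SELECT
        APPROX_COUNT_DISTINCT(gender)     AS gender,
        APPROX_COUNT_DISTINCT(age_bucket) AS age_bucket,
        APPROX_COUNT_DISTINCT(city)       AS city
      FROM `client_project.first_party.profiles`
    )
    UNPIVOT (cardinality FOR column_name IN (gender, age_bucket, city));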

Now that the cardinality numbers are available for all columns, they are divided into two groups: one below the threshold (low cardinality) and one above it (high cardinality). The next step is to run a reverse ETL query that creates aggregates on the low-cardinality dimensions and corresponding HLL (HyperLogLog) sketches for the user count measure.

A sample query looks like this

https://storage.googleapis.com/gweb-cloudblog-publish/images/4.QueryCardnality.max-700x700.jpg
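
In text form, a hedged sketch of such a reverse ETL aggregation, again with illustrative table and column names, could be:

    -- Pre-aggregate the low-cardinality dimensions and keep an HLL sketch of
    -- the users in each combination. The destination table and all names are
    -- illustrative assumptions.
    CREATE OR REPLACE TABLE `estimator_project.metadata.segment_aggregates` AS
    SELECT
      gender,
      age_bucket,
      HLL_COUNT.INIT(user_id) AS user_sketch
    FROM `client_project.first_party.profiles`
    GROUP BY gender, age_bucket;
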
https://storage.googleapis.com/gweb-cloudblog-publish/images/5.GCP_Estimator_Project.max-1500x1500.jpg

The resultant data is loaded into a separate estimator Google Cloud project for further processing and analysis. This project contains a metadata store with the datasets required for processing client requests and is fronted by BI Engine to accelerate estimation queries. With this process, the segment size is calculated from pre-aggregated metrics without processing the entire first-party dataset, enabling the end user to create and experiment with any number of segments without incurring the delays of the traditional approach.
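
Under the same illustrative assumptions, an estimation query served from the pre-aggregated table, and accelerated by BI Engine, might look like the sketch below. It merges the HLL sketches of the combinations matching the requested segment instead of scanning the raw first-party data:

    -- Estimated segment size from the pre-aggregated sketches; the filter
    -- values and names are illustrative only.
    SELECT
      HLL_COUNT.MERGE(user_sketch) AS estimated_segment_size
    FROM `estimator_project.metadata.segment_aggregates`
    WHERE gender = 'Female'
      AND age_bucket = '25-34';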

This approach eliminates the ETL steps otherwise required for this use case, delivering a time reduction of over 90% and a cost reduction of 66% for segment size estimation. In addition, enabling BI Engine on top of BigQuery boosts query speeds by more than 60%, optimizes resource utilization and improves query response compared to native BigQuery queries. The ability to experiment with audience segmentation is one of many capabilities that Zeotap CDP provides its customers. The cookieless future will drive experimentation with concepts like Topics for interest-based advertising (IBA) and the development of models that support a wide range of possibilities for predicting customer behavior.

There is ever-increasing demand for shared data, with customers requesting access to finished data in the form of datasets they can share both within and across the organization through external channels. These datasets unlock further opportunities: the curated data can be used as-is, or coalesced with other datasets to create business-centric insights, fuel innovation across the ecosystem, or power visualizations. To meet this need, Zeotap is leveraging Google Cloud Analytics Hub to create a rich data ecosystem of analytics-ready datasets.

Analytics Hub is powered by BigQuery and provides a self-service approach to securely sharing data by publishing and subscribing to trusted datasets as listings in private and public exchanges. It allows Zeotap to share data in place while retaining full control, and end customers get access to fresh data without the need to move it at large scale.

Click here to learn more about Zeotap’s CDP capabilities or to request a demo.

The Built with BigQuery advantage for ISVs 

Google is helping tech companies like Zeotap build innovative applications on Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs through the Built with BigQuery initiative, launched in April as part of the Google Data Cloud Summit. Participating companies can: 

  • Get started fast with a Google-funded, pre-configured sandbox. 

  • Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence who can provide insight into key use cases, architectural patterns, and best practices. 

  • Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

BigQuery gives ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge partner ecosystem and support for multi-cloud, open source tools and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in. 

Click here to learn more about Built with BigQuery.


We thank the Google Cloud and Zeotap team members who co-authored the blog:
Zeotap: Shubham Patil, Engineering Manager; Google: Bala Desikan, Principal Architect and Sujit Khasnis, Cloud Partner Engineering
