The holy grail of modern data teams is "Self-Serve Analytics." We want our Product Managers and Sales leaders to ask questions like "Which enterprise accounts are at risk of churning?" and get an immediate, accurate answer without waiting for a Data Engineer to write SQL.
For years, we tried to solve this with better dashboards. It didn't work: the questions people asked were always slightly different from what the dashboards showed.
Now, with LLMs like Google Gemini, we have a new way: AI Agents.
But there's a catch. If you connect an LLM directly to your raw BigQuery data, it will hallucinate. It doesn't know that your "Active User" definition requires 3 events, or that "Churn" is calculated on a 30-day rolling window.
In this guide, I’ll show you how to build a reliable, hallucination-proof analytics agent using:
- BigQuery (the data warehouse)
- GrowthCues Core (an open-source semantic layer)
- Gemini CLI (the reasoning engine)
The Missing Link: A Semantic Layer for AI
The secret to making AI work with data isn't a better model; it's better context.
We need to give Gemini a "Dictionary" that explains exactly what our data means. That's why I open-sourced GrowthCues Core, a dbt project that not only calculates standard B2B SaaS metrics but also generates an AI-Ready Schema.
Step 1: Deploy the Semantic Layer
First, we need clean data. Clone the GrowthCues Core repository and deploy it to your BigQuery project.
```bash
git clone https://github.com/ArvoanDev/growthcues-core.git
cd growthcues-core
pip install dbt-bigquery
dbt deps
dbt run
```
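Note that dbt run assumes a working BigQuery connection. If you haven't set one up, a minimal ~/.dbt/profiles.yml looks roughly like this. The top-level profile name must match the profile: key in the repo's dbt_project.yml (growthcues_core here is an assumption), and method: oauth assumes you've already authenticated with gcloud auth application-default login:

```yaml
growthcues_core:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: oauth            # or service-account with a keyfile
      project: your-gcp-project
      dataset: growthcues_core # where the models will be built
      threads: 4
```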
Running the project transforms your messy Segment/RudderStack event logs into clean, structured tables:
- fct_product_metrics_daily: Global DAU/MAU trends and stickiness ratios.
- fct_account_metrics_daily: Account-level signals like seat expansion velocity (net_new_users_7d) and usage drops (volume_change_ratio_7d).
- fct_user_metrics_daily: Snapshots of individual user behavior, including champion rank (usage_rank_in_account) and lifecycle status (New vs. Resurrected).
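If the run succeeds, a quick spot-check confirms the marts landed. The your_project prefix and growthcues_core dataset name below are placeholders; substitute your own project and target dataset:

```sql
-- Spot-check the account mart; expect one row per account per day.
SELECT metric_date, account_id, net_new_users_7d, volume_change_ratio_7d
FROM `your_project.growthcues_core.fct_account_metrics_daily`
ORDER BY metric_date DESC
LIMIT 10;
```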
Step 2: The "Context Injection"
This is where the magic happens. The dbt project is configured to use persist_docs, which pushes rich descriptions directly into your BigQuery columns.
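For reference, that behavior comes from a small config block in dbt_project.yml, roughly like this (the exact key path in the repo may differ):

```yaml
models:
  growthcues_core:
    +persist_docs:
      relation: true   # push model descriptions onto the BigQuery table
      columns: true    # push column descriptions onto BigQuery columns
```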
In the repo, there is a file called models/marts/core/schema.yml. Unlike standard documentation, this file is prompt-engineered: it contains specific tags like [Definition], [Formula], and [Context] that are designed to ground an LLM.
For example, volume_change_ratio_7d isn't just a number; the schema tells Gemini: "[Context] < 0.5 = High Churn Risk (Usage halved). > 1.5 = Acceleration."
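Concretely, a column entry in schema.yml follows this shape (illustrative; the repo's exact wording may differ):

```yaml
models:
  - name: fct_account_metrics_daily
    columns:
      - name: volume_change_ratio_7d
        description: >
          [Definition] Event volume in the last 7 days divided by the prior 7 days.
          [Context] < 0.5 = High Churn Risk (Usage halved). > 1.5 = Acceleration.
```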
Step 3: Querying with Gemini CLI
With the Gemini CLI installed (or using Google AI Studio), we can now simulate an "Agentic" workflow. We will provide the schema as the System Context and then ask our business question.
The Prompt:
```text
SYSTEM:
You are an expert Data Analyst. You have access to a BigQuery dataset
with B2B SaaS metrics. Here is the schema definition explaining the
tables and columns:

[PASTE CONTENT OF schema.yml HERE]

USER QUESTION:
"Which accounts are showing signs of volumetric churn risk, and who are the champions we should contact to save them?"
```
The Gemini Response:
Because Gemini now understands the schema, it won't guess. It knows to look for volume_change_ratio_7d < 0.5 for churn risk and usage_rank_in_account = 1 for champions. It generates precise SQL:
```sql
WITH risky_accounts AS (
  SELECT
    account_id,
    volume_change_ratio_7d,
    metric_date
  FROM `your_project.growthcues_core.fct_account_metrics_daily`
  WHERE metric_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    AND volume_change_ratio_7d < 0.5
    AND volume_change_ratio_7d > 0 -- Filter out inactive
)

SELECT
  r.account_id,
  r.volume_change_ratio_7d AS usage_drop_severity,
  u.user_id AS champion_id,
  u.n_events_monthly AS champion_volume
FROM risky_accounts r
LEFT JOIN `your_project.growthcues_core.fct_user_metrics_daily` u
  ON r.account_id = u.latest_account_id
  AND u.metric_date = r.metric_date
WHERE u.usage_rank_in_account = 1 -- The top user
ORDER BY r.volume_change_ratio_7d ASC;
```
Why This Changes Everything
This workflow effectively solves the "Cold Start" problem for AI analytics.
- No Hallucinations: The AI relies on the [Definition] tags in your dbt schema, not its training data.
- Complex Logic: It understands that "Champions" are defined by usage_rank_in_account and "Churn Risk" by specific volumetric drops.
- Self-Serve: You can wrap this in a simple Streamlit app or Slack bot, allowing your GTM team to query the warehouse directly (a minimal sketch follows this list).
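Here's a minimal sketch of the loop such a wrapper runs: ground Gemini with the dbt schema, get SQL back, execute it on BigQuery. It assumes the google-genai and google-cloud-bigquery Python packages, a GEMINI_API_KEY in the environment, and a placeholder model name; in production you'd review the generated SQL before running it.

```python
from google import genai
from google.cloud import bigquery

SCHEMA_PATH = "models/marts/core/schema.yml"

def answer(question: str):
    # Inject the prompt-engineered schema as grounding context.
    schema = open(SCHEMA_PATH).read()
    prompt = (
        "You are an expert Data Analyst. Using only the tables and columns "
        "described in this dbt schema, write a single BigQuery SQL query "
        f"that answers the question.\n\nSchema:\n{schema}\n\n"
        f"Question: {question}\n\nReturn only the SQL."
    )
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=prompt
    )
    sql = response.text.strip()
    if sql.startswith("`"):  # models sometimes wrap SQL in markdown fences
        sql = sql.strip("`").removeprefix("sql").strip()
    # Run the generated query against the warehouse.
    return bigquery.Client().query(sql).result()

for row in answer("Which accounts are at volumetric churn risk?"):
    print(dict(row))
```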
Going Further: Predictive Intelligence
This setup gives you a powerful Descriptive engine (answering "What happened?").
If you want to move to Predictive intelligence (answering "Who will churn next?") or model and track customer journeys, that's where the commercial GrowthCues platform comes in. It takes this same semantic layer and runs AI to score every account's likelihood to convert or churn.
But for day-to-day analytics? This open-source stack is all you need to get started.
Star the GrowthCues Core repo on GitHub and start building your own Agent today.