← Back to blog
analytics·May 13, 2026·7 min read

RFM analysis in BigQuery with dbt — the practical guide

RFM segmentation tells you who your best customers are, who's about to churn, and who's worth winning back. Here's how to build it properly in BigQuery.

analytics

RFM analysis in BigQuery with dbt — the practical guide

RFM analysis answers three questions about every customer:

  • Recency — how recently did they buy?
  • Frequency — how often do they buy?
  • Monetary — how much do they spend?

Score each on a scale of 1-5. Combine the scores. Your 555s are your best customers. Your 111s are gone. The 511s bought recently but only once — they're your conversion opportunity. The 155s used to be great but haven't bought in months — they're your churn risk.

It's one of the oldest segmentation frameworks in analytics. It works because it's grounded in behavior, not demographics or personas. What someone does matters more than who they are.

The model

Three dbt models. That's it.

1. Base metrics per customer

-- models/intermediate/int_rfm__customer_metrics.sql
SELECT
    customer_id,
    MAX(order_date)                     AS last_order_date,
    COUNT(DISTINCT order_id)            AS total_orders,
    SUM(order_total)                    AS total_revenue,
    DATE_DIFF(CURRENT_DATE(), MAX(order_date), DAY) AS days_since_last_order,
    DATE_DIFF(CURRENT_DATE(), MIN(order_date), DAY) AS customer_tenure_days,
    SAFE_DIVIDE(SUM(order_total), COUNT(DISTINCT order_id)) AS avg_order_value
FROM {{ ref('mart_orders') }}
WHERE order_status = 'completed'
GROUP BY 1

2. Score with percentile-based quintiles

-- models/intermediate/int_rfm__scored.sql
SELECT
    customer_id,
    days_since_last_order,
    total_orders,
    total_revenue,
    avg_order_value,
    -- Recency: lower days = better = higher score
    5 - NTILE(5) OVER (ORDER BY days_since_last_order ASC) + 1 AS recency_score,
    -- Frequency: more orders = higher score
    NTILE(5) OVER (ORDER BY total_orders ASC) AS frequency_score,
    -- Monetary: more revenue = higher score
    NTILE(5) OVER (ORDER BY total_revenue ASC) AS monetary_score
FROM {{ ref('int_rfm__customer_metrics') }}
WHERE total_orders > 0

NTILE(5) splits customers into 5 equal-sized buckets. No arbitrary thresholds — the boundaries adapt to your data distribution. If your top quintile has 50+ orders and the bottom has 1-2, the model reflects that automatically.

3. Segment and label

-- models/marts/mart_rfm_segments.sql
SELECT
    customer_id,
    days_since_last_order,
    total_orders,
    total_revenue,
    avg_order_value,
    recency_score,
    frequency_score,
    monetary_score,
    CONCAT(
        CAST(recency_score AS STRING),
        CAST(frequency_score AS STRING),
        CAST(monetary_score AS STRING)
    ) AS rfm_score,
    CASE
        -- Champions: recent, frequent, high-value
        WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4
        THEN 'Champions'
 
        -- Loyal: frequent and high-value, recency varies
        WHEN frequency_score >= 4 AND monetary_score >= 4
        THEN 'Loyal Customers'
 
        -- Potential Loyalists: recent, moderate frequency
        WHEN recency_score >= 4 AND frequency_score >= 2 AND frequency_score <= 3
        THEN 'Potential Loyalists'
 
        -- New Customers: very recent, low frequency
        WHEN recency_score >= 4 AND frequency_score = 1
        THEN 'New Customers'
 
        -- At Risk: used to be good, not recent
        WHEN recency_score <= 2 AND frequency_score >= 3 AND monetary_score >= 3
        THEN 'At Risk'
 
        -- Can't Lose Them: were champions, going dormant
        WHEN recency_score <= 2 AND frequency_score >= 4 AND monetary_score >= 4
        THEN 'Cant Lose Them'
 
        -- Hibernating: low everything
        WHEN recency_score <= 2 AND frequency_score <= 2
        THEN 'Hibernating'
 
        -- Need Attention: mid-range across the board
        WHEN recency_score = 3 AND frequency_score = 3
        THEN 'Need Attention'
 
        -- About to Sleep: below average recency, low-mid frequency
        WHEN recency_score <= 3 AND frequency_score <= 2
        THEN 'About to Sleep'
 
        ELSE 'Other'
    END AS segment_name
FROM {{ ref('int_rfm__scored') }}

What each segment tells you

SegmentRFM PatternSize (typical)Action
Champions5-5-5, 5-4-5, etc.5-10%Reward, ask for referrals, early access
Loyal Customers*-4-4, *-5-410-15%Upsell, loyalty programs
Potential Loyalists4-2-, 5-3-10-15%Nurture to increase frequency
New Customers5-1-*5-10%Onboard well, don't overwhelm
At Risk2-4-4, 1-3-310-15%Win-back campaign, personal outreach
Can't Lose Them1-5-5, 2-4-53-5%Urgent — these are high-value going dormant
Hibernating1-1-1, 2-1-115-25%Low-cost re-engagement or accept the loss
About to Sleep3-2-, 2-2-10-15%Gentle re-engagement before they go dormant

The "Can't Lose Them" segment is the one that pays for the analysis. These are customers who used to be your best — high frequency, high spend — but haven't purchased recently. Every day without action increases the chance they're gone for good.

Visualization

RFM data is naturally visual. Three approaches:

Segment distribution (bar chart)

-- How many customers in each segment?
SELECT
    segment_name,
    COUNT(*) AS customers,
    SUM(total_revenue) AS segment_revenue,
    AVG(total_revenue) AS avg_revenue_per_customer
FROM {{ ref('mart_rfm_segments') }}
GROUP BY 1
ORDER BY segment_revenue DESC

RFM heatmap (the classic view)

Plot Recency (x-axis) vs Frequency (y-axis), color by count or revenue. Each cell is one RFM combination. Bright cells = many customers. The top-left (high recency, high frequency) is your sweet spot.

-- Heatmap data: recency × frequency, colored by customer count
SELECT
    recency_score,
    frequency_score,
    COUNT(*) AS customer_count,
    SUM(total_revenue) AS total_revenue,
    AVG(monetary_score) AS avg_monetary_score
FROM {{ ref('int_rfm__scored') }}
GROUP BY 1, 2
ORDER BY 1 DESC, 2 ASC

Movement tracking (the advanced view)

How are customers migrating between segments over time? This requires snapshotting:

-- models/snapshots/rfm_snapshot.sql
{% snapshot rfm_segment_snapshot %}
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols=['segment_name'],
    )
}}
SELECT customer_id, segment_name, CURRENT_DATE() AS snapshot_date
FROM {{ ref('mart_rfm_segments') }}
{% endsnapshot %}

Now you can answer: "How many Champions from Q1 became At Risk in Q2?" That's the real insight — not the static picture, but the flow.

Activation: making RFM operational

Segments sitting in a dashboard are interesting. Segments pushing to your marketing tools are valuable.

Email campaigns (via reverse ETL)

-- models/activation/rfm_email_audiences.sql
SELECT
    c.email,
    r.segment_name,
    r.days_since_last_order,
    r.total_revenue,
    CASE
        WHEN r.segment_name = 'Cant Lose Them'
        THEN 'winback_urgent'
        WHEN r.segment_name = 'At Risk'
        THEN 'winback_standard'
        WHEN r.segment_name = 'Potential Loyalists'
        THEN 'nurture_to_loyal'
        WHEN r.segment_name = 'New Customers'
        THEN 'onboarding_series'
        WHEN r.segment_name = 'Champions'
        THEN 'referral_ask'
    END AS email_campaign
FROM {{ ref('mart_rfm_segments') }} r
JOIN {{ ref('dim_customers') }} c ON r.customer_id = c.customer_id
WHERE email_campaign IS NOT NULL

Push to SendGrid, Klaviyo, or whatever email tool you use via Hightouch or Census. Each segment gets a tailored campaign — not a generic blast.

Ad audiences

Same pattern — push "At Risk" and "Can't Lose Them" segments to Google Ads and Meta as custom audiences. Win-back ads targeted at people who already know your brand are 3-5x more efficient than cold prospecting.

CRM enrichment

Add the RFM segment and score to each contact in Salesforce or HubSpot. When a sales rep opens a contact record, they see: "Champion — 12 orders, $47K lifetime value, last purchase 8 days ago." Context changes the conversation.

Common mistakes

1. Fixed thresholds instead of percentiles

Don't define "high frequency" as ">10 orders." Use NTILE() or percentile-based scoring. Fixed thresholds break when your data distribution changes — after a holiday season, after a product launch, after a year of growth. Percentiles adapt automatically.

2. Not excluding outliers

One customer with 500 orders will compress everyone else into the bottom quintile. Consider capping at the 99th percentile or excluding obvious B2B/wholesale accounts from B2C segmentation.

3. Running it once

RFM is a moving picture, not a snapshot. Run it on a schedule (weekly or monthly). Track how customers move between segments. A customer who was a Champion 3 months ago and is now At Risk is a more urgent signal than a customer who's been Hibernating for a year.

4. Not connecting it to action

The most common RFM failure: a beautiful analysis that lives in a slide deck and never reaches the email tool, the ad platform, or the CRM. Build the activation layer from day one. If the segment doesn't trigger an action, it's trivia.

The stack

LayerToolWhat it does
Data sourceShopify / Stripe / your order systemRaw transaction data
IngestionFivetranSyncs to BigQuery on schedule
WarehouseBigQueryStores all historical transactions
TransformationdbtThe 3 RFM models + activation models
VisualizationSigma or LookerHeatmaps, segment distribution, movement tracking
ActivationHightouch → SendGrid + Google Ads + SalesforcePushes segments to operational tools

Total infrastructure cost: under $300/month for most e-commerce and B2B companies.


We build RFM models and customer segmentation pipelines in BigQuery + dbt — from raw transactions to activated audiences in your email and ad tools. If your customer data is sitting in a warehouse without segmentation, book a discovery call.

Got a similar problem?

30 minutes. We'll tell you honestlywhat's broken.