engineering·May 13, 2026·6 min read

Website visitor deanonymization — what it is and how it works

Someone from a 500-person company visited your pricing page last week. Without deanonymization, they're a line in GA4. With it, you have their name.

97% of your website visitors leave without filling out a form, clicking a CTA, or doing anything that identifies them. They're anonymous sessions in GA4. Ghost rows in PostHog. Traffic you paid for, learned nothing from.

Deanonymization tools change that equation. For B2B companies, they turn anonymous page views into named contacts with titles, companies, and LinkedIn URLs. The question isn't whether this data is useful — it obviously is. The question is what you do with it once you have it.

How it works

Three mechanisms, depending on the tool:

1. IP-to-company matching

The simplest and most common approach. Every web request carries an IP address. Databases like IPinfo, MaxMind, and Clearbit map IP ranges to company names.

  • Accuracy: company-level only. You know Acme Corp visited your site. You don't know who at Acme Corp.
  • Coverage: decent for mid-to-large companies with dedicated IP ranges. Poor for small companies on shared ISPs and anyone working from home on residential internet.
  • Cost: IPinfo Lite is free. Commercial APIs run $100-500/month.
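
As a rough illustration of how a range database answers "which company owns this IP", here is a minimal Python sketch. The ranges and company names are hypothetical (they use documentation-reserved addresses), `company_for_ip` is an illustrative name rather than any vendor's API, and the lookup assumes ranges do not overlap.

```python
import bisect
import ipaddress
from typing import Optional

# Hypothetical sample ranges (documentation-reserved addresses, not real company data).
RANGES = [
    ("192.0.2.0", "192.0.2.255", "Acme Corp"),
    ("198.51.100.0", "198.51.100.127", "Globex"),
]

# Convert bounds to integers and sort by range start so we can binary-search.
_INDEX = sorted(
    (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), company)
    for start, end, company in RANGES
)
_STARTS = [row[0] for row in _INDEX]


def company_for_ip(ip: str) -> Optional[str]:
    """Return the company whose range contains `ip`, or None if unmatched."""
    value = int(ipaddress.ip_address(ip))
    # Index of the last range that starts at or before this address.
    pos = bisect.bisect_right(_STARTS, value) - 1
    if pos < 0:
        return None
    start, end, company = _INDEX[pos]
    return company if start <= value <= end else None
```

The key point is that the match is a containment check against a range, not an equality check against a single address. A residential or shared-ISP address simply falls outside every company range and returns `None`.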

2. Identity graph / co-op pixel networks

Tools like RB2B use a network of publisher pixels across the web. When a user visits a site in the network and identifies themselves (by logging in or filling out a form), the network associates their browser or device with their identity. When that same browser visits your site, the network matches them. Leadfeeder is often mentioned alongside these tools, but it primarily identifies companies rather than individuals.

  • Accuracy: individual-level. You get name, email, title, LinkedIn.
  • Coverage: depends on the network size. RB2B claims 40-60% match rates for US B2B traffic. Reality varies.
  • Cost: RB2B starts around $200/month. Clearbit Reveal, Demandbase, and 6sense are enterprise-priced.
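
The matching step described above can be sketched as a toy lookup table. This is only a conceptual model, not how RB2B or any other vendor actually implements it. `IdentityGraph`, `record_login`, `match`, and the browser IDs are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Identity:
    name: str
    email: str
    title: str


class IdentityGraph:
    """Toy co-op network: remembers which browser belongs to which person."""

    def __init__(self) -> None:
        self._by_browser: Dict[str, Identity] = {}

    def record_login(self, browser_id: str, identity: Identity) -> None:
        # Called when a user identifies themselves on any site in the network.
        self._by_browser[browser_id] = identity

    def match(self, browser_id: str) -> Optional[Identity]:
        # Called when the same browser later loads the pixel on your site.
        return self._by_browser.get(browser_id)


graph = IdentityGraph()
graph.record_login("browser-123", Identity("Dana Lee", "dana@example.com", "VP Engineering"))
known = graph.match("browser-123")    # seen elsewhere in the network
unknown = graph.match("browser-999")  # never identified anywhere
```

Coverage follows directly from this shape: a browser that has never identified itself on a network site has no entry, so it stays anonymous no matter how often it visits you.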

3. Email-hash matching

Some tools match browser cookies to hashed email databases. When a user's hashed email appears in the database (from data partnerships, publisher networks, or opt-in sources), the tool matches it.

  • Accuracy: individual-level, but match rates are lower than identity graphs.
  • Cost: usually bundled in enterprise tools.
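
A minimal sketch of the hashing side, assuming SHA-256 over a trimmed, lowercased address. Vendors differ on the algorithm and normalization, so treat both as assumptions. `HASHED_DB` and `lookup` are hypothetical stand-ins for a partner database.

```python
import hashlib
from typing import Optional


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase), then SHA-256 the address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical hashed-email database keyed by digest.
HASHED_DB = {
    hash_email("dana@example.com"): {"company": "Acme Corp", "title": "VP Engineering"},
}


def lookup(email_hash: str) -> Optional[dict]:
    """Return the stored profile for a hash, or None when there is no match."""
    return HASHED_DB.get(email_hash)
```

Normalizing before hashing matters: without it, `Dana@Example.com` and `dana@example.com` produce different digests and the match silently fails.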

The pipeline we built

For one client (SecureW2), we implemented RB2B → warehouse → CRM:

Website visitor ──→ RB2B pixel ──→ RB2B webhook
                                      │
                                      ▼
                              Cloud Function (parse JSON)
                                      │
                                      ▼
                              Pub/Sub topic
                                      │
                                      ▼
                              BigQuery (raw events)
                                      │
                                      ▼
                              dbt models (stage, route, score)
                                      │
                                      ▼
                         Salesforce sync (via Hightouch)
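
The "parse JSON" step might look something like the sketch below. It covers only the pure parsing logic, with the HTTP handler and the Pub/Sub publish call left out. `build_event` is a hypothetical name. The `event_id`, `received_at`, and `payload` keys are the columns the staging model selects from, though generating `event_id` as a UUID here is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_event(body: bytes, now: Optional[datetime] = None) -> Optional[dict]:
    """Turn a raw webhook body into the row shape the staging model reads."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Malformed JSON or undecodable bytes: drop it (or route to a dead-letter topic).
        return None
    if not isinstance(payload, dict):
        return None  # a bare list or string is not a visitor event
    received = now or datetime.now(timezone.utc)
    return {
        "event_id": str(uuid.uuid4()),
        "received_at": received.isoformat(),
        "payload": json.dumps(payload),  # keep the raw JSON as a string column
    }
```

Storing the payload untouched and extracting fields later in dbt means a change in the vendor's payload shape breaks a SQL model you can fix and re-run, not an ingestion function that has already discarded data.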

Staging: parse the webhook payload

RB2B sends a JSON payload for each identified visitor:

-- models/staging/stg_rb2b__visitor_events.sql
SELECT
    event_id,
    TIMESTAMP(received_at)              AS identified_at,
    JSON_EXTRACT_SCALAR(payload, '$.email')         AS email,
    JSON_EXTRACT_SCALAR(payload, '$.first_name')    AS first_name,
    JSON_EXTRACT_SCALAR(payload, '$.last_name')     AS last_name,
    JSON_EXTRACT_SCALAR(payload, '$.title')          AS job_title,
    JSON_EXTRACT_SCALAR(payload, '$.company')        AS company_name,
    JSON_EXTRACT_SCALAR(payload, '$.linkedin_url')   AS linkedin_url,
    JSON_EXTRACT_SCALAR(payload, '$.page_url')       AS page_visited,
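    -- SAFE_CAST returns NULL instead of failing the model when company_size is not a clean integer
    SAFE_CAST(JSON_EXTRACT_SCALAR(payload, '$.company_size') AS INT64) AS company_size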
FROM {{ source('rb2b', 'raw_webhook_events') }}
WHERE JSON_EXTRACT_SCALAR(payload, '$.email') IS NOT NULL

Routing: CRM lookup + classification

Not every identified visitor deserves action. Route based on who they are:

-- models/intermediate/int_rb2b__routed_visitors.sql
SELECT
    v.*,
    sf_contact.contact_id       AS existing_contact_id,
    sf_opp.opportunity_id       AS active_opportunity_id,
    sf_opp.amount               AS opportunity_value,
    CASE
        WHEN sf_opp.opportunity_id IS NOT NULL
        THEN 'ACTIVE_OPPORTUNITY'      -- visiting during a deal = high signal
        WHEN sf_contact.contact_id IS NOT NULL
        THEN 'EXISTING_CONTACT'        -- known contact, re-engaging
        WHEN v.company_size >= 200
        THEN 'NEW_ENTERPRISE_VISITOR'  -- unknown, large company
        WHEN v.company_size >= 50
        THEN 'NEW_MID_MARKET_VISITOR'
        ELSE 'NEW_VISITOR'
    END AS routing_category,
    CASE
        WHEN sf_opp.opportunity_id IS NOT NULL THEN 'NOTIFY_OWNER_IMMEDIATELY'
        -- Check for a known contact before CREATE_LEAD so an existing contact never spawns a duplicate lead
        WHEN sf_contact.contact_id IS NOT NULL THEN 'UPDATE_ACTIVITY'
        WHEN v.company_size >= 200 AND v.page_visited LIKE '%pricing%' THEN 'CREATE_LEAD'
        ELSE 'ADD_TO_NURTURE'
    END AS routing_action
FROM {{ ref('stg_rb2b__visitor_events') }} v
LEFT JOIN {{ ref('stg_salesforce__contacts') }} sf_contact
    ON LOWER(v.email) = LOWER(sf_contact.email)
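-- Note: an account with several open opportunities yields one row per opportunity;
-- deduplicate per event_id before syncing to the CRM.
LEFT JOIN {{ ref('stg_salesforce__opportunities') }} sf_opp
    ON sf_contact.account_id = sf_opp.account_id
    AND sf_opp.stage NOT IN ('Closed Won', 'Closed Lost')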

When someone from a company with an active $200K opportunity visits your pricing page — that's not a casual browse. That's buying behavior. Notify the account owner within minutes.
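
One way to sanity-check the category thresholds before they reach the warehouse is to mirror the `routing_category` CASE expression in a small function and test it. This is a sketch, not part of the pipeline. The function and parameter names are hypothetical, and a missing `company_size` falls through to `NEW_VISITOR`, as a NULL does in SQL.

```python
from typing import Optional


def routing_category(
    company_size: Optional[int],
    has_contact: bool,
    has_open_opportunity: bool,
) -> str:
    """Mirror of the routing_category CASE expression, evaluated top to bottom."""
    if has_open_opportunity:
        return "ACTIVE_OPPORTUNITY"  # visiting during a deal = high signal
    if has_contact:
        return "EXISTING_CONTACT"  # known contact, re-engaging
    if company_size is not None and company_size >= 200:
        return "NEW_ENTERPRISE_VISITOR"
    if company_size is not None and company_size >= 50:
        return "NEW_MID_MARKET_VISITOR"
    return "NEW_VISITOR"
```

Order matters here just as it does in the CASE: an open opportunity wins over everything else, and company size is only consulted for visitors the CRM has never seen.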

The free alternative: PostHog + IPinfo

If RB2B is too expensive or you want company-level identification without individual data:

-- PostHog captures IP (configurable)
-- Join to IPinfo for company matching
SELECT
    ph.session_id,
    ph.page_url,
    ph.session_start,
    ip.company_name,
    ip.company_domain,
    ip.employee_count,
    ip.industry
FROM {{ ref('stg_posthog__sessions') }} ph
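-- Inner join: the WHERE clause below already discards unmatched sessions
JOIN {{ ref('stg_ipinfo__ip_companies') }} ip
    -- Range match: equality on ip_range_start would only hit the first address of each block.
    -- Assumes the staging model also exposes ip_range_end and that both bounds are IP strings.
    ON NET.SAFE_IP_FROM_STRING(ph.ip_address)
       BETWEEN NET.SAFE_IP_FROM_STRING(ip.ip_range_start)
           AND NET.SAFE_IP_FROM_STRING(ip.ip_range_end)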
WHERE ip.company_name IS NOT NULL
  AND ip.company_type = 'business'  -- exclude ISPs, universities

You get company, not individual. But it's effectively free and requires no third-party pixel.

Legal considerations

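US: largely permissible for B2B. No federal law prohibits using IP-to-company matching or identity-graph data for business purposes. California's CCPA requires honoring opt-out requests, and its temporary exemption for B2B contact data expired on January 1, 2023, so California business contacts now fall under the same rules as other consumers.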

EU/UK: GDPR makes individual-level deanonymization risky without explicit consent. Company-level (IP-to-company) is generally acceptable as legitimate interest. Individual identification via RB2B-style tools requires careful legal review and a clear consent mechanism.

Practical guidance: start with company-level identification everywhere. Layer individual identification only for US traffic where your legal counsel approves. Always honor opt-out requests immediately.
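
The guidance above can be expressed as a small gate that runs before any identification call. This is a sketch of the policy as stated, not legal advice encoded in software. The function name, parameters, and return values are hypothetical.

```python
def identification_level(
    country_code: str,
    opted_out: bool,
    counsel_approved_individual: bool,
) -> str:
    """Return 'none', 'company', or 'individual' for a given visitor."""
    if opted_out:
        return "none"  # opt-outs win over every other rule
    if country_code.upper() == "US" and counsel_approved_individual:
        return "individual"
    return "company"  # default everywhere else
```

Putting the opt-out check first means no later rule can accidentally override it.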

Realistic match rates

Vendor claims vs. reality:

Tool                  | Claimed match rate  | Realistic rate                 | Level
RB2B                  | 40-60%              | 20-40% (varies by traffic mix) | Individual
Clearbit Reveal       | 30-50%              | 15-30%                         | Individual + company
IPinfo                | N/A (deterministic) | 60-80% of business traffic     | Company only
PostHog + IPinfo Lite | N/A                 | 40-60% of business traffic     | Company only

Match rates depend heavily on your traffic composition. If most visitors are US-based, at mid-to-large companies, on corporate networks — rates are higher. If traffic is global, includes SMBs, or is heavily mobile — lower.
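
To see how traffic mix drives the overall number, a blended rate can be computed as a weighted average across segments. The segment shares and per-segment rates below are hypothetical figures chosen for illustration, not measurements.

```python
from typing import List, Tuple


def blended_match_rate(segments: List[Tuple[float, float]]) -> float:
    """Weighted average of match rates; each item is (share_of_traffic, match_rate)."""
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * rate for share, rate in segments)


# Hypothetical mix: (share of traffic, match rate within that segment)
mix = [
    (0.50, 0.40),  # US visitors on corporate networks
    (0.30, 0.15),  # SMB and remote workers
    (0.20, 0.05),  # mobile and international
]
rate = blended_match_rate(mix)
```

With this made-up mix, the blended rate comes out near 25%, well below the best segment's 40%. Shifting traffic toward the low-match segments pulls the total down further.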

The honest take

Deanonymization is powerful and imperfect. You'll identify a subset of visitors, not all of them. The subset is biased toward larger companies on corporate networks. Small-company visitors, remote workers, and mobile users will remain anonymous.

Don't build your strategy around 100% identification. Build it around the 20-40% you do identify — and make sure that data actually reaches your sales team in a format they can act on. A beautifully identified visitor sitting in a BigQuery table nobody queries is worth exactly nothing.


We build visitor-identification pipelines from webhook to warehouse to CRM — including the routing logic that turns a page view into a signal your sales team can act on. Book a discovery call if 97% of your traffic is walking away anonymous.
