strategy·May 13, 2026·6 min read

Competitor intelligence — what data can you actually get?

Your competitor just posted three data-engineering roles. That tells you more about their roadmap than their blog ever will.

Everyone says they do competitive intelligence. Most of what passes for it is someone checking a competitor's website before a board meeting.

Real competitive intelligence is continuous, structured, and automated. It lives in your warehouse alongside your own data, updates on a schedule, and surfaces changes you'd otherwise miss. The competitor who quietly hires three data engineers in a month is telling you something their marketing never will.

Three types of competitive data

1. Hiring signals

Job postings are the most honest public signal a company produces. Marketing copy is aspirational. Press releases are curated. Job descriptions are operational — they describe what a company is actually building.

What to watch:

  • New roles that didn't exist before. A competitor posting for a "Head of AI" when they've never had one signals a strategic shift.
  • Volume in a specific function. Three data-engineering roles posted in a month means they're building (or rebuilding) their data stack.
  • Tech stack in job descriptions. "Experience with Snowflake and dbt required" tells you exactly what they've adopted.
  • Seniority patterns. Mostly senior hires usually means they're building a new function. Mostly junior hires usually means they're scaling an existing one.
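As a rough illustration, three of these watch items (new titles, volume per function, seniority mix) can be computed from a list of postings. This is a minimal sketch: the `Posting` fields, the `hiring_signals` name, and the default threshold of three are assumptions for the example, not part of any particular tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    company: str
    title: str
    function: str   # hypothetical labels, e.g. "data", "engineering", "sales"
    seniority: str  # hypothetical labels, e.g. "junior", "senior", "executive"


def hiring_signals(current, known_titles, volume_threshold=3):
    """Summarise one company's postings for a period.

    known_titles is a set of lower-cased titles seen in earlier periods.
    """
    # Titles that have never appeared before for this company.
    new_titles = sorted({p.title for p in current if p.title.lower() not in known_titles})
    # Functions with at least `volume_threshold` open roles.
    by_function = Counter(p.function for p in current)
    busy_functions = sorted(f for f, n in by_function.items() if n >= volume_threshold)
    # Share of postings at senior level or above.
    senior = sum(1 for p in current if p.seniority in {"senior", "lead", "executive"})
    senior_share = senior / len(current) if current else 0.0
    return {
        "new_titles": new_titles,
        "busy_functions": busy_functions,
        "senior_share": senior_share,
    }
```

A high `senior_share` alongside a new title is the "building a new function" pattern; a low share with high volume looks more like scaling.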

Where to get it:

  • Greenhouse / Lever / Ashby — most companies use an ATS with a public careers page. Scrape on a schedule.
  • LinkedIn Jobs API — if you have access.
  • Job board aggregators — Indeed, Glassdoor, Google Jobs. Listings often carry structured data (schema.org JobPosting markup), though access terms vary by site.
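Many careers pages embed schema.org `JobPosting` data as JSON-LD, which is easier to parse than the rendered HTML. The sketch below assumes the page does so and that each JSON-LD block holds either one object or a list of objects; the function name and the output column names (chosen to match the staging model later in this post) are illustrative. Check a site's terms before scraping it.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_job_postings(html):
    """Return one dict per schema.org JobPosting found in the page."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    postings = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                org = item.get("hiringOrganization") or {}
                postings.append({
                    "job_title": item.get("title"),
                    "company_name": org.get("name") if isinstance(org, dict) else org,
                    "posted_date": item.get("datePosted"),
                    "description_text": item.get("description"),
                })
    return postings
```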

2. Tech-stack signals

Knowing what technology a competitor uses tells you what they can and can't do. A company running Salesforce + HubSpot + Google Ads has very different capabilities from one running Snowflake + dbt + Hightouch.

Sources:

  • BuiltWith / Wappalyzer — detect technologies from JavaScript tags, headers, DNS records.
  • HG Insights — enterprise-grade tech-stack intelligence.
  • Job postings — the tech stack shows up in required skills.
  • GitHub — open-source contributions reveal tools and languages.
  • G2 / TrustRadius reviews — customers mention which products they use alongside the one they're reviewing.
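To make the detection idea concrete, here is a toy version of script-tag fingerprinting. It is a sketch only: the three fingerprints are illustrative examples, and commercial tools combine far more evidence (headers, cookies, DNS) than this single check.

```python
import re

# Illustrative fingerprints: a substring of a script URL that suggests a tool.
FINGERPRINTS = {
    "Google Tag Manager": "googletagmanager.com",
    "HubSpot": "js.hs-scripts.com",
    "Segment": "cdn.segment.com",
}

# Captures the src attribute of <script ... src="..."> tags.
_SCRIPT_SRC = re.compile(r'<script[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def detect_technologies(html):
    """Return the sorted names of tools whose fingerprint appears in a script URL."""
    sources = [src.lower() for src in _SCRIPT_SRC.findall(html)]
    return sorted(
        name
        for name, marker in FINGERPRINTS.items()
        if any(marker in src for src in sources)
    )
```

A match means the tag is present on the page, not that the tool is actively used, which is why this kind of detection stays probabilistic.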

3. Market signals

Broader indicators of competitive activity:

  • Branded search volume — rising branded search means growing awareness (covered in our Share of Search post).
  • Press and funding — Crunchbase, PitchBook, or news monitoring.
  • Product launches — Product Hunt, press releases, changelog pages.
  • Pricing changes — periodic checks of competitor pricing pages (or archive.org for historical versions).
  • Customer reviews — G2, TrustRadius, Capterra. What customers complain about is more useful than what they praise.

Building a competitive intelligence pipeline

The lightweight version we've built for clients:

Job boards ──→ Scraper (Cloud Function) ──→ BigQuery
BuiltWith API ──→ Cloud Function ──→ BigQuery
G2 reviews ──→ Scraper ──→ BigQuery
News API ──→ Cloud Function ──→ BigQuery
         │
         ▼
      dbt models (enrich, score, classify)
         │
         ▼
   Alerts (Slack / email) + Dashboard

The job-posting model

-- models/staging/stg_competitor__job_postings.sql
-- One row per posting, limited to the competitors listed in the seed file.
SELECT
    posting_id,
    company_name,
    job_title,
    department,
    seniority_level,
    location,
    posted_date,
    description_text,
    -- Extract tech stack mentions from description.
    -- \b word boundaries avoid matching a tool name inside a longer token;
    -- COALESCE keeps the flag FALSE when description_text is NULL.
    COALESCE(REGEXP_CONTAINS(LOWER(description_text), r'\bsnowflake\b'), FALSE) AS mentions_snowflake,
    COALESCE(REGEXP_CONTAINS(LOWER(description_text), r'\bdbt\b'), FALSE) AS mentions_dbt,
    COALESCE(REGEXP_CONTAINS(LOWER(description_text), r'\bdatabricks\b'), FALSE) AS mentions_databricks,
    COALESCE(REGEXP_CONTAINS(LOWER(description_text), r'\bairflow\b'), FALSE) AS mentions_airflow
FROM {{ source('competitor_intel', 'raw_job_postings') }}
WHERE company_name IN (SELECT company_name FROM {{ ref('seed_competitors') }})

Scoring and alerting

-- models/marts/mart_competitor_signals.sql
-- Weekly posting counts per competitor, plus a coarse data-investment signal.
SELECT
    company_name,
    DATE_TRUNC(posted_date, WEEK) AS week,
    COUNT(*) AS new_postings,
    COUNTIF(department = 'Engineering') AS eng_postings,
    COUNTIF(department = 'Data') AS data_postings,
    COUNTIF(department = 'Sales') AS sales_postings,
    COUNTIF(mentions_snowflake OR mentions_dbt OR mentions_databricks) AS data_stack_postings,
    -- Signal scoring: 3+ data postings in a week is HIGH, 1-2 is MEDIUM
    CASE
        WHEN COUNTIF(department = 'Data') >= 3 THEN 'HIGH'
        WHEN COUNTIF(department = 'Data') >= 1 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS data_investment_signal
FROM {{ ref('stg_competitor__job_postings') }}
GROUP BY 1, 2

When data_investment_signal flips to HIGH for a competitor, push a Slack alert. That's actionable — it means a competitor is building or rebuilding their data capability, which directly affects your competitive positioning.
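One way to wire up that alert, sketched under a few assumptions: the previous and current signal per company have already been read from the mart into plain dicts, and alerts go to a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names and the message wording are illustrative.

```python
import json
import urllib.request


def find_new_high_signals(previous, current):
    """Companies whose signal is HIGH now but was not HIGH in the previous run."""
    return sorted(
        company
        for company, signal in current.items()
        if signal == "HIGH" and previous.get(company) != "HIGH"
    )


def build_slack_payload(company):
    """Build the JSON body for a Slack incoming webhook."""
    return {
        "text": f"{company}: data_investment_signal is now HIGH "
                "(3 or more data postings this week)."
    }


def send_alert(webhook_url, payload):
    """POST the payload to the webhook URL (supplied by your Slack workspace)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Alerting only on the transition, rather than on every week that stays HIGH, keeps the channel quiet enough that people still read it.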

LLM-powered classification

For deeper analysis, run job descriptions through an LLM to extract structured intelligence:

# Classify job posting intent using Claude
def build_classification_prompt(description: str) -> str:
    """Return the prompt that asks the model to classify one job posting."""
    return f"""
Analyze this job posting and extract:
1. Primary function (engineering, data, sales, marketing, product)
2. Seniority (junior, mid, senior, lead, executive)
3. Tech stack mentioned
4. Strategic signal (building new capability, scaling existing, replacing departed)

Job posting: {description}
"""

Store the LLM output as structured columns in your staging model. A question like "Which competitors are building new data capabilities this quarter?" then becomes a simple query, and its answer is as precise as the classifications behind it.
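Model output should be validated before it lands in a table. The sketch below assumes the prompt is extended to request a JSON object with the keys `primary_function`, `seniority`, `tech_stack`, and `strategic_signal`; those key names and the `parse_classification` helper are hypothetical. Anything that fails validation returns `None` so it can be retried or reviewed by hand.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Allowed values mirror the options listed in the prompt.
ALLOWED_FUNCTIONS = {"engineering", "data", "sales", "marketing", "product"}
ALLOWED_SENIORITY = {"junior", "mid", "senior", "lead", "executive"}


@dataclass(frozen=True)
class PostingClassification:
    primary_function: str
    seniority: str
    tech_stack: Tuple[str, ...]
    strategic_signal: str


def parse_classification(raw: str) -> Optional[PostingClassification]:
    """Parse and validate one model response; return None if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    function = str(data.get("primary_function", "")).lower()
    seniority = str(data.get("seniority", "")).lower()
    if function not in ALLOWED_FUNCTIONS or seniority not in ALLOWED_SENIORITY:
        return None
    tech = data.get("tech_stack") or []
    if not isinstance(tech, list):
        return None
    return PostingClassification(
        primary_function=function,
        seniority=seniority,
        tech_stack=tuple(str(t) for t in tech),
        strategic_signal=str(data.get("strategic_signal", "")),
    )
```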

What most companies get wrong

1. Point-in-time snapshots instead of time series

Checking a competitor's website once tells you nothing. Checking it weekly and tracking changes tells you everything. The value is in the delta — what changed, when, and what pattern it forms.
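The delta itself is cheap to compute once snapshots are stored. A minimal sketch, assuming each snapshot has already been reduced to a flat dict (here, plan name to displayed price); the function name and the sample data are made up for illustration.

```python
def diff_snapshots(previous, current):
    """Return the fields that were added, removed, or changed between two snapshots."""
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes.append({"field": key, "old": old, "new": new})
    return changes


last_week = {"Starter": "$49/mo", "Pro": "$199/mo"}
this_week = {"Starter": "$49/mo", "Pro": "$249/mo", "Enterprise": "Contact us"}

for change in diff_snapshots(last_week, this_week):
    print(change)
# {'field': 'Enterprise', 'old': None, 'new': 'Contact us'}
# {'field': 'Pro', 'old': '$199/mo', 'new': '$249/mo'}
```

Storing each change with the date it was detected turns a series of snapshots into the time series described above.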

2. Tracking too many competitors

Pick 3-5 direct competitors. Exhaustive competitive sets produce noise, not insight. Your CEO doesn't need to know about 20 companies. They need to know about the 3 that compete for the same deals.

3. No connection to your own data

Competitive intelligence in a standalone tool is interesting. Competitive intelligence joined to your CRM is actionable. When you can see that a competitor just launched a new feature and three of your at-risk accounts have been evaluating them on G2 — that's a signal you can act on today.
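In a warehouse this is a SQL join; the logic is shown here in Python to keep it self-contained. The account fields (`name`, `health`, `evaluating`) are hypothetical stand-ins for whatever your CRM and review data actually expose.

```python
def at_risk_overlap(competitor_signals, accounts):
    """Map each competitor with a HIGH signal to the at-risk accounts evaluating it."""
    active = {c for c, signal in competitor_signals.items() if signal == "HIGH"}
    overlap = {}
    for account in accounts:
        if account["health"] != "at_risk":
            continue
        for competitor in account["evaluating"]:
            if competitor in active:
                overlap.setdefault(competitor, []).append(account["name"])
    return overlap
```

The output is a short list of named accounts per competitor, which is something an account manager can act on the same day.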

The honest constraint

Most competitive data is estimated, scraped, or inferred. Job postings are real. Tech-stack detection is probabilistic. Traffic estimates are approximate. Review counts are exact but review content is biased.

Don't treat competitive intelligence as ground truth. Treat it as signal. A competitor posting 5 data roles, their branded search volume climbing, and two of your customers reviewing their product on G2 — each signal alone is weak. Together, they tell a clear story.

Your competitor just posted three data-engineering roles. That tells you more about their roadmap than their blog ever will.


We build competitive-intelligence pipelines that track hiring signals, tech-stack changes, and market indicators — all feeding into your warehouse alongside your own CRM and analytics data. Book a discovery call if you want to stop guessing what your competitors are building.

Got a similar problem?

30 minutes. We'll tell you honestly what's broken.