Everyone says they do competitive intelligence. Most of what passes for it is someone checking a competitor's website before a board meeting.
Real competitive intelligence is continuous, structured, and automated. It lives in your warehouse alongside your own data, updates on a schedule, and surfaces changes you'd otherwise miss. The competitor who quietly hires three data engineers in a month is telling you something their marketing never will.
Three types of competitive data
1. Hiring signals
Job postings are the most honest public signal a company produces. Marketing copy is aspirational. Press releases are curated. Job descriptions are operational — they describe what a company is actually building.
What to watch:
- New roles that didn't exist before. A competitor posting for a "Head of AI" when they've never had one signals a strategic shift.
- Volume in a specific function. Three data-engineering roles posted in a month means they're building (or rebuilding) their data stack.
- Tech stack in job descriptions. "Experience with Snowflake and dbt required" tells you exactly what they've adopted.
- Seniority patterns. Mostly senior hires suggest they're building a new function. Mostly junior hires suggest they're scaling an existing one.
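The signals in this list can be pulled out of raw postings with a few string rules. The following is a minimal sketch, assuming each posting arrives as a title plus description text. The keyword lists and the function names `extract_tech` and `classify_seniority` are illustrative, not a complete taxonomy.

```python
import re

# Hypothetical keyword lists; extend them for your own market.
TECH_KEYWORDS = ["snowflake", "dbt", "databricks", "airflow"]
SENIORITY_RULES = [
    ("executive", ["chief", "vp", "vice president", "head of"]),
    ("senior", ["senior", "staff", "principal", "lead"]),
    ("junior", ["junior", "associate", "intern"]),
]


def _mentions(keyword: str, text: str) -> bool:
    """True if keyword appears in text as a whole word or phrase."""
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def extract_tech(description: str) -> list[str]:
    """Return the tracked technologies mentioned in a job description."""
    text = description.lower()
    return [kw for kw in TECH_KEYWORDS if _mentions(kw, text)]


def classify_seniority(title: str) -> str:
    """Map a job title to a coarse seniority bucket; the default is 'mid'."""
    text = title.lower()
    for level, keywords in SENIORITY_RULES:
        if any(_mentions(kw, text) for kw in keywords):
            return level
    return "mid"
```

Rules like these misfire on titles such as "Lead Generation Specialist", so treat the output as a first pass to review rather than a final label.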
Where to get it:
- Greenhouse / Lever / Ashby — most companies use an ATS with a public careers page. Scrape on a schedule.
- LinkedIn Jobs API — if you have access.
- Job board aggregators — Indeed, Glassdoor, Google Jobs all expose structured data.
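As one possible shape for the scheduled scrape, here is a sketch that pulls a Greenhouse-hosted board and maps it to the columns used later in the staging model. The endpoint and field names (`jobs`, `departments`, `updated_at`, `content`) are assumptions based on Greenhouse's public job board API and should be checked against the vendor documentation. `board_token` is the company slug from its careers URL.

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape; verify before relying on it.
BOARD_URL = "https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs?content=true"


def normalize_jobs(payload: dict, company_name: str) -> list[dict]:
    """Flatten an assumed Greenhouse-style payload into warehouse rows."""
    rows = []
    for job in payload.get("jobs", []):
        departments = job.get("departments") or [{}]
        rows.append({
            "posting_id": str(job.get("id")),
            "company_name": company_name,
            "job_title": job.get("title"),
            "department": departments[0].get("name"),
            "location": (job.get("location") or {}).get("name"),
            # updated_at is used as a stand-in; it is not a true posting date.
            "posted_date": (job.get("updated_at") or "")[:10] or None,
            # The description may contain HTML markup; strip it downstream.
            "description_text": job.get("content"),
        })
    return rows


def fetch_jobs(board_token: str, company_name: str) -> list[dict]:
    """Download one board and return normalized rows (makes a network call)."""
    with urlopen(BOARD_URL.format(board_token=board_token), timeout=30) as resp:
        return normalize_jobs(json.load(resp), company_name)
```

Keeping `normalize_jobs` separate from the network call means the mapping can be tested against a saved payload without hitting the board.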
2. Tech-stack signals
Knowing what technology a competitor uses tells you what they can and can't do. A company running Salesforce + HubSpot + Google Ads has a very different capability than one running Snowflake + dbt + Hightouch.
Sources:
- BuiltWith / Wappalyzer — detect technologies from JavaScript tags, headers, DNS records.
- HG Insights — enterprise-grade tech-stack intelligence.
- Job postings — the tech stack shows up in required skills.
- GitHub — open-source contributions reveal tools and languages.
- G2 / TrustRadius reviews — customers mention which products they use alongside the one they're reviewing.
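Detection from JavaScript tags can be sketched in a few lines. This example assumes you already have a page's HTML and matches script hostnames against a small fingerprint table. The table is illustrative, and commercial tools use far larger rule sets that also cover headers and DNS.

```python
import re

# Illustrative fingerprints: script hostnames commonly associated with each tool.
FINGERPRINTS = {
    "Google Tag Manager": "googletagmanager.com",
    "HubSpot": "js.hs-scripts.com",
    "Segment": "cdn.segment.com",
}

# Captures the src attribute of each <script> tag.
SCRIPT_SRC = re.compile(r"<script[^>]+src=[\"']([^\"']+)[\"']", re.IGNORECASE)


def detect_technologies(html: str) -> list[str]:
    """Return the names of fingerprinted tools whose scripts appear in the page."""
    sources = SCRIPT_SRC.findall(html)
    return sorted(
        name for name, host in FINGERPRINTS.items()
        if any(host in src for src in sources)
    )
```

A match shows that a tag is present on the page, not that the tool is in active use, which is why this kind of detection is probabilistic.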
3. Market signals
Broader indicators of competitive activity:
- Branded search volume — rising branded search means growing awareness (covered in our Share of Search post).
- Press and funding — Crunchbase, PitchBook, or news monitoring.
- Product launches — Product Hunt, press releases, changelog pages.
- Pricing changes — periodic checks of competitor pricing pages (or archive.org for historical).
- Customer reviews — G2, TrustRadius, Capterra. What customers complain about is more useful than what they praise.
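For the periodic pricing-page check, one simple approach is to fingerprint the visible text and compare it with the last stored value. This is a sketch under that assumption. The function names are hypothetical, and a real page may need more careful text extraction than a tag-stripping regex.

```python
import hashlib
import re
from typing import Optional


def page_fingerprint(html: str) -> str:
    """Hash the visible text so markup-only changes do not register."""
    text = re.sub(r"<[^>]+>", " ", html)               # drop tags
    text = re.sub(r"\s+", " ", text).strip().lower()   # normalize whitespace and case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: Optional[str], html: str) -> bool:
    """True if a stored fingerprint exists and differs from the current page."""
    return previous_fingerprint is not None and previous_fingerprint != page_fingerprint(html)
```

Store the fingerprint with a timestamp on every run, so a detected change can be dated to the interval between two checks.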
Building a competitive intelligence pipeline
The lightweight version we've built for clients:
Job boards    ──→ Scraper (Cloud Function) ──→ BigQuery
BuiltWith API ──→ Cloud Function           ──→ BigQuery
G2 reviews    ──→ Scraper                  ──→ BigQuery
News API      ──→ Cloud Function           ──→ BigQuery
                                                  │
                                                  ▼
                                dbt models (enrich, score, classify)
                                                  │
                                                  ▼
                                 Alerts (Slack / email) + Dashboard
The job-posting model
-- models/staging/stg_competitor__job_postings.sql
SELECT
    posting_id,
    company_name,
    job_title,
    department,
    seniority_level,
    location,
    posted_date,
    description_text,
    -- Extract tech stack mentions from description
    CASE WHEN LOWER(description_text) LIKE '%snowflake%' THEN TRUE ELSE FALSE END AS mentions_snowflake,
    CASE WHEN LOWER(description_text) LIKE '%dbt%' THEN TRUE ELSE FALSE END AS mentions_dbt,
    CASE WHEN LOWER(description_text) LIKE '%databricks%' THEN TRUE ELSE FALSE END AS mentions_databricks,
    CASE WHEN LOWER(description_text) LIKE '%airflow%' THEN TRUE ELSE FALSE END AS mentions_airflow
FROM {{ source('competitor_intel', 'raw_job_postings') }}
WHERE company_name IN (SELECT company_name FROM {{ ref('seed_competitors') }})
Scoring and alerting
-- models/marts/mart_competitor_signals.sql
SELECT
    company_name,
    DATE_TRUNC(posted_date, WEEK) AS week,
    COUNT(*) AS new_postings,
    COUNT(CASE WHEN department = 'Engineering' THEN 1 END) AS eng_postings,
    COUNT(CASE WHEN department = 'Data' THEN 1 END) AS data_postings,
    COUNT(CASE WHEN department = 'Sales' THEN 1 END) AS sales_postings,
    COUNT(CASE WHEN mentions_snowflake OR mentions_dbt OR mentions_databricks THEN 1 END) AS data_stack_postings,
    -- Signal scoring
    CASE
        WHEN COUNT(CASE WHEN department = 'Data' THEN 1 END) >= 3 THEN 'HIGH'
        WHEN COUNT(CASE WHEN department = 'Data' THEN 1 END) >= 1 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS data_investment_signal
FROM {{ ref('stg_competitor__job_postings') }}
GROUP BY 1, 2
When data_investment_signal flips to HIGH for a competitor, push a Slack alert. That's actionable: it means a competitor is building or rebuilding their data capability, which directly affects your competitive positioning.
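The "flips to HIGH" check can be sketched as a comparison between last week's and this week's signal per competitor. This example assumes both weeks have already been read from the mart into plain dictionaries, and that `webhook_url` is a Slack incoming-webhook URL you supply. The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def find_new_high_signals(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Companies whose signal is HIGH this week but was not HIGH last week."""
    return sorted(
        company for company, level in current.items()
        if level == "HIGH" and previous.get(company) != "HIGH"
    )


def build_slack_message(company: str) -> dict:
    """Minimal incoming-webhook payload: a single text field."""
    return {"text": f"{company}: data_investment_signal flipped to HIGH this week."}


def post_to_slack(webhook_url: str, message: dict) -> None:
    """POST the payload as JSON (makes a network call)."""
    req = Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=10):
        pass
```

Alerting only on the transition, not on every week a competitor stays at HIGH, keeps the channel quiet enough that people keep reading it.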
LLM-powered classification
For deeper analysis, run job descriptions through an LLM to extract structured intelligence:
# Classify job posting intent using Claude
prompt = f"""
Analyze this job posting and extract:
1. Primary function (engineering, data, sales, marketing, product)
2. Seniority (junior, mid, senior, lead, executive)
3. Tech stack mentioned
4. Strategic signal (building new capability, scaling existing, replacing departed)
Job posting: {description}
"""
Store the LLM output as structured columns in your staging model. Now you can query "Which competitors are building new data capabilities this quarter?" far more precisely than keyword matching allows.
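Before the output reaches the warehouse it helps to validate it. The sketch below assumes the prompt is extended to request a JSON object with the hypothetical keys `primary_function`, `seniority`, `tech_stack`, and `strategic_signal`. It does not call any model API. It only checks and flattens a response string.

```python
import json
from typing import Optional

ALLOWED_FUNCTIONS = {"engineering", "data", "sales", "marketing", "product"}
ALLOWED_SENIORITY = {"junior", "mid", "senior", "lead", "executive"}


def parse_classification(raw: str) -> Optional[dict]:
    """Return warehouse-ready columns, or None if the response is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    function = str(data.get("primary_function", "")).lower()
    seniority = str(data.get("seniority", "")).lower()
    # Reject values outside the categories the prompt asked for.
    if function not in ALLOWED_FUNCTIONS or seniority not in ALLOWED_SENIORITY:
        return None
    tech = data.get("tech_stack")
    return {
        "llm_primary_function": function,
        "llm_seniority": seniority,
        "llm_tech_stack": [str(t).lower() for t in tech] if isinstance(tech, list) else [],
        "llm_strategic_signal": data.get("strategic_signal"),
    }
```

Rows that come back as `None` can be retried or left unclassified, which keeps malformed responses out of the staging model.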
What most companies get wrong
1. Point-in-time snapshots instead of time series
Checking a competitor's website once tells you nothing. Checking it weekly and tracking changes tells you everything. The value is in the delta — what changed, when, and what pattern it forms.
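The delta between two weekly snapshots takes very little code. This sketch assumes each snapshot is the set of posting IDs seen for one competitor on a given run. The function name is hypothetical.

```python
def diff_snapshots(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report which postings appeared and which disappeared between two runs."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Example: two consecutive weekly snapshots for one competitor.
last_week = {"p-101", "p-102", "p-103"}
this_week = {"p-102", "p-103", "p-104", "p-105"}
delta = diff_snapshots(last_week, this_week)
```

Removed postings matter as well as added ones: a role that disappears has either been filled or been cancelled, and both are informative.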
2. Tracking too many competitors
Pick 3-5 direct competitors. Exhaustive competitive sets produce noise, not insight. Your CEO doesn't need to know about 20 companies. They need to know about the 3 that compete for the same deals.
3. No connection to your own data
Competitive intelligence in a standalone tool is interesting. Competitive intelligence joined to your CRM is actionable. When you can see that a competitor just launched a new feature and three of your at-risk accounts have been evaluating them on G2 — that's a signal you can act on today.
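As an illustration of that join, the sketch below matches competitors with a HIGH signal against at-risk accounts known to be evaluating them. All field names (`risk_status`, `evaluating_competitors`, and so on) are hypothetical stand-ins for whatever your CRM and intent data expose. In practice this would more likely be a SQL join in the warehouse.

```python
def at_risk_accounts_exposed(signals: list[dict], accounts: list[dict]) -> list[dict]:
    """At-risk accounts that are evaluating a competitor with a HIGH signal."""
    high = {s["company_name"] for s in signals if s["data_investment_signal"] == "HIGH"}
    exposed = []
    for account in accounts:
        if account["risk_status"] != "at_risk":
            continue
        overlap = sorted(high & set(account["evaluating_competitors"]))
        if overlap:
            exposed.append({"account_name": account["account_name"], "competitors": overlap})
    return exposed
```

The output is a short list of named accounts, which is something an account team can act on the same day.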
The honest constraint
Most competitive data is estimated, scraped, or inferred. Job postings are real. Tech-stack detection is probabilistic. Traffic estimates are approximate. Review counts are exact but review content is biased.
Don't treat competitive intelligence as ground truth. Treat it as signal. A competitor posting 5 data roles, their branded search volume climbing, and two of your customers reviewing their product on G2 — each signal alone is weak. Together, they tell a clear story.
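One way to make "together, they tell a clear story" concrete is to count how many independent signals fire at once. This is a sketch with illustrative thresholds. The cutoffs, field names, and labels are all assumptions to tune against your own history.

```python
from dataclasses import dataclass


@dataclass
class CompetitorSignals:
    data_roles_posted: int        # open data roles in the period
    branded_search_change: float  # period-over-period change, 0.15 means +15%
    customer_reviews_on_g2: int   # your customers reviewing the competitor


def corroboration_level(s: CompetitorSignals) -> str:
    """Label a competitor by how many weak signals agree."""
    fired = sum([
        s.data_roles_posted >= 3,
        s.branded_search_change > 0.10,
        s.customer_reviews_on_g2 >= 2,
    ])
    return {0: "none", 1: "weak", 2: "moderate", 3: "strong"}[fired]
```

Counting agreeing signals, not summing raw values, stops one noisy source from dominating the result.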
Your competitor just posted three data-engineering roles. That tells you more about their roadmap than their blog ever will.
We build competitive-intelligence pipelines that track hiring signals, tech-stack changes, and market indicators — all feeding into your warehouse alongside your own CRM and analytics data. Book a discovery call if you want to stop guessing what your competitors are building.