← Back to blog
engineering·February 8, 2026·3 min read

BigQuery NET.REG_DOMAIN and NET.HOST — stop writing regex to parse URLs

NET.REG_DOMAIN and NET.HOST handle URL parsing in BigQuery natively. Cleaner SQL, correct TLD handling, zero regex. Here's how to use them.

engineering

BigQuery NET.REG_DOMAIN and NET.HOST — stop writing regex to p…

Everyone who's worked with a web analytics table has written that one regex to pull the domain out of a URL. It starts simple. Then you hit a URL with no protocol. Then one with a subdomain. Then a country-code TLD that your pattern didn't see coming. Three hours later, your regex is a line wrap long and you're not sure what it does anymore.

You don't have to. BigQuery has a NET function family for exactly this.

Two functions you need to know

NET.HOST()

Returns the full host, including subdomains.

NET.HOST("https://developers.google.com/bigquery")  -- developers.google.com

Useful when subdomain matters — referrer analysis, multi-tenant SaaS logs, anywhere the subdomain carries signal.

NET.REG_DOMAIN()

Returns just the registrable domain. Subdomain stripped, path ignored.

NET.REG_DOMAIN("https://developers.google.com/bigquery")  -- google.com

Useful when you want to group things: "all traffic from Google" regardless of which subdomain it came from.

A practical example

Drop these into a staging model:

SELECT
  raw_url,
  NET.HOST(raw_url)        AS host,
  NET.REG_DOMAIN(raw_url)  AS root_domain
FROM {{ source('analytics', 'raw_sessions') }}
WHERE raw_url IS NOT NULL

Two clean columns. Zero regex. Done.

A note on NULL handling

Both functions return NULL for malformed or unrecognizable input. That's a feature, not a bug — you get explicit signal you can test on, instead of a silent empty string.

Add it to your dbt tests:

columns:
  - name: root_domain
    tests:
      - not_null:
          config:
            severity: warn

A non-zero NULL rate gets surfaced instead of hiding in your data.

Where this lives in a real stack

URL parsing belongs in your staging layer, not in a BI tool's calculated field and definitely not in ad-hoc dashboard SQL. Parse once, store clean columns, have everything downstream consume the clean version.

That keeps three things true:

  • Definitions stay consistent across tools.
  • Query cost drops (no repeated regex on every dashboard load).
  • A change to parsing logic is a one-file PR.

The NET namespace has more

NET.IP_FROM_STRING() and NET.IP_TO_STRING() exist if you're doing IP work. The full reference has a handful of other utilities worth a scan.

The bigger point

Infrastructure gets reliable when teams use standardized primitives instead of ad-hoc ones. A small decision made consistently early — like parse URLs in staging, using the standard function — saves you from tens of dashboards each doing it their own slightly-wrong way.

Boring choices, made once, compound well.


We help teams make hundreds of these boring-but-compounding decisions across their data stack. If your BigQuery project could use a cleanup pass, let's talk.

Got a similar problem?

30 minutes. We'll tell you honestlywhat's broken.