
Source/Medium Cardinality in GA4: When Too Many Values Break Your Reports

A source/medium report should give you a clean picture of which channels drive traffic. When cardinality is high (hundreds or thousands of distinct source/medium combinations), the report becomes a wall of noise in which the meaningful channels are buried under hundreds of one-session entries.

What Causes High Source/Medium Cardinality

The most common cause is dynamic UTM parameter values being set programmatically without normalisation.

If utm_source is populated from a variable that pulls timestamps, session IDs, or user-specific values, every session generates a unique source value, which quickly produces thousands of distinct entries.

A related problem comes from email marketing platforms that automatically append tracking parameters in their own format rather than the standard UTM convention, creating source values such as "em_12345_campaign_name_20260310" that are unique to each send.

Affiliate marketing programmes are another frequent contributor: affiliates that set utm_source to their own tracking codes, instead of following a consistent naming convention, create hundreds of affiliate-specific source values rather than consolidating under a single affiliate source.

Third-party tools that append query parameters to URLs without coordinating with the analytics team (live chat systems, A/B testing platforms, personalisation engines) can also inadvertently pollute source/medium data if those parameters are picked up by GA4's traffic source detection logic.

How to Detect and Quantify the Problem

The quickest way to quantify source/medium cardinality is to export the full source/medium dimension from GA4 for a 90-day period, either by exporting the standard Acquisition report to a spreadsheet or by running a BigQuery query that counts distinct source/medium pairs.

Sort by session count ascending to find the long tail of one-session and two-session entries.

If the bottom 90% of distinct source/medium values account for less than 5% of total sessions, you have a significant cardinality problem: most of your distinct source values are noise.
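As a rough illustration of the check above, here is a minimal Python sketch that works on an exported table of (source, medium, sessions) rows. The function name `cardinality_report` and the row shape are assumptions for this example, not part of GA4 or its export; the 90% and 5% thresholds are the rule of thumb from the text.

```python
from typing import Dict, Iterable, Tuple

# One exported row: (source, medium, sessions), e.g. from a 90-day Acquisition export.
Row = Tuple[str, str, int]


def cardinality_report(rows: Iterable[Row], tail_percent: int = 90) -> Dict[str, object]:
    """Summarise how thinly sessions are spread across source/medium pairs."""
    # Aggregate first, in case the export repeats a pair (e.g. one row per date).
    totals: Dict[Tuple[str, str], int] = {}
    for source, medium, sessions in rows:
        totals[(source, medium)] = totals.get((source, medium), 0) + sessions

    # Ascending sort puts the long tail of tiny pairs at the front.
    counts = sorted(totals.values())
    distinct = len(counts)
    total_sessions = sum(counts)

    # Integer arithmetic avoids float rounding when sizing the "bottom N%" slice.
    tail_size = distinct * tail_percent // 100
    tail_share = sum(counts[:tail_size]) / total_sessions if total_sessions else 0.0

    return {
        "distinct_pairs": distinct,
        "total_sessions": total_sessions,
        "one_or_two_session_pairs": sum(1 for c in counts if c <= 2),
        "tail_share": tail_share,
        # Rule of thumb from the text: bottom 90% of pairs carry < 5% of sessions.
        "significant_problem": total_sessions > 0 and tail_share < 0.05,
    }
```

For example, three large pairs plus 27 one-session email pairs gives 30 distinct pairs. The bottom 27 carry 27 of 9,027 sessions (about 0.3%), so `significant_problem` comes back true.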

Look for patterns in the low-session values: are they all from one email platform? All from affiliate links? All sharing a common prefix?

Identifying the pattern tells you which upstream system is creating the pollution and where the fix needs to be implemented.
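One way to surface those patterns is to bucket the long tail by a shared prefix. The sketch below is a hypothetical helper, `tail_prefixes`, that treats the leading run of letters in a source value as its prefix. That heuristic is an assumption and will need adjusting for naming schemes that don't start with letters.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tail_prefixes(
    rows: Iterable[Tuple[str, str, int]], max_sessions: int = 2, top: int = 5
) -> List[Tuple[Tuple[str, str], int]]:
    """Count low-session source values by their leading alphabetic prefix, per medium."""
    prefixes: Counter = Counter()
    for source, medium, sessions in rows:
        if sessions > max_sessions:
            continue  # only the long tail is of interest here
        # Assumed heuristic: the prefix is the leading run of letters,
        # so "em_12345_x" -> "em" and "aff-9" -> "aff".
        match = re.match(r"[A-Za-z]+", source)
        prefix = match.group(0).lower() if match else "(no letter prefix)"
        prefixes[(prefix, medium)] += 1
    # Most frequent (prefix, medium) buckets first.
    return prefixes.most_common(top)
```

A single dominant bucket, such as many `em` sources under the `email` medium, points straight at the upstream system to fix.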

Fixing Cardinality Without Losing Granularity

The fix for source/medium cardinality is normalising UTM values at the point of creation rather than trying to clean them up after the fact.

For email marketing, establish a convention where utm_source is the platform name (mailchimp, klaviyo, hubspot) and utm_campaign carries the campaign identifier.
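A small URL builder can enforce that convention at the point of creation. In this sketch, `build_email_url` and the allow-list are hypothetical, and the `utm_medium=email` and lower-casing choices are assumptions the text does not specify.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed allow-list of platform names, taken from the convention above.
EMAIL_PLATFORMS = {"mailchimp", "klaviyo", "hubspot"}


def build_email_url(landing_url: str, platform: str, campaign_id: str) -> str:
    """Tag a URL so utm_source is the platform and utm_campaign the campaign ID."""
    platform = platform.strip().lower()
    if platform not in EMAIL_PLATFORMS:
        # Rejecting unknown names stops new source values appearing by accident.
        raise ValueError(f"unknown email platform: {platform!r}")
    parts = urlsplit(landing_url)
    # Existing query parameters are kept; any existing utm_* keys are overwritten.
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            "utm_source": platform,
            "utm_medium": "email",  # assumption: the text does not fix a medium value
            "utm_campaign": campaign_id.strip().lower(),  # assumption: lower-case convention
        }
    )
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Rejecting unknown platform names is deliberate: a typo such as "Klavyio" then fails loudly instead of quietly adding a new source value.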

For affiliates, require that all affiliate UTMs use utm_source=affiliate and carry the affiliate identifier in utm_content. This consolidates all affiliate traffic under one source/medium pair while preserving the ability to distinguish individual affiliates.
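A minimal sketch of the affiliate convention follows. `build_affiliate_url` and `distinct_values` are hypothetical helpers, and `utm_medium=affiliate` is an assumed value, since the text only fixes utm_source. The second function shows the point of the convention: many affiliates, one source/medium pair.

```python
from typing import Iterable, Set, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_affiliate_url(landing_url: str, affiliate_id: str) -> str:
    """Fixed utm_source; the per-affiliate identifier goes into utm_content."""
    parts = urlsplit(landing_url)
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            "utm_source": "affiliate",
            "utm_medium": "affiliate",  # assumption: the text only fixes utm_source
            "utm_content": affiliate_id.strip().lower(),
        }
    )
    return urlunsplit(parts._replace(query=urlencode(query)))


def distinct_values(urls: Iterable[str]) -> Tuple[Set[Tuple[str, str]], Set[str]]:
    """Return the distinct (source, medium) pairs and distinct utm_content values."""
    pairs: Set[Tuple[str, str]] = set()
    contents: Set[str] = set()
    for url in urls:
        params = dict(parse_qsl(urlsplit(url).query))
        pairs.add((params["utm_source"], params["utm_medium"]))
        contents.add(params["utm_content"])
    return pairs, contents
```

Three affiliates tagged this way produce a single source/medium pair, while utm_content still tells them apart.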

For dynamic UTM generation, add a normalisation step that strips or replaces unique per-session values before they are inserted into the URL.
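Here is a minimal Python sketch of such a normalisation step. It assumes the per-session noise takes the form of UUID-style session IDs or runs of four or more digits (timestamps, dates, send IDs). The function name, the `unknown` fallback, and both patterns are illustrative assumptions; real values may need different rules.

```python
import re

# Hypothetical noise patterns; tune them to what your upstream systems emit.
# UUIDs are removed first so their digit groups are not split by the digit rule.
_UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
_LONG_DIGITS = re.compile(r"\d{4,}")  # timestamps, dates, send or session IDs
_REPEATED_SEPARATORS = re.compile(r"[_-]{2,}")


def normalise_utm_value(value: str, fallback: str = "unknown") -> str:
    """Strip per-session tokens so one logical source always yields one value."""
    cleaned = value.strip().lower()  # lower-case first; the UUID pattern expects it
    cleaned = _UUID.sub("", cleaned)
    cleaned = _LONG_DIGITS.sub("", cleaned)
    # Removing tokens can leave "__" or a trailing "_"; tidy those up.
    cleaned = _REPEATED_SEPARATORS.sub("_", cleaned).strip("_-")
    # If nothing meaningful is left, use the fallback rather than an empty source.
    return cleaned or fallback
```

With these rules, the email example "em_12345_campaign_name_20260310" becomes "em_campaign_name". The digit rule is blunt: it would also strip a legitimate year such as "2026" from a value, so treat the patterns as a starting point.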

For existing historical data, BigQuery allows you to create a cleaned view of source/medium data by applying regex transformations that normalise the noisy values. This does not fix the raw data, but it makes the data usable for historical analysis without waiting for the live fix to accumulate a new, clean baseline.
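In BigQuery this cleaning would typically live in a view built from CASE expressions with REGEXP_CONTAINS or REGEXP_REPLACE. The sketch below expresses the same logic in Python against exported (source, medium, sessions) rows, which makes it easy to test rules before committing them to SQL. The rules, the "email_platform" label, and the `clean_history` name are hypothetical examples.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical rules: (pattern on the raw source, cleaned source). First match wins.
CLEANING_RULES = [
    (re.compile(r"^em_\d+_"), "email_platform"),
    (re.compile(r"^aff[-_]"), "affiliate"),
]


def clean_history(
    rows: Iterable[Tuple[str, str, int]], rules=CLEANING_RULES
) -> Dict[Tuple[str, str], int]:
    """Map noisy sources through the rules and re-aggregate sessions per pair."""
    cleaned: Dict[Tuple[str, str], int] = defaultdict(int)
    for source, medium, sessions in rows:
        new_source = source
        for pattern, replacement in rules:
            if pattern.search(source):
                new_source = replacement
                break  # first matching rule wins
        cleaned[(new_source, medium)] += sessions
    # The input rows are left untouched, just as a view leaves the raw export intact.
    return dict(cleaned)
```

Ordering the rules from most to least specific keeps a broad pattern from swallowing values that a narrower rule should have handled.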
