How do I reconstruct GA4 sessions in BigQuery?
GA4 doesn't store sessions as records in its BigQuery export — only individual events carrying session identifiers. To reconstruct sessions, group on the composite key of `user_pseudo_id` plus `(SELECT value.int_value FROM UNNEST(event_params) WHERE key='ga_session_id')`, then aggregate: MIN(event_timestamp) for session start, MAX(event_timestamp) for session end, COUNTIF(event_name='page_view') for pageviews, and a SUM over the extracted `engagement_time_msec` parameter for engagement time.
The session start traffic source comes from traffic_source.source/medium/name at the user's first event in that session.
The full SQL pattern is below.
Why GA4 doesn't store sessions
In Universal Analytics, sessions were a first-class entity — every session had its own row with start/end times and aggregate metrics. GA4 fundamentally changed this: everything is an event. Sessions exist only as a tag (ga_session_id) attached to events.
The implication: if you want session-level metrics in BigQuery, you build them yourself. The standard reports in the GA4 UI do this aggregation behind the scenes; BigQuery exports give you raw events and expect you to do the work.
The benefit: full control. You can define sessions however your business actually thinks about them — not just GA4's default 30-minute timeout.
The session reconstruction pattern
Here's the canonical SQL pattern for reconstructing sessions from raw GA4 BigQuery events:
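A sketch of that pattern — the dataset name `project.analytics_XXXXXX`, the date range, and `purchase` as the conversion event are placeholder assumptions to adjust for your export:

```sql
-- Sketch: one row per session from raw GA4 export events.
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id,
    event_name,
    event_timestamp,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec') AS engagement_time_msec,
    traffic_source.source AS source,
    traffic_source.medium AS medium,
    traffic_source.name AS campaign
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20260101' AND '20260131'
)
SELECT
  user_pseudo_id,
  ga_session_id,
  MIN(event_timestamp) AS session_start,
  MAX(event_timestamp) AS session_end,
  -- event_timestamp is in microseconds
  (MAX(event_timestamp) - MIN(event_timestamp)) / 1e6 AS duration_seconds,
  COUNTIF(event_name = 'page_view') AS pageviews,
  SUM(engagement_time_msec) / 1000 AS engagement_seconds,
  ANY_VALUE(source) AS source,
  ANY_VALUE(medium) AS medium,
  ANY_VALUE(campaign) AS campaign,
  CASE
    WHEN SUM(engagement_time_msec) >= 10000
      OR COUNTIF(event_name IN ('page_view', 'screen_view')) >= 2
      OR COUNTIF(event_name = 'purchase') >= 1  -- swap in your conversion events
    THEN 1 ELSE 0
  END AS is_engaged_session
FROM events
WHERE ga_session_id IS NOT NULL  -- drop events with no session id
GROUP BY user_pseudo_id, ga_session_id
```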
Run this with the appropriate date range in _TABLE_SUFFIX and you have one row per session with everything you need.
Why user_pseudo_id + ga_session_id
Both fields are needed because:
- user_pseudo_id alone doesn't isolate sessions — a single user has many sessions
- ga_session_id alone isn't unique across users — different users can have the same session_id (the integer is just an incrementing counter per user, not globally unique)
The combination is the unique session key. Some implementations also include ga_session_number (the sequential number of the session for that user) for additional context, but it's not needed for uniqueness.
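Where a single string key is convenient for downstream joins, the two fields can be concatenated — a sketch (dataset name is a placeholder):

```sql
-- Composite session key: user_pseudo_id plus the per-user session counter.
-- CAST is needed because ga_session_id is an integer parameter.
SELECT DISTINCT
  CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params)
          WHERE key = 'ga_session_id') AS STRING)
  ) AS session_key
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = '20260115'
```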
For cross-device or logged-in user analysis, also include user_id if your implementation sets it:
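A sketch of the extended identity, assuming user_id is set at login (dataset name is a placeholder):

```sql
-- Prefer user_id when present so the same person on two devices collapses
-- to one identity; fall back to the device-scoped user_pseudo_id otherwise.
SELECT
  COALESCE(user_id, user_pseudo_id) AS unified_user_id,
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = '20260115'
```

Keep user_pseudo_id + ga_session_id as the session key even here — the session counter is per device, so unified_user_id alone can collide across devices.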
This unifies the user across devices when login is captured.
Engagement time — the GA4 UI match
The engagement time calculation needs care to match GA4's UI numbers:
engagement_time_msec is sent as an event parameter on most events. Sum across all events in the session, divide by 1000 for seconds. This matches GA4's "Average engagement time per session" calculation in standard reports.
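The per-session calculation looks like this (a sketch; dataset name and date are placeholders):

```sql
-- Sum the engagement_time_msec parameter per session, in seconds.
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id,
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec') AS engagement_time_msec
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX = '20260115'
)
SELECT
  user_pseudo_id,
  ga_session_id,
  SUM(engagement_time_msec) / 1000 AS engagement_seconds
FROM events
WHERE ga_session_id IS NOT NULL
GROUP BY user_pseudo_id, ga_session_id
```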
A common gotcha: some implementations don't capture engagement_time_msec on every event. Verify by checking event_params for missing values — if more than 5% of events lack engagement_time_msec, your engagement metrics will under-report vs the GA4 UI.
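One way to run that check — the share of events carrying the parameter (dataset name and date are placeholders):

```sql
-- Coverage check: events with engagement_time_msec vs total.
WITH flagged AS (
  SELECT
    (SELECT value.int_value FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec') IS NOT NULL AS has_et
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX = '20260115'
)
SELECT
  COUNT(*) AS total_events,
  COUNTIF(has_et) AS events_with_et,
  ROUND(100 * COUNTIF(has_et) / COUNT(*), 1) AS pct_with_et
FROM flagged
```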
The 'engaged session' definition
GA4 marks a session as "engaged" if any of these are true:
- Engagement time >= 10 seconds, OR
- 2+ pageviews/screen_views, OR
- 1+ conversion event
The CASE statement in the SQL above replicates this exactly. It's how you reproduce GA4's "Engaged sessions" metric in your custom reporting.
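Pulled out on its own, the flag can be sketched as a fragment over per-session aggregates — the column and table names here (engagement_seconds, pageviews_and_screenviews, conversions, session_aggregates) are hypothetical and stand in for whatever your reconstruction query produces:

```sql
-- Engaged-session flag over pre-computed per-session aggregates.
SELECT
  user_pseudo_id,
  ga_session_id,
  CASE
    WHEN engagement_seconds >= 10
      OR pageviews_and_screenviews >= 2
      OR conversions >= 1
    THEN 1 ELSE 0
  END AS is_engaged_session
FROM session_aggregates
```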
Session start traffic source
Traffic source data lives on multiple events but the session-start source is what you typically want. Two approaches:
Approach 1: Use traffic_source on session_start event
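A sketch of this approach — pull the traffic_source fields from each session's session_start event (dataset name and date are placeholders):

```sql
-- One row per session, sourced from the session_start event only.
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  traffic_source.name AS campaign
FROM `project.analytics_XXXXXX.events_*`
WHERE event_name = 'session_start'
  AND _TABLE_SUFFIX = '20260115'
```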
Approach 2: Use ANY_VALUE() in aggregation
The pattern above uses ANY_VALUE(source), which works because the traffic_source record holds the user's first-touch acquisition source — it is identical on every event for that user, and therefore trivially consistent within a session. ANY_VALUE() is also cheaper than an explicit join back to the session_start event. Note that because traffic_source is user-scoped first-touch, neither approach gives true session-scoped attribution; for that, use the session_traffic_source_last_click record added to the export in 2024.
For most reporting, ANY_VALUE() is sufficient and more efficient.
Custom session definitions
GA4's default session timeout is 30 minutes of inactivity. To analyse with a different timeout (e.g., 15 minutes for short-cycle e-commerce, 4 hours for long-form content), you build the sessionisation in SQL:
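A sketch of the sessionisation, assuming a 15-minute timeout (dataset name and date are placeholders):

```sql
-- Custom sessionisation with a 15-minute (900 s) inactivity timeout.
-- event_timestamp is in microseconds, hence the 1,000,000 multiplier.
WITH gaps AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    event_timestamp - LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id ORDER BY event_timestamp
    ) AS micros_since_prev
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX = '20260115'
)
SELECT
  user_pseudo_id,
  event_timestamp,
  -- Cumulative count of session breaks = custom session number per user.
  SUM(IF(micros_since_prev IS NULL OR micros_since_prev > 900 * 1000000, 1, 0))
    OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp)
    AS custom_session_number
FROM gaps
```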
This sessionises with a 15-minute timeout instead of the default 30. Adjust the 900 threshold (in seconds) for any custom timeout.
Common mistakes
1. Forgetting NULL session_ids. Some events (especially when the GA4 SDK initialises) don't have ga_session_id. Filter IS NOT NULL early or you'll get phantom sessions with NULL keys.
2. Treating ga_session_id as globally unique. It's not. Always join with user_pseudo_id.
3. Using traffic_source on every event. Some events don't have traffic_source populated. Use ANY_VALUE() to handle gaps gracefully.
4. Calculating session duration as MAX-MIN. This works for sessions with multiple events but gives 0 for single-event sessions. Either accept that or set a default minimum (e.g., 1 second for single-event sessions).
5. Not partitioning the output. Session reconstruction tables can grow large. Partition by event_date (or session_start_date) and cluster by source/medium for efficient downstream queries.
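A sketch of the materialisation, assuming the reconstruction query has been saved as a view with a microsecond session_start column — all table and view names here are placeholders:

```sql
-- Partition on the session's start date; cluster on the attribution
-- columns downstream queries filter by most often.
CREATE TABLE `project.reporting.ga4_sessions`
PARTITION BY session_start_date
CLUSTER BY source, medium AS
SELECT
  s.*,
  DATE(TIMESTAMP_MICROS(s.session_start)) AS session_start_date
FROM `project.reporting.sessions_view` AS s
```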