SQL Window Functions: Deduplication, Gaps and Islands, and Sessionization

Use ROW_NUMBER, LAG, and running groups to deduplicate rows, detect gaps, and form islands or sessions.

View
StandardDetailedCompact
Export
Copy the compact sheet, download it, or print it.
Download
`D` dense toggle · `C` copy all

Deduplication patterns

Window functions make cleanup and latest-row selection easy.

Deduplicate and keep the latest row

Choose one row per business key.

sqlANYdedupelatest
sql
WITH ranked AS (
  SELECT t.*, 
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY updated_at DESC, id DESC
         ) AS rn
  FROM users t
)
SELECT *
FROM ranked
WHERE rn = 1;
Notes

This pattern is safer and clearer than many self-join alternatives.

Delete duplicates after ranking

Keep the newest record and remove the rest.

sqlANYdeleteduplicates
sql
WITH ranked AS (
  SELECT id,
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY updated_at DESC, id DESC
         ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM ranked WHERE rn > 1
);
Notes

Exact delete syntax can vary by database, but the ranking step is the core idea.

Gaps and islands

Detect streaks, contiguous runs, and session boundaries.

Flag a new session when the gap is too large

Start a new session after 30 minutes of inactivity.

sqlANYsessionizationlag
sql
WITH flagged AS (
  SELECT user_id,
         event_time,
         CASE
           WHEN event_time - LAG(event_time) OVER (
             PARTITION BY user_id
             ORDER BY event_time
           ) > INTERVAL '30 minutes'
           OR LAG(event_time) OVER (
             PARTITION BY user_id
             ORDER BY event_time
           ) IS NULL
           THEN 1 ELSE 0
         END AS is_new_session
  FROM user_events
)
SELECT *
FROM flagged;
Notes

This is the first step in sessionization. Interval syntax may differ by dialect.

Assign session IDs with a running sum

Convert session boundary flags into stable groups.

sqlANYsessionsrunning-sum
sql
WITH flagged AS (
  SELECT user_id,
         event_time,
         CASE
           WHEN event_time - LAG(event_time) OVER (
             PARTITION BY user_id
             ORDER BY event_time
           ) > INTERVAL '30 minutes'
           OR LAG(event_time) OVER (
             PARTITION BY user_id
             ORDER BY event_time
           ) IS NULL
           THEN 1 ELSE 0
         END AS is_new_session
  FROM user_events
)
SELECT user_id,
       event_time,
       SUM(is_new_session) OVER (
         PARTITION BY user_id
         ORDER BY event_time
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS session_id
FROM flagged;
Notes

A cumulative sum transforms boundary rows into session group numbers.

Group consecutive dates into streaks

Classic islands pattern using row-number arithmetic.

sqlANYstreaksislands
sql
WITH numbered AS (
  SELECT user_id,
         activity_date,
         ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY activity_date
         ) AS rn
  FROM daily_activity
), grouped AS (
  SELECT user_id,
         activity_date,
         activity_date - rn * INTERVAL '1 day' AS grp_key
  FROM numbered
)
SELECT user_id,
       MIN(activity_date) AS streak_start,
       MAX(activity_date) AS streak_end,
       COUNT(*) AS streak_days
FROM grouped
GROUP BY user_id, grp_key;
Notes

A timeless trick for finding consecutive runs. Date arithmetic varies slightly by dialect.

Recommended next

No recommendations yet.