# How to Merge Consecutive Rows in SQL?

### TL;DR

This article details step by step process of merging consecutive rows in SQL with a dummy example:

- Use SQL to create a table named
*lapse_time*in PostgreSQL with columns*bike_id*,*area_id*, and*time_spent*and insert dummy data into the table. - Use the
*ROW_NUMBER()*function to assign row numbers to each row in the table, ordering by*bike_id*. - Use the
*LAG()*function to find the previous values of*bike_id*,*area_id*for each row. - Use the
*SUM()*function with a CASE condition to group consecutive rows with the same*bike_id*and*area_id*. - Finally, use the GROUP BY clause to group the data based on the calculated groups and sum the
*time_spent*for each group. - The result will be the desired output showing aggregated time spent for each unique combination of
*bike_id*and*area_id*.

Recently, I was trying to group consecutive rows in SQL using one and then multiple column values. I looked up the internet on how to do and couldn't find anything so finally when I was able to figure it out, I thought I should share this so that it can be of some help others who might be facing the problem.

## The Problem

Let's start with defining and understanding the problem first. Let's say we are running a mobility company (take for example, Rapido). In order to solve a particular use case, we are collecting information on the time spent by each bike in different areas. Let's say this is represented as a table named lapse_time in our database with **bike_id**, **area_id** and the corresponding **time_spent** (in minutes).

Here is the information we have collected for example:

***Note:** This data is sorted by ** bike_id **and then by the order of entry of the row in the database. For example,

**first spent 120 minutes in**

*bike_1***. Then, it went to**

*area_1***and spent 10 minutes over there and so on.**

*area_2*Now if you take a close look at row numbers 2, 3, and 4 in figure 1, you will observe that all these consecutive rows have the same ** area_id** and

**, and as the data is sorted by the order of entry of the row in the database, it means that bike_1 spent 10 minutes in area_2, then 20 minutes again in**

*bike_id***, and then again 40 minutes in**

*area_2***. This looks very cumbersome and to solve the particular use case I was looking at, I needed to group together these 3 consecutive columns so that it conveys that**

*area_2***spent 70 (40 + 20 +10) minutes in**

*bike_1***. Let's take another example to make it abundantly clear. Both the consecutive rows, 6 and 7, have the same**

*area_2***and**

*area_id***, and thus can be grouped to the time_spent by the**

*bike_id***in**

*bike_2***as 37 (12 + 25) minutes.**

*area_2*Many of you might be wondering why at all have we got separate rows for the same ** bike_id** in the same area. That can be because of multiple reasons, primarily owing to the way how the data collection has been done.

For example, the hardware might be collecting information at specific intervals (e.g. every 10 minutes or so) and calculating the time spent in that interval. It might be the case that a bike is at the same location in multiple intervals, in which case we get separate rows. So, this scenario is expected in real-time data.

### The Solution

So, now let's try to solve this problem. The end goal here is to convert what it looks like in the figure 2 to that in the figure 3.

**To this →**

I personally encountered this problem while I was trying to calculate median idle time_spent in an area. If you take the data present in figure 2 and try to calculate the median idle time spent in let's say ** area_2**, you won't get the desired result as the data hasn't been arranged accordingly. In figure 3, we have arranged the data by merging consecutive rows having the same

**and**

*bike_id***; thus making it perfect for all sorts of calculations. Merging consecutive rows according to values of multiple columns was something that wasn't easily available, which then inspired me to write a blog regarding it.**

*area_id*So, how to do this? Let's go step by step.

Let's first make a dummy table in ** postgres** named as

*lapse_time*

Now, let's insert some dummy data into it.

We have a table exactly like the one in figure 2.

- Now, we need to add a
to our table. This can be done simply using the following query. Let's call it*row_number()**query_1*

Here's the result:

We need to calculate LAG of the columns based on which we want to perform the sum (in this case, **bike_id** and** area_id**). So, let's call the following operation **query_2.**

Here's the result for this query

Now comes the tricky part. We'll need to use sum over window function. In order to know more about the same, one should read more about window functions in postgres.

Before going to the actual query, here's one query that one should try to get clarity:

Here's the result for this query

For the actual query, we would use this with a slight modification in the ** sum over window function**. We would be using CASE condition with sum, so here's the actual query. Let's call this

*query_3.*

We would do the sum only when the column value is not equal to its lag value. Here's the result that we would get.

If we look at the grp column over here, we would notice that the consecutive rows with the same values of **bike_id** and **area_id** have the same value for grp. The final step is to use group by clause over the grp column, here's the query for the same:

And here's the result:

Now one can further go above this and select just the** bike_id, area_id** and **time_spent** from this table to get the exact result as in the Figure 3.

### The final query

Aggregating all the steps mentioned above, here's the final query for merging consecutive rows in SQL using one or multiple columns

## Further Reading

- This problem can further be generalized as recognition of patterns in SQL and rapid detection of unbroken sequences, better known as
**gaps and island problem**.