How to Pivot and Unpivot Row Data in Spark

Data transformation is an essential task in a data processing pipeline, and Spark provides a variety of functions and methods for it. Two of the most common transformations are pivoting and unpivoting. In this post, we discuss how to do both in Spark.

Pivoting Data

Pivoting transforms a dataset from long to wide format by rotating the distinct values of one column into separate columns. In Spark, pivot() is not called on the DataFrame directly: it is a method of the grouped data returned by groupBy(), and it must be followed by an aggregation. The pattern has three parts: the grouping column(s) passed to groupBy(), the pivot column passed to pivot() (optionally with the list of values to pivot on), and the aggregation applied to the values column, such as sum().

Example

Consider the following DataFrame, df:

+----+-----+------+
|year|month|amount|
+----+-----+------+
|2021|  Jan|   100|
|2021|  Feb|   200|
|2021|  Mar|   300|
|2022|  Jan|   400|
|2022|  Feb|   500|
|2022|  Mar|   600|
+----+-----+------+

To get one row per month and one column per year, we group by month, pivot on year, and sum amount:

df.groupBy("month").pivot("year").sum("amount").show()

The output will be (row order may vary):

+-----+----+----+
|month|2021|2022|
+-----+----+----+
|  Jan| 100| 400|
|  Feb| 200| 500|
|  Mar| 300| 600|
+-----+----+----+

Unpivoting Data

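Unpivoting transforms a dataset from wide to long format by rotating columns into rows. Older Spark releases have no dedicated unpivot method on DataFrame, so a common approach is the SQL stack() generator, which the example below combines with explode(). Spark 3.4 and later also offer DataFrame.unpivot() (alias melt()).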

Example

Consider the following DataFrame, df:

+----+---+---+---+
|year|Jan|Feb|Mar|
+----+---+---+---+
|2021|100|200|300|
|2022|400|500|600|
+----+---+---+---+

If we want to unpivot the data, keeping year as the identifier column and turning the month columns into rows, we can use the following code:

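from pyspark.sql.functions import expr, explode

# stack() emits one (month, amount) row per month column; explode() on a
# single-entry map then exposes each pair as the key/value columns.
(df.selectExpr("year", "stack(3, 'Jan', Jan, 'Feb', Feb, 'Mar', Mar) as (month, amount)")
    .select("year", explode(expr("map(month, amount)")))
    .show())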

The output will be:

+----+---+-----+
|year|key|value|
+----+---+-----+
|2021|Jan|  100|
|2021|Feb|  200|
|2021|Mar|  300|
|2022|Jan|  400|
|2022|Feb|  500|
|2022|Mar|  600|
+----+---+-----+

Conclusion

Pivoting and unpivoting are essential transformation techniques in data processing, converting data between long and wide formats. In this post, we discussed how to perform both in Spark, with examples.