# Framing Aggregates with WINDOW Functions

Welcome! This session assumes an intermediate understanding and usage of SQL

WINDOW functions in SQL allows to change the lens of the same data we are querying by adding context to the values being used. By design, window functions help you aggregate numbers or metadata so that each data row also contains general information related to the attributes

## What we're doing today

By the end of today, you will have become familiar with two window functions in SQL that allow you to group and aggregate values

### You will have the chance to follow along

# Let's Get Into It

## Loading and Previewing Data

Before we do anything to our data, let's see and understand it! By the end of this, we will know what our data table looks like, what each column represents, and what the smallest grain is i.e. what combination of attributes makes a row unique

0

0

0

2023-10-04

Trevor Thompson

1

2023-10-11

Ashley Martinez

2

2023-10-02

Maria Harris

3

2023-10-05

Steven Wood

4

2023-10-12

Stephanie Meyers

5

2023-10-05

James Ayala

6

2023-10-05

Susan Young

7

2023-10-04

Nicole Williams

8

2023-10-19

Michelle Norris

9

2023-10-14

Heidi Webb

Notice we have 135,369 rows in our table. This is going to be important in understanding the difference between a regular aggregation query and a window function aggregation query. Let's begin!

## Now, our functions

Let's try it!

ORDER BY - How would we track the volume of passengers changes each day by departure airport?

There are three parts of grouping to consider. We want

First, the typical grouping usage would look something like this

0

2023-10-01

ATL

1

2023-10-01

DCA

2

2023-10-01

DET

3

2023-10-01

JFK

4

2023-10-01

LAS

Remember the number of rows? In an aggregation, we lose grains -> we've gone from 135,369 rows to 220 rows and can't select the details of each passenger and route on each day. Keep this fact in your back pocket for now. Let's stay at this aggregate table and assume we got this as our starting dataset - just a count of departure totals.

Now what if we wanted to get the change over time? In the above case, you may create a new column after downloading this. Maybe in Excel with a pivot table. But what if we could get the cumulative number of customers each day with SQL?! Here come our window functions!

0

2023-10-01

DET

1

2023-10-02

DET

2

2023-10-03

DET

3

2023-10-04

DET

4

2023-10-05

DET

5

2023-10-06

DET

6

2023-10-07

DET

7

2023-10-08

DET

8

2023-10-09

DET

9

2023-10-10

DET

What did we just do?

You could show more information in this example by adding the destination field to your selection. This does not change the math, it extends the scope of your row. Remember - we are not hiding information with windows, we are improving our vantage point.

### Putting it all together

Let's combine the two previous steps into one. We now want to start from our original table of 135,369 rows, and select two values along with each row

590

2023-10-01

Michael Edwards

591

2023-10-01

Tyler Davies

592

2023-10-01

Laura Carter

593

2023-10-01

Taylor Waters

594

2023-10-01

Wendy Ortiz

595

2023-10-01

Rita Williams

596

2023-10-01

Lisa Graham

597

2023-10-01

Lindsey Mccoy

598

2023-10-02

Kerry Stanton

599

2023-10-02

April Stephens

Notice: you don't have to use a GROUP BY function with the aggregation above because you are not reducing rows (we are still at 135,369 rows). You are adding more information to each row. Hence the window opening 😆 That means, if you were looking at any single passenger's itinerary on a given day, you will also be able to see how many passengers total left that same passenger's departure airport (regardless of destination) on that day, and how many have departed so far up to and including that day.

‼️ Once again, be mindful to add your fields in the order you want them to be partitioned and ordered.

Now let's select one departure so we can see how the count changes with each day to make sure we got what we wanted

0

2023-10-01

DET

1

2023-10-02

DET

2

2023-10-03

DET

3

2023-10-04

DET

4

2023-10-05

DET

5

2023-10-06

DET

6

2023-10-07

DET

7

2023-10-08

DET

8

2023-10-09

DET

9

2023-10-10

DET

We can see for each of our 20 days in October, the count for each day's departure from Detroit, and the running total as the days increment!

### Let's try this again with the route (departure - destination combination)

The PARTITION BY clause is extendable by adding more variables in the order with which you want to partition AND order. Now we can group by not just the departure, but also the destination AND keep accumulating by the day. This is a smaller slice to aggregate, but again - we are not changing the number of rows, we are just adding more information to it.

0

2023-10-01

James Garcia

1

2023-10-01

Sherry Stevenson

2

2023-10-01

Megan Turner

3

2023-10-01

Shannon Lynch

4

2023-10-01

Kaitlyn Hernandez

5

2023-10-01

Nathaniel Long

6

2023-10-01

Michael Mathis

7

2023-10-01

Richard Wise

8

2023-10-01

Terry Garcia

9

2023-10-01

Kathryn Pruitt

Our selection gives us a result table named as running_totals_by_route which shows us how many people traveled from one airport to another on a day, and up to a given (inclusive) day. Below, we select just one route to see that I'm not fibbing

0

2023-10-01

DCA

1

2023-10-02

DCA

2

2023-10-03

DCA

3

2023-10-04

DCA

4

2023-10-05

DCA

5

2023-10-06

DCA

6

2023-10-07

DCA

7

2023-10-08

DCA

8

2023-10-09

DCA

9

2023-10-10

DCA

We did it!! In the 20 days of data we have, we can see the count of passengers each day, and the incrementing sum as days pass on this route

## How can we make this more robust?

While our queries above work great, this is just a sample dataset. You may want to account for robustness and edge cases when you move from a proof of concept to production queries.

### Let's improve this for a wider time series

The below changes demonstrate adding a month-year dimension to account for a bigger dataset, so that our grouping and cumulative sums are accurate and restart at each new month date <cue Bone Thugs N Harmony - 1st of the month>

0

2023-10-01

2023-10

1

2023-10-01

2023-10

2

2023-10-01

2023-10

3

2023-10-01

2023-10

4

2023-10-01

2023-10

5

2023-10-01

2023-10

6

2023-10-01

2023-10

7

2023-10-01

2023-10

8

2023-10-01

2023-10

9

2023-10-01

2023-10

0

2023-10-01

DCA

1

2023-10-02

DCA

2

2023-10-03

DCA

3

2023-10-04

DCA

4

2023-10-05

DCA

5

2023-10-06

DCA

6

2023-10-07

DCA

7

2023-10-08

DCA

8

2023-10-09

DCA

9

2023-10-10

DCA

## Further Learning

I hope you have enjoyed learning about this window function. To productionize your data queries, you may use the WINDOW function to define a specific window through which to apply aggregations for your rows. This function works with MySQL databases and in BigQuery. There are other ways to aggregate information for both numerical and string values. This was an introduction, so here are some more resources to expand your learning!

## Resources

For further learning, here are some of my favorite resources. Happy adventures in SQL 🥰