My submission to the posit table contest

This year, I decided to participate in the Posit Table Contest. Because I’m a fan of the great tables Python library I thought it would be great to enhance my mastery of this beautiful library by designing a table. I’ll use the Polars library for data munging and transformation.

Data collection

The data was collected from the Transport Security Administration website, which has archived data on the number of airport passenger checkings done daily from 2019 to 2023. The data was then saved as a parquet file containing columns Date and Numbers.

Data transformation

Reading the original data and displaying five random records.

import polars as pl
from pathlib import Path

data = pl.read_parquet(f"{Path('../../../')}/datasets/tsa.parquet")
data.sample(5)

shape: (5, 2)

Date	Numbers
date	i64
2022-07-25	2386005
2023-12-08	2431077
2020-04-02	160826
2023-04-19	2227164
2022-02-04	1583678

Creating new columns to enable more data transformations.

df = (data
 .with_columns(Year=pl.col('Date').dt.year(),
             Month=pl.col('Date').dt.month(),
             Day=pl.col('Date').dt.day())
 .with_columns(pl.when(pl.col('Month').eq(11) & pl.col('Day').eq(28))
             .then(pl.lit('Thanksgiving'))
             .when(pl.col('Month').eq(12) & pl.col('Day').eq(25))
             .then(pl.lit('Christmas'))
             .when(pl.col('Month').eq(7) & pl.col('Day').eq(4))
             .then(pl.lit('July 4th'))
             .when(pl.col('Month').eq(5) & pl.col('Day').eq(27))
             .then(pl.lit('Memorial Day'))
             .otherwise(None)
             .alias('Holiday')
             )
)
df.head()

shape: (5, 6)

Date	Numbers	Year	Month	Day	Holiday
date	i64	i32	i8	i8	str
2019-01-01	2201765	2019	1	1	null
2019-01-02	2424225	2019	1	2	null
2019-01-03	2279384	2019	1	3	null
2019-01-04	2230078	2019	1	4	null
2019-01-05	2049460	2019	1	5	null

In the code below, I’m creating a single-row dataframe, showing the highest number of checkings for each year. I’m leveraging looping to avoid repeating myself five times. The five dataframes are combined into a single dataframe called highest_df.

# Create high dataframe
high_dfs = []
for year in df['Year'].unique().to_list():
    high_df = pl.DataFrame({'Year':year,
            'Date':(df.filter(pl.col('Year') == year).filter(pl.col('Numbers') == pl.col('Numbers').max())['Date']),
            'Numbers':df.filter(pl.col('Year') == year)['Numbers'].max(),
            'Holiday':'Highest Record',
            })
    high_dfs.append(high_df)
highest_df = pl.concat(high_dfs).with_columns(pl.col('Numbers').cast(pl.Int64))
highest_df.sample(3)

shape: (3, 4)

Year	Date	Numbers	Holiday
i32	date	i64	str
2023	2023-11-26	2908785	"Highest Record"
2021	2021-11-28	2458325	"Highest Record"
2019	2019-12-01	2882915	"Highest Record"

To create a dataframe with the lowest number of checkings, I repeat the above process, replacing max() with min().

# Create low dataframe
low_dfs = []
for year in df['Year'].unique().to_list():
    low_df = pl.DataFrame({'Year':year,
            'Date':(df.filter(pl.col('Year') == year).filter(pl.col('Numbers') == pl.col('Numbers').min())['Date']),
            'Numbers':df.filter(pl.col('Year') == year)['Numbers'].min(),
            'Holiday':'Lowest Record',
            })
    low_dfs.append(low_df)
lowest_df = pl.concat(low_dfs).with_columns(pl.col('Numbers').cast(pl.Int64))
lowest_df.sample(3)

shape: (3, 4)

Year	Date	Numbers	Holiday
i32	date	i64	str
2022	2022-01-25	1063856	"Lowest Record"
2019	2019-11-28	1591158	"Lowest Record"
2021	2021-01-26	508467	"Lowest Record"

In this code, I create another dataframe with the total number of checkings for each year by using sum().

# Create total dataframe
tot_dfs = []
for year in df['Year'].unique().to_list():
    tot_df = pl.DataFrame({'Year':year,
            'Date':None,
            'Numbers':df.filter(pl.col('Year') == year)['Numbers'].sum(),
            'Holiday':'Annual Checkings',
            'Distribution':None
            })
    tot_dfs.append(tot_df)
total_df = pl.concat(tot_dfs).with_columns(pl.col('Year').cast(pl.Int32))
total_df.sample(3)

shape: (3, 5)

Year	Date	Numbers	Holiday	Distribution
i32	null	i64	str	null
2019	null	848102043	"Annual Checkings"	null
2023	null	858548196	"Annual Checkings"	null
2020	null	339774756	"Annual Checkings"	null

This code creates a dataframe where the value in Holiday is not null.

# Create holiday dataframe
hol_dfs = []
for year in df['Year'].unique().to_list():
    hol_df = (df
    .filter(pl.col('Year') == year)
    .filter(pl.col('Holiday').is_not_null())
    .select('Year','Date','Numbers','Holiday')
    )
    hol_dfs.append(hol_df)
holiday_dfs = pl.concat(hol_dfs)
holiday_dfs.sample(3)

shape: (3, 4)

Year	Date	Numbers	Holiday
i32	date	i64	str
2022	2022-07-04	2086854	"July 4th"
2020	2020-12-25	683838	"Christmas"
2023	2023-05-27	2206475	"Memorial Day"

The dataframe below creates a list of the average number of monthly checkings for each year. The created lists are row values for a column called Distribution.

# Create distribution dataframe
dist_dfs = []
for year in df['Year'].unique().to_list():
    dist_df = (df
    .filter(pl.col('Year') == year)
    .group_by('Month')
    .agg(pl.mean('Numbers'), pl.first('Year'))
    .sort('Month')
    .with_columns(Distribution=pl.col('Numbers').implode())
    .select('Year','Distribution').head(1)
    )
    dist_dfs.append(dist_df)
distribution_df = pl.concat(dist_dfs)
distribution_df.sample(3)

shape: (3, 2)

Year	Distribution
i32	list[f64]
2020	[2.0946e6, 2.1359e6, … 896292.225806]
2019	[1.9902e6, 2.0906e6, … 2.3577e6]
2022	[1.4902e6, 1.7396e6, … 2.1333e6]

Finally, I’m combining all the created dataframes into a single dataframe called TSA and adding another column Icon containing the names of the icons like the turkey and Christmas tree used in the stunning table. TSA is the dataframe used to make the stunning table.

# Combine all dataframes and add Icon column.
TSA = (pl.concat([holiday_dfs, highest_df, lowest_df])
.join(distribution_df, on='Year', how='inner')
.vstack(total_df)
.with_columns(pl.when(pl.col('Holiday') == "Thanksgiving")
            .then(pl.col('Distribution'))
            .otherwise(None)
            .alias('Distribution')
            )
.with_columns(pl.when(pl.col('Holiday') == "Memorial Day")
           .then(pl.lit('memorial.svg'))
           .when(pl.col('Holiday') == "July 4th")
           .then(pl.lit('flag.svg'))
           .when(pl.col('Holiday') == "Thanksgiving")
           .then(pl.lit('turkey.svg'))
           .when(pl.col('Holiday') == "Christmas")
           .then(pl.lit('christmas.svg'))
           .when(pl.col('Holiday') == "Highest Record")
           .then(pl.lit('high.svg'))
           .when(pl.col('Holiday') == "Lowest Record")
           .then(pl.lit('low.svg'))
           .otherwise(pl.lit('calendar.svg'))
           .alias('Icon')
           )
.select('Icon', 'Holiday', 'Year', 'Date', 'Numbers', 'Distribution')
.sort('Year')
)
TSA.sample(3)

shape: (3, 6)

Icon	Holiday	Year	Date	Numbers	Distribution
str	str	i32	date	i64	list[f64]
"flag.svg"	"July 4th"	2022	2022-07-04	2086854	null
"high.svg"	"Highest Record"	2020	2020-02-14	2507588	null
"low.svg"	"Lowest Record"	2021	2021-01-26	508467	null

After transforming the data into the desired format, I am now creating the table. The code below generates the table, which will be my final submission.

from great_tables import GT, html, loc, style, md, nanoplot_options

display(
    GT(TSA, rowname_col="Icon", groupname_col="Year")
    .tab_stubhead(label=html('<b style="font-family: Inter, sans-serif; font-weight: 500;">Year</b>'))
    .tab_header(title=html('''
        <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;900&display=swap" rel="stylesheet">
        <h2 style="text-align:center; display: flex; align-items: center; justify-content: center; font-family: Inter, sans-serif; font-weight: 500; color: #014eac;">
            <img src="assets/plane3.svg" alt="Plane Icon" style="margin-right: 34px; height: 54px;">
            TSA Airport Checkings<br>on Major Holidays
            <img src="assets/plane3.svg" alt="Plane Icon" style="margin-left: 34px; height: 54px;">
        </h2>
    '''))
    .tab_options(container_width="100%",
                 table_background_color='#F0FFF0',
                 heading_background_color="#C0C0C0",
                 column_labels_background_color="#696969",
                 row_group_font_weight='bold',
                 row_group_background_color='#C0C0C0',
                 source_notes_font_size='12px',
                 row_group_padding='8px',
                 table_font_names="Open Sans")
    .cols_label(Date=html('<b style="font-family: Inter, sans-serif; font-weight: 500;">Date</b>'),
                Numbers=html('<b style="font-family: Inter, sans-serif; font-weight: 500;">Checkings</b>'),
                Distribution=html('<b style="font-family: Inter, sans-serif; font-weight: 500;">Avg Monthly Checkings</b>'),
                Holiday='')
    .fmt_number(columns='Numbers', decimals=0)
    .cols_width(cases={'Date':'120px'})
    .fmt_date(columns="Date", date_style="day_m")
    .tab_style(style=style.text(color='#556B2F', weight='bold'),
               locations=loc.body(rows=pl.col("Holiday") == "Annual Checkings"))
    .tab_style(style=style.text(color='black', weight='normal'),
               locations=loc.body(columns="Holiday"))
    .sub_missing(missing_text='')
    .fmt_nanoplot(columns="Distribution", reference_line="mean",
                  options=nanoplot_options(data_point_radius=12,
                                           data_point_stroke_color="black",
                                           data_point_stroke_width=4,
                                           data_point_fill_color="white",
                                           data_line_type="straight",
                                           data_line_stroke_color="brown",
                                           data_line_stroke_width=2,
                                           data_area_fill_color="#FF8C00",
                                           vertical_guide_stroke_color="green"))
    .fmt_image("Icon", path="assets")
    .tab_source_note(source_note=md("**Source:** [TSA Passenger Volumes](https://www.tsa.gov/travel/passenger-volumes)<br/>**Designer:** Joram Mutenge<br/>*www.jorammutenge.com*"))
)

/Users/mute/Desktop/ne/summerfall/lib/python3.13/site-packages/great_tables/_render_checks.py:37: RenderWarning:

Rendering table with .col_widths() in Quarto may result in unexpected behavior. This is because Quarto performs custom table processing. Either use all percentage widths, or set .tab_options(quarto_disable_processing=True) to disable Quarto table processing.

TSA Airport Checkings on Major Holidays
Year		Date	Checkings	Avg Monthly Checkings
2019
	Memorial Day	27 May	2,512,237
	July 4th	4 Jul	2,088,760
	Thanksgiving	28 Nov	1,591,158
	Christmas	25 Dec	1,996,541
	Highest Record	1 Dec	2,882,915
	Lowest Record	28 Nov	1,591,158
	Annual Checkings		848,102,043
2020
	Memorial Day	27 May	283,387
	July 4th	4 Jul	495,309
	Thanksgiving	28 Nov	966,412
	Christmas	25 Dec	683,838
	Highest Record	14 Feb	2,507,588
	Lowest Record	14 Apr	113,147
	Annual Checkings		339,774,756
2021
	Memorial Day	27 May	1,867,067
	July 4th	4 Jul	1,697,422
	Thanksgiving	28 Nov	2,458,325
	Christmas	25 Dec	1,535,935
	Highest Record	28 Nov	2,458,325
	Lowest Record	26 Jan	508,467
	Annual Checkings		585,250,987
2022
	Memorial Day	27 May	2,397,928
	July 4th	4 Jul	2,086,854
	Thanksgiving	28 Nov	2,389,672
	Christmas	25 Dec	1,800,463
	Highest Record	27 Nov	2,639,616
	Lowest Record	25 Jan	1,063,856
	Annual Checkings		760,071,362
2023
	Memorial Day	27 May	2,206,475
	July 4th	4 Jul	2,007,441
	Thanksgiving	28 Nov	2,171,943
	Christmas	25 Dec	2,028,266
	Highest Record	26 Nov	2,908,785
	Lowest Record	31 Jan	1,534,786
	Annual Checkings		858,548,196
Source: TSA Passenger Volumes Designer: Joram Mutenge www.jorammutenge.com

Get the full code here. If you want to learn how to transform data like I did in this post, check out my Polars course.

Data collection

Data transformation

TSA Airport Checkings on Major Holidays

TSA Airport Checkings
on Major Holidays