I find it’s a really common pattern to want to find the first or last row of table by some column of that table. For instance, suppose you have a table of orders and you want to find each customers first (ie earliest) or last (ie latest) order and add that data to orders table.
I’ve written a macro for exactly this purpose for postgres based on this article.
{%- macro get_first_or_last_row(source, partition_by, order_by, column_label, get='first' ) -%}
{% set asc_or_desc = 'asc' if get == 'first' else 'desc' %}
select
*,
{% if get == 'both' %}
row_number() over (partition by {{ partition_by }} order by {{ order_by }} asc) = 1 as first_{{column_label}},
row_number() over (partition by {{ partition_by }} order by {{ order_by }} desc) = 1 as last_{{column_label}}
{% else %}
row_number() over (partition by {{ partition_by }} order by {{ order_by }} {{ asc_or_desc }}) = 1 as {{column_label}}
{% endif %}
from {{ source }}
{%- endmacro -%}
Here’s how you can use it in the example above:
with orders as (
select * from {{ ref('stg_shopify__orders') }}
),
first_orders as (
{{ get_first_or_last_row('orders', 'customer_id', 'processed_at', 'customer_order', get='both') }}
)
select
first_customer_order,
last_customer_order,
customer_id,
order_id,
processed_at
from first_orders order by customer_id, processed_at
Doing so outputs results that make it easy to see that the macro works as expected.
Let me know what you think and/or what could be done better.