Macro to find first or last row of a group

I find it’s a really common pattern to want to find the first or last row of table by some column of that table. For instance, suppose you have a table of orders and you want to find each customers first (ie earliest) or last (ie latest) order and add that data to orders table.

I’ve written a macro for exactly this purpose for postgres based on this article.

{%- macro get_first_or_last_row(source, partition_by, order_by, column_label, get='first' ) -%}

{% set asc_or_desc = 'asc' if get == 'first' else 'desc'  %}

select
    *,
    {% if get == 'both' %}
    row_number() over (partition by {{ partition_by }} order by {{ order_by }} asc) = 1 as first_{{column_label}},
    row_number() over (partition by {{ partition_by }} order by {{ order_by }} desc) = 1 as last_{{column_label}}
    {% else %}
    row_number() over (partition by {{ partition_by }} order by {{ order_by }} {{ asc_or_desc }}) = 1 as {{column_label}}
    {% endif %}
from {{ source }}

{%- endmacro -%}

Here’s how you can use it in the example above:

with orders as (
   select * from {{ ref('stg_shopify__orders') }}
),

first_orders as (
  {{ get_first_or_last_row('orders', 'customer_id', 'processed_at', 'customer_order', get='both') }}
)

select 
  first_customer_order,
  last_customer_order,
  customer_id,
  order_id,
  processed_at
from first_orders order by customer_id, processed_at

Doing so outputs results that make it easy to see that the macro works as expected.

Let me know what you think and/or what could be done better.

4 Likes