Forward Fill Macro

When you have an event stream, it’s common to need to find the latest non-null value in the stream and “forward-fill” it all subsequent events.

For instance, suppose you have a stream of web events. In one such event a user creates a profile with relevant properties (email, location, whatever). All subsequent events in the stream should have that information attached to the event. To do that, you need to forward fill the relevant properties.

Here’s a macro to do that based on this blog post:

{%- macro forward_fill(source, id, timestamp, field) -%}

    with init_data as (
        select 
            {{ id }} as id,
            {{ timestamp }} as timestamp,
            {{ field }} as field
        from {{ source }}
    ),

    grouped as (
        SELECT
            *
        FROM
            (
                SELECT
                    id,
                    timestamp,
                    field,
                    COUNT(field) OVER (PARTITION BY id ORDER BY timestamp) as grouper
                FROM
                    init_data
            ) as grouped
    ),

    forward_filled as (
        select 
            id,
            timestamp,
            field,
            max(field) OVER (PARTITION BY id, grouper) as forward_filled
        from grouped
        order by id, timestamp
    ),

    most_recent as (
        select 
            distinct on (id) id,
            forward_filled
        from forward_filled
        order by id, timestamp desc
    )

    select
        *
    from most_recent
    

{%- endmacro -%}
    ```
1 Like