Splitting arrays into rows in Redshift with dbt macros

Some of our data is stored like this:

id tags
1 a
2 a, b
3 c , d
4 a,b,e
5 NULL

When really we want it like this:

id tag
1 a
2 a
2 b
3 c
3 d
4 a
4 b

In postgres, you can use the UNNEST function, however, there is no equivalent in Redshift.

I took inspiration from this blogpost, and combined it with the power of dbt to create a macro that unnests lists!

I also created a macro that unnests json arrays.

Check out this gist to see the code for the macros, along with some examples (I’ve kept it as a gist since there’s multiple files).

4 Likes

I just wrote a whole blog post on unnesting arrays in Redshift after having to do this for a client project. It walks through the entire process, including all of the functions required. Hopefully attempts to build some intuitions as well as just sharing the code.

I actually specifically chose not to macro-ize this code. In my specific situation I only had to do this once within an entire project and hopefully you don’t have excessive use of semi-structured data in Redshift either. Given that, i found that the complexity of the macro version would have actually been greater than simply writing the SQL to do the thing. Obviously, dbt-utils’ generate_series is still super-useful :slight_smile:

Hopefully a good resource for folks who have to do this for the first time.

1 Like