Transform and divide a table using many threads

Hello all,

I am new in dbt and I would like to ask you a question to understand if this tool fits my case.

Let’s say that I have a table called ‘sales’ that contains 5 columns (‘day’, ‘item’, ‘category’, ‘store’, ‘sales’). For each day-item-category-store combination, I want to calculate the moving average of the last 7 days.

As for the ‘category’ column, I have 8 unique values, and the rows of the table are proportionately split within these 8 categories. Moreover, between the ‘category’ and ‘item’ columns, there is a hierarchical relationship, meaning that all the rows that refer to a specific item are included in the same category.

At the same time, I have 8 threads available on my machine.

Is there any smart way to execute this calculation using all the available threads, one thread for each category, without writing 8 separate models (SQL files)?

Ideally, I would like the results to be split into 8 new tables, one for each category e.g. moving_avg_cat1, moving_avg_cat2, moving_avg_cat3, etc.

Thanks in advance.

This is a great question - a lot of databases don’t multi thread per client but can multi-process with multiple client connections. So if DBT could support auto-sharding of tables (as a materialization) it would improve processing of large data sets immensely.

I’ve built a materialisation to do this (based on insert_by_period_materialization.sql from dbt_utils) but alas, I can’t get it to use a thread per chunk of data (the partitions all run in the same thread :frowning: )

I can do it manually by creating a model per “shard” and then making a model that “gathers” them all with a view that unions them all, but it’s really ugly.

So I’m also interested in the answer to this question. (Or if it’s possible to spawn threads in macros :joy: )