Transform and divide a table using many threads

aris9m · March 8, 2022, 1:28pm

Hello all,

I am new in dbt and I would like to ask you a question to understand if this tool fits my case.

Let’s say that I have a table called ‘sales’ that contains 5 columns (‘day’, ‘item’, ‘category’, ‘store’, ‘sales’). For each day-item-category-store combination, I want to calculate the moving average of the last 7 days.

As for the ‘category’ column, I have 8 unique values, and the rows of the table are proportionately split within these 8 categories. Moreover, between the ‘category’ and ‘item’ columns, there is a hierarchical relationship, meaning that all the rows that refer to a specific item are included in the same category.

At the same time, I have 8 threads available on my machine.

Is there any smart way to execute this calculation using all the available threads, one thread for each category, without writing 8 separate models (SQL files)?

Ideally, I would like the results to be split into 8 new tables, one for each category e.g. moving_avg_cat1, moving_avg_cat2, moving_avg_cat3, etc.

Thanks in advance.

shane · March 10, 2022, 3:26am

This is a great question - a lot of databases don’t multi thread per client but can multi-process with multiple client connections. So if DBT could support auto-sharding of tables (as a materialization) it would improve processing of large data sets immensely.

I’ve built a materialisation to do this (based on insert_by_period_materialization.sql from dbt_utils) but alas, I can’t get it to use a thread per chunk of data (the partitions all run in the same thread )

I can do it manually by creating a model per “shard” and then making a model that “gathers” them all with a view that unions them all, but it’s really ugly.

So I’m also interested in the answer to this question. (Or if it’s possible to spawn threads in macros )

Topic		Replies	Views
implementing threads dynamically Help	3	1199	July 26, 2023
Creating many-to-many tables from dbt Archive	1	2815	April 4, 2022
Can DBT create Partitioned tables? Help dbt-core	0	203	January 31, 2025
Breaking a large query into parallelized partitioned queries In-Depth Discussions geospatial , python-models	6	6473	September 15, 2022
dbt recreates the whole table? Help	0	118	July 24, 2024

Transform and divide a table using many threads

Related topics