Hello everyone!
Iโm excited to share cte2dbt
, a small yet powerful Python module designed to simplify the transition from large, monolithic SQL queries to modular, reusable dbt models. If youโve ever struggled with maintaining long SQL queries packed with Common Table Expressions (CTE), this tool is for you!
What does cte2dbt do?
It extracts CTE from a SQL query and converts them into structured dbt models, ensuring that dependencies are preserved and execution order is maintained. This makes your SQL more maintainable, testable, and reusable without the manual effort of breaking queries apart.
Why use cte2dbt?
Automates SQL-to-dbt migration โ No more manually splitting queries into models.
Preserves dependencies โ Ensures dbt models are executed in the correct order.
Enhances maintainability & testing โ Easier debugging, modularisation, and documentation.
Fully customisable โ Define how CTEs and source tables are processed.
Iterable design โ Use it programmatically to visualise, validate, or modify models dynamically.
Example Usage
import com.github.piotr_yuxuan.cte2dbt as cte2dbt
from sqlglot import parse_one
sql_query = """
WITH customers_cte AS (
SELECT id, name FROM customers
), orders_cte AS (
SELECT c.id, o.amount
FROM customers_cte AS c
JOIN prod.retails.orders AS o ON c.id = o.customer_id
)
SELECT * FROM orders_cte;
"""
provider = cte2dbt.Provider(
model_name="final_model",
expr=parse_one(sql_query),
to_dbt_ref_block=lambda name: f"{{{{ ref('{name}') }}}}",
to_dbt_source_block=lambda table: f"{{{{ source('{table.db}', '{table.name}') }}}}"
)
for model_name, model_expr in provider.iter_dbt_models():
print(f"-- Model: {model_name}\n{model_expr.sql(pretty=True)}")
Output:
-- Model: cte1
SELECT id, name FROM {{ source('my_source', 'customers') }}
-- Model: cte2
SELECT cte1.id, orders.amount
FROM {{ ref('cte1') }} AS cte1
JOIN {{ source('my_source', 'orders') }} AS orders ON cte1.id = orders.customer_id
-- Model: final_model
SELECT * FROM {{ ref('cte2') }} AS cte2
Installation
You can install it using Poetry or any other package manager, I just published it on PyPI:
poetry add cte2dbt
What else can you do with cte2dbt
?
Beyond just SQL-to-dbt conversion, cte2dbt
unlocks several use cases, such as:
Inspecting intermediate results: easily debug CTE by viewing each step and computing a similarity ratio with what a target, existing pipeline.
Generating dependency graphs: visualise how your models interact like dbt Cloud, but you can observe how similarity reduces from one noisy model to downstream models.
Executing transformations dynamically : apply the logic programmatically.
Exporting models as SQL files: automate model file generation for dbt.
Who is this for?
If youโre working with dbt and code from data analyst to production or migrating legacy pipelines do dbt, cte2dbt
can save time and improve code quality by enforcing a structured, modular approach to SQL development.
Iโd love to hear your thoughts and feedback! If you try it out, let me know how it works for you or if you have any suggestions for improvements.
Check out the GitHub repository: GitHub - piotr-yuxuan/cte2dbt: Tools with batteries included to extract CTE from a large SQL query and split them iteratively into dbt models.