My background is heavily in Java backend development, so I have everything as objects in my head. That is where SQL hurts me. Take it further and dbt is a mix of SQL + YML files. I know there are a couple of attempts to recreate SQL in an OOP way, but I feel they go too fast and too far. For docs in yml there is even a built-in approach for reusing existing doc strings, but it lives separately from the fields. SQL is great, but could it be a little bit better in dbt? What do I mean?
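(The built-in approach I mean is dbt docs blocks: you define a named doc once, e.g. in a .md file,

{% docs transaction_amount %}
The amount of the transaction.
{% enddocs %}

and then reference it from the yml:

      - name: amount
        description: '{{ doc("transaction_amount") }}'

The block name transaction_amount is just an illustration. The doc string is reusable this way, but the column name, its tests and the SQL that selects it still have to be repeated by hand.)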
Let’s say I have several data sources streaming transactions. In each source model I have to repeat almost the same set of fields (transaction_id and a pack of other ids, net_amount, gross_amount and so on) + have the same yml files with docs and tests, because they are all about transactions. It would be cool to define one class transaction with docs and reuse the fields from that object over and over.
In a very basic example, it could be something like a Jinja macro:
{% macro transaction_fields() %}
{
    "amount": {
        "name": "amount",
        "doc": "The amount of the transaction"
    },
    "date": {
        "name": "transaction_date",
        "doc": "The date of the transaction"
    }
}
{% endmacro %}
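(Side note: instead of rendering a JSON string, a dbt macro can also hand back a real dict through the return() function, so the caller would not need to parse anything. A rough variant with the same fields:

{% macro transaction_fields() %}
    {{ return({
        "amount": {"name": "amount", "doc": "The amount of the transaction"},
        "date": {"name": "transaction_date", "doc": "The date of the transaction"}
    }) }}
{% endmacro %}

I’ll stick with the JSON-string version in the example below.)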
And use it across the dbt code:
-- load the macro output into the model's context
-- (assuming dbt's fromjson() here to parse the JSON string the macro renders)
{% set fields = fromjson(transaction_fields()) %}
{% set amount = fields.amount %}
{% set date = fields.date %}

with source_data as (
    select
        {{ amount.name }} as {{ amount.name }},
        {{ date.name }} as {{ date.name }}
    from {{ ref('source_table') }}
)

select
    {{ amount.name }},
    {{ date.name }}
from source_data
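If {{ ref('source_table') }} resolves to, say, my_schema.source_table (a made-up name here), the compiled SQL would come out roughly as:

with source_data as (
    select
        amount as amount,
        transaction_date as transaction_date
    from my_schema.source_table
)

select
    amount,
    transaction_date
from source_data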
And the same idea in the yml:
version: 2

models:
  - name: my_model
    columns:
      - name: "{{ amount.name }}"
        description: "{{ amount.doc }}"
      - name: "{{ date.name }}"
        description: "{{ date.doc }}"
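The intention being that, once rendered, this yml would read as:

version: 2

models:
  - name: my_model
    columns:
      - name: amount
        description: The amount of the transaction
      - name: transaction_date
        description: The date of the transaction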
I believe you could make that solution runnable. But could it be better? I’m thinking about preprocessing the dbt files with a custom Python script that would use some class definitions and unwrap them in the SQL code into the particular names and doc strings. I’m not sure how to implement preprocessing inside dbt with custom code. And there would definitely be some drawbacks to that solution. Please share your ideas and thoughts. I hope my idea makes at least some sense and isn’t total crap.