My background is heavily in Java backend development, so I have everything as objects in my head. That is where SQL hurts me. Take it further and dbt is a mix of SQL + YML files. I know there are a couple of attempts to recreate SQL in an OOP way, but I feel they go too fast and too far. For docs in yml there is even a built-in approach for reusing existing doc strings, but it lives separately from the fields. SQL is great, but could it be a little bit better in dbt? What do I mean?
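(The built-in approach I mean is dbt docs blocks: you define a named doc once, e.g. in a .md file,

{% docs transaction_amount %}
The amount of the transaction.
{% enddocs %}

and then reference it from the yml:

      - name: amount
        description: '{{ doc("transaction_amount") }}'

The block name transaction_amount is just an illustration. The doc string is reusable this way, but the column name, its tests and the SQL that selects it still have to be repeated by hand.)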
Let’s say I have several data sources streaming transactions. In each source model I have to repeat almost the same set of fields (transaction_id and a pack of other ids, net_amount, gross_amount and so on) + have the same yml files with docs and tests, because they are all about transactions. It would be cool to define one class transaction with docs and reuse the fields from that object over and over.
In a very basic example, it could be something like a Jinja macro:
{% macro transaction_fields() %}
{
    "amount": {
        "name": "amount",
        "doc": "The amount of the transaction"
    },
    "date": {
        "name": "transaction_date",
        "doc": "The date of the transaction"
    }
}
{% endmacro %}
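(Side note: instead of rendering a JSON string, a dbt macro can also hand back a real dict through the return() function, so the caller would not need to parse anything. A rough variant with the same fields:

{% macro transaction_fields() %}
    {{ return({
        "amount": {"name": "amount", "doc": "The amount of the transaction"},
        "date": {"name": "transaction_date", "doc": "The date of the transaction"}
    }) }}
{% endmacro %}

I’ll stick with the JSON-string version in the example below.)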
And use it across the dbt code:
-- load the macro output into the model's context
-- (assuming dbt's fromjson() here to parse the JSON string the macro renders)
{% set fields = fromjson(transaction_fields()) %}
{% set amount = fields.amount %}
{% set date = fields.date %}

with source_data as (
    select
        {{ amount.name }} as {{ amount.name }},
        {{ date.name }} as {{ date.name }}
    from {{ ref('source_table') }}
)

select
    {{ amount.name }},
    {{ date.name }}
from source_data
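If {{ ref('source_table') }} resolves to, say, my_schema.source_table (a made-up name here), the compiled SQL would come out roughly as:

with source_data as (
    select
        amount as amount,
        transaction_date as transaction_date
    from my_schema.source_table
)

select
    amount,
    transaction_date
from source_data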
And the same idea in the yml:
version: 2

models:
  - name: my_model
    columns:
      - name: "{{ amount.name }}"
        description: "{{ amount.doc }}"
      - name: "{{ date.name }}"
        description: "{{ date.doc }}"
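The intention being that, once rendered, this yml would read as:

version: 2

models:
  - name: my_model
    columns:
      - name: amount
        description: The amount of the transaction
      - name: transaction_date
        description: The date of the transaction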
I believe you could make that solution runnable. But could it be better? I’m thinking about preprocessing the dbt files with a custom Python script that would use some class definitions and unwrap them in the SQL code into the particular names and doc strings. I’m not sure how to implement preprocessing inside dbt with custom code. And there would definitely be some drawbacks to that solution. Please share your ideas and thoughts. I hope my idea makes at least some sense and isn’t total crap.