Hey there! Welcome to dbt!
Based on the original post, here are a couple of assumptions I'm making and an example approach below.
Assumptions:

- `blah` is available to dbt as a source
- We're only interested in two columns (`column_name`, `other_column`)
With those assumptions, I'm thinking some Jinja is all you'd need to clean up the model.
Example:
This model, `stg_foobar`, basically replaces your `create table` by getting everything from `blah` loaded into `foobar`. You can schedule dbt to run this model as often as you need to make sure `foobar` has all the data needed from `blah`, assuming there is something in the omitted code that results in `foobar` not just being a copy of `blah`.
stg_foobar.sql

```sql
select * from {{ source('blah_container', 'blah_table') }}
```
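One thing worth noting (it wasn't shown in the original post): for `{{ source(...) }}` to resolve, dbt needs the source declared in a YAML file somewhere in your project. A minimal sketch, assuming the names above (the file path is just a convention, not required):

```yaml
# models/sources.yml (hypothetical path)
version: 2

sources:
  - name: blah_container
    tables:
      - name: blah_table
```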
I imagine this next model, `categorized_foobar`, as a view, but it could be a table. The point here is that you don't need to go back and constantly update any tables. Rather, you create this model to build on top of `stg_foobar` so that anytime data is read from `categorized_foobar`, it's automatically using the evaluated columns. The Jinja `mappings` variable lets you cleanly define your mapping rules in a single place and prevents you from having to write a lengthy case statement by hand.
categorized_foobar.sql

```sql
{% set mappings = {'something': 'boo', 'something-else': 'boo-else'} %}

with source as (

    select * from {{ ref('stg_foobar') }}

),

final as (

    select
        case
            {% for old, new in mappings.items() %}
            when other_column like '{{ old }}' then '{{ new }}'
            {% endfor %}
        end as column_name
    from source

)

select * from final
```

(Note the `.items()` in the loop — iterating a Jinja dict directly yields only its keys, so unpacking into `old, new` needs `mappings.items()`.)
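For reference, here's roughly what that case expression is meant to compile to once dbt renders the Jinja loop over the example `mappings` dict:

```sql
-- Expected compiled form of the case expression, given
-- mappings = {'something': 'boo', 'something-else': 'boo-else'}
case
    when other_column like 'something' then 'boo'
    when other_column like 'something-else' then 'boo-else'
end as column_name
```

One caveat: `like` without `%` wildcards behaves like plain equality, so depending on the matching you actually need, the mapping keys may want wildcards (e.g. `'something%'`).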
Of course, if the complexity of the string matching is the crux of the problem here (that doesn't stand out to me in the original question), we might look at another approach.
P.S. I'm drafting this up without access to run/test it, so you may run into some syntax issues, but I think conceptually it should work.