Hey there! Welcome to dbt!
Based on the original post, here are a couple assumptions I’m making and an example approach below.
Assumptions:
-
blah
is available to dbt as a source - We’re only interested in two columns (
column_name
,other_column
)
With those assumptions, I’m thinking some Jinja is all you’d need to clean up the model.
Example:
This model, stg_foobar
, basically replaces your create table
by getting everything from blah
loaded into foobar
. You can schedule dbt to run this model as often as you need to make sure foobar
has all the data needed from blah
, assuming there is something in the omitted code that results in foobar
not just being a copy of blah
.
stg_foobar.sql
select * from {{ source(‘blah_container’, ‘blah_table’) }}
I imagine this next model, categorized_foobar
, as a view, but it could be a table. The point here is that you don’t need to go back and constantly update any tables. Rather, you create this model to build on top of stg_foobar
so that anytime data is read from categorized_foobar
, it’s automatically using the evaluated columns.
The Jinja mappings
is a variable you can use to cleanly define your mapping rules in a single place and prevent yourself from having to write a lengthy case statement.
categorized_foobar.sql
{% set mappings = {'something': 'boo', 'something-else': 'boo-else'} %}
with source as (
select * from {{ ref(‘stg_foobar’) }}
),
final as (
select
case
{% for old, new in mappings %}
when other_column like ‘{{old}}’ then ‘{{new}}’
{% endfor %}
end as column_name
from
source
)
select * from final
Of course, if the complexity of the string matching is the crux of the problem here (that doesn’t stand out to me in the original question), we might look at another approach.
P.S. I’m drafting this up without access to run/test it, so you may run into some syntax issues, but I think conceptually it should work.