I am currently utilizing an on-the-fly data transformation that is based on a webrequest_id ID. To put it simply, I’m employing a Table-Valued Function (TVF), which looks something like this:
select * from get_rank_for_web_request(webrequest_id);
The body of the function comprises over 200 distinct transformation logic operations. These are primarily responsible for performing ranking calculations. The function appears somewhat as follows:
create or replace function get_rank_for_web_request(webrequest_id string)
RETURNS TABLE(
webrequest_id string,
rank int,
rankonrock int,
scanbackrank decimal(12, 4)
) return
with get_request as (
select * from request_table where request_id = webrequest_id
)
get_rank_for_reuest (
select * from rank_table where request_id = webrequest_id
)
additional 200 cte
,
select * from rank;
now looking to refactor this code using dbt , with each (CTE) serving as a ephemeral model. However, I’m unsure how to dynamically filter the CTEs, how to create the function, and how to effectively reproduce the same functionality as the Table-Valued Function (TVF).
Go with ephemeral models if you have a reusable logic.
If this logic is only specific to this model then i advice you write the logic in a single model and use var function to dynamically filter in ctes
with get_request as (
select * from request_table where request_id = {{var("webrequest_id"}}
)
get_rank_for_reuest (
select * from rank_table where request_id = {{var("webrequest_id"}}
)
additional 200 cte
,
select * from rank;
Use --vars argument with dbt run command to pass webrequest_id
dbt run model_name --vars 'webrequest_id: 1234566'
You can try creating a macro to replace the function and use it inside a CTE, then query against the CTE
something like
-- my macro file
{% macro get_rank_for_web_request(webrequest_id) %}
with get_request as (
select * from request_table where request_id = {{ webrequest_id }}
)
get_rank_for_reuest (
select * from rank_table where request_id = {{ webrequest_id }}
)
additional 200 cte
,
final as (select * from rank)
{% endmacro %}
Then build your model like
-- my model file
{{ get_rank_for_web_request(webrequest_id) }}
select * from final
I didn’t quite understand the problem above, your function can be converted to a macro that will generate the SQL code based on your webrequest_id, then you can query against the final CTE generated by the macro