I have a scenario where I need to call a stored procedure from my BI tool (Power BI), so that I can filter my source data and THEN perform some complex transformations (that are not suitable for M).
My question is: how do I convert this pseudo-code stored procedure (in Snowflake) into a dbt model?
create or replace procedure DATABASE_NAME.SCHEMA_NAME.SP_NAME( filter_variable integer )
returns table()
language sql
as
$$
declare
  res resultset;
begin
  res := (
    with source as ( select * from DATABASE_NAME.SCHEMA_NAME.SOURCE where ID = :filter_variable ),
    t0 as ( select ... < complicated transformation > from source ),
    t1 as ( select ... < complicated transformation > from t0 ),
    t2 as ( select ... < complicated transformation > from t1 ),
    ....
    t17 as ( select ... < complicated transformation > from t16 )
    select * from t17
  );
  return table(res);
end;
$$;
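Power BI's DirectQuery then invokes it along these lines (the 42 is just a placeholder for whatever value the dashboard user picked):

call DATABASE_NAME.SCHEMA_NAME.SP_NAME(42);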
And that gets me most of the way, but I'm unclear about how to hook the variable into the upstream dbt model and then connect up all the other downstream models in the usual dbt {{ ref('') }} way.
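Concretely, the chain I'm after looks something like this (model names made up; I've put a dbt var where the sproc parameter used to be, since I don't know what else could go there):

-- models/source_filtered.sql (made-up name)
-- stand-in: a dbt var in place of the stored procedure parameter
select *
from DATABASE_NAME.SCHEMA_NAME.SOURCE
where ID = {{ var('source_filter') }}

-- models/t0.sql, then t1.sql ... t17.sql, each selecting from the one before
select ... < complicated transformation >
from {{ ref('source_filtered') }}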
Why can't you just create a model with a project-level variable? You can store it inside the YAML file or pass it from the command line. What is the use case?
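e.g. declared in dbt_project.yml like this (the var name and default value are just examples):

# dbt_project.yml
vars:
  source_filter: 42

and then referenced in the model with {{ var('source_filter') }}.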
Not entirely sure what you mean, but the stored procedure parameter is set by the Power BI dashboard user: they pick a value from a drop-down, the value is passed to the DirectQuery data source (Snowflake), and the DirectQuery calls a stored procedure within Snowflake which first filters the source tables and then performs the various transformations in sequence. This is required because the transformations are too expensive (cross product + history + event parsing operations) to run without filtering the source tables first.
If I've understood you correctly, I don't think a project-level variable is going to provide that level of flexibility to the BI user, is it? It would be customised at dbt run time by Data Engineers rather than by Business Analysts in the BI layer.
HTH, let me know if I've misunderstood. I'm sure I could encapsulate the entire pipeline in one macro, but I'd love to split the 17 transformations into separate dbt models and am not sure how, as the first one still needs to reference a stored procedure parameter, which I guess isn't available in a Snowflake materialised view.
So I've managed to get this stored procedure to work by using a macro to create it, and then running that macro as a post-hook on a model.
It's still a monster CTE with a shedload of sequential queries which I'd LOVE to separate out.
I'm still stuck on how to introduce a filter at the start of a CTE and have it flow through a sequence of dbt models to an output, so if anyone knows a mechanism to do this I would be very grateful!
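In case it helps anyone, the macro + post-hook wiring I used looks roughly like this (simplified; the macro and model names are made up):

-- macros/create_sp_name.sql
{% macro create_sp_name() %}
create or replace procedure DATABASE_NAME.SCHEMA_NAME.SP_NAME( filter_variable integer )
returns table()
language sql
as
$$
-- ...the full CTE pipeline from the original sproc...
$$;
{% endmacro %}

-- models/my_model.sql -- the sproc is (re)created every time this model runs
{{ config(post_hook="{{ create_sp_name() }}") }}
select ...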
with source as ( select * from DATABASE_NAME.SCHEMA_NAME.SOURCE where ID = '{{ var("source_filter") }}' )
To pass the variable value to your model, you can execute:
dbt run -m <your model name> --vars '{"source_filter": "your source ID value"}'
I'm not quite sure about the data type of your input parameter ('source_filter'), so you might need to play a bit with single and double quotes, but hopefully it gives you an idea of how to dbt-ify your sproc.
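For example, if ID is a numeric column you'd drop the surrounding quotes, and keep them for a string column (both assume the var is passed on the command line as above):

where ID = {{ var('source_filter') }}    -- numeric ID
where ID = '{{ var("source_filter") }}'  -- string ID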