I am trying to build a service to provide analytics to multiple customers. I am working in Snowflake using a multi-tenancy architecture in which the warehouse is shared. All of the customers have the exact same DB structure with the same Schemas and tables. The problem I’m having is finding out how to apply the exact same set of transformations to every customer DB.
You should be able to do this using a variable that would hold the database name and then supply it on the dbt command line. That would require one job per customer if you were going to create jobs via their UI, but it would use the same code.
However, you should be able to then call the dbt run/build using python in a loop that would set the database variable for each customer. Note - this is something I have been going to look into exactly how to do myself but have not done it yet so I can’t give you more specifics than the idea.
The problem I was/am facing is having a dynamic collection of databases in Snowflake. So I need a query to collect all the available databases I need to operate on that match a certain naming convention, and then run operations on a dynamic collection of schemas in each database.
Yeah, so you would have to be able to run that query and then call the dbt run in a loop from a python script and substitute the database names into a variable on each call. Again, I don’t know python very well, other than to know that I’m pretty sure that this idea would work functionally.
I actually just found an important note about snowflake. When running queries, you can only see things you have access to, makes sense. As it turns this means I may actually have a solution.
I can use this to get all the databases matching a certain pattern, the pattern of which I can define with jinja.
select database_name from snowflake.information_schema_databases where database_name like '<database_name';
Then I can do something similar to get all the schemas available within a certain database (or a specific schema) by modifying the following query to swap out the database and schema.
select * from <database_name>.information_schema.schemata where schema_name like '<schema_name>';