Our group has more than 400 source schemas that feed various downstream data sets in Snowflake. We’re in the process of migrating to dbt and are trying to think through how to structure the project at this scale. One question that came up is how to maintain schema-specific definitions in the project config file.
Do I have to make an entry for every schema that we have? Is there a dynamic way of doing this or a way to not bloat the config file?
Curious to see how others have solved this problem.
At our company, we have defined what we call a client_schema table. We also deal with thousands of schemas (the number is growing exponentially). Every time a new client comes in, we store their data in a specific schema and, through the backend, create a record in that client_schema table containing the client name, a unique client identifier (client_id), and the relevant schema for the client. If this is a solution you are interested in, we can discuss how we do it using Snowflake external functions, AWS Lambda and AWS API.
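To make the idea concrete, here is a minimal sketch of how rows from a table like client_schema could be turned into a dbt sources file by a script, so nobody has to hand-write an entry per schema. It is only an illustration under stated assumptions, not the setup described above: the column name schema_name, the sample rows, the client_ prefix on source names, and the table list are all hypothetical, and in practice the rows would come from a query against Snowflake rather than a hard-coded list.

```python
import re
from typing import Iterable, Mapping

# Hypothetical rows mirroring the three columns described above
# (client name, client_id, schema). Real rows would be queried from Snowflake.
CLIENT_SCHEMA_ROWS = [
    {"client_name": "globex", "client_id": "c002", "schema_name": "GLOBEX_RAW"},
    {"client_name": "acme", "client_id": "c001", "schema_name": "ACME_RAW"},
]

# Only plain identifiers are accepted so the values can be written unquoted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(value: str) -> str:
    """Return value unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.match(value):
        raise ValueError(f"not a plain identifier: {value!r}")
    return value


def render_sources_yaml(rows: Iterable[Mapping[str, str]], tables: Iterable[str]) -> str:
    """Render one dbt source per client schema, each exposing the same tables."""
    table_names = [_check(t) for t in tables]
    lines = ["version: 2", "", "sources:"]
    # Sort by client_id so the generated file is stable between runs.
    for row in sorted(rows, key=lambda r: r["client_id"]):
        lines.append(f"  - name: client_{_check(row['client_id'])}")
        lines.append(f"    schema: {_check(row['schema_name'])}")
        lines.append("    tables:")
        for table in table_names:
            lines.append(f"      - name: {table}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the YAML; a real script might write it to a file under models/.
    print(render_sources_yaml(CLIENT_SCHEMA_ROWS, ["orders", "customers"]))
```

Generating the file this way keeps the list of schemas in one table and treats the YAML as a build artifact. Whether that fits depends on how often new schemas appear and whether every schema really shares the same set of tables, which this sketch simply assumes.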
Yes! I’d love to hear how you’ve set this up. I haven’t used Snowflake external functions yet, so this could be a great use case.