Hi. I’d love to understand some of the thinking in the DBT community about this.
The recommended DBT workflow uses a separate schema for each DBT developer (and potentially creates schemas dynamically to run checks on pull requests). Doesn’t this dramatically increase your data storage requirements?
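For concreteness, here’s the kind of per-developer setup I’m picturing — a minimal sketch of a `profiles.yml`, where the profile name, host, and credentials are all placeholders:

```yaml
# profiles.yml — hypothetical per-developer profile; names and credentials are placeholders
my_project:
  target: dev
  outputs:
    dev:
      type: redshift
      host: my-cluster.example.us-east-1.redshift.amazonaws.com
      port: 5439
      user: alice
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_alice   # each developer points at their own schema
      threads: 4
```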
I’m comparing this to, say, Looker PDTs, where only the PDTs that a developer has changed in a branch are rebuilt (and duplicated). Otherwise Looker uses the production PDTs while in development.
My understanding is that in DBT-land people mostly rebuild the entire project in their separate development schemas, regardless of what’s changed?
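In other words, I assume the day-to-day loop looks something like the first command below rather than the second (the model name is hypothetical):

```shell
# rebuild the entire project into the developer's own schema
dbt run

# vs. rebuilding only one changed model and its downstream dependents
dbt run --select my_changed_model+
```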
How do you manage the performance and storage implications of this?
If you have a team of 5 DBT developers, are folks really keeping 5 duplicate data sets in their warehouses?
How is everyone dealing with that?
Is the solution just to use Snowflake, where storage matters less? How about for those of us on Redshift?
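One mitigation I’ve seen mentioned is conditionally limiting the data a model scans in development builds, along these lines (the source and column names here are made up) — is that what most teams rely on?

```sql
-- models/staging/stg_events.sql (hypothetical model)
select *
from {{ source('app', 'events') }}
{% if target.name == 'dev' %}
  -- in dev, only build against the last 3 days of data to save time and storage
  where created_at >= dateadd(day, -3, current_date)
{% endif %}
```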
Perhaps it would help to talk about how this is handled at different data scales?
Thanks for the information and cool tools.