Using this as a spot to jot down some articles that I think would be useful for community members as I see discussions come up in Slack. Feel free to add some more ideas in the comments.
If you feel like you have good things to say on one of these topics, I encourage you to write the article! To help get you started, check out our tips here, and feel free to DM me on Slack if you want to work on it together.
How to analyze your dbt project performance
We see this question a lot, and the current solution (the event-logging package) is not something we feel good about.
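If someone picks this up, one angle worth covering: dbt already writes per-model timing into target/run_results.json after every run, so you can mine that artifact instead of the event-logging package. A rough sketch (field names follow recent dbt artifact schemas and may differ on older versions; the sample data here is made up):

```python
import json

# Sample shaped like dbt's target/run_results.json (illustrative data only).
SAMPLE = """
{
  "results": [
    {"unique_id": "model.jaffle_shop.stg_orders", "execution_time": 1.42, "status": "success"},
    {"unique_id": "model.jaffle_shop.orders", "execution_time": 12.77, "status": "success"},
    {"unique_id": "model.jaffle_shop.stg_customers", "execution_time": 0.91, "status": "success"}
  ]
}
"""

def slowest_models(run_results: dict, top_n: int = 5):
    """Return (unique_id, execution_time) pairs, slowest first."""
    timings = [
        (r["unique_id"], r["execution_time"])
        for r in run_results["results"]
        if r.get("execution_time") is not None
    ]
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top_n]

if __name__ == "__main__":
    # In a real project: json.load(open("target/run_results.json"))
    for model, seconds in slowest_models(json.loads(SAMPLE)):
        print(f"{seconds:8.2f}s  {model}")
```

An article could build on this: load the artifact after each scheduled run, append the timings to a warehouse table, and trend model runtimes over time.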
Running one dbt project N times (e.g. once for each customer)
Some companies have source data with the same structure for different customers, and rather than unioning the data together, they want to run the dbt project separately for each customer (e.g. because they present this data back to each customer, and don’t want other customers’ data to be surfaced).
There’s not a straightforward way to do this, since dbt assumes one model == one output table, but I suspect some companies have come up with reasonably good solutions!
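One pattern I’d expect an article to explore: loop over customers and invoke `dbt run` once per customer, passing the customer id as a var, and key the output schema off that var (e.g. with a custom generate_schema_name macro) so each customer’s tables stay isolated. A hedged sketch of the looping half (the customer list and var name are made up for illustration):

```python
import json
import subprocess

# Illustrative customer list -- in practice this might come from a
# customers table or a config file.
CUSTOMERS = ["acme", "globex", "initech"]

def build_command(customer: str) -> list:
    """One `dbt run` invocation, with the customer id passed as a var."""
    return ["dbt", "run", "--vars", json.dumps({"customer": customer})]

def run_per_customer(customers, dry_run=True):
    """Build (and optionally execute) one dbt invocation per customer."""
    commands = [build_command(c) for c in customers]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # fail fast if one customer errors
    return commands

if __name__ == "__main__":
    for cmd in run_per_customer(CUSTOMERS):
        print(" ".join(cmd))
```

The interesting editorial questions are in the other half: how to reference the var inside models and schema-name macros, and how to handle one customer’s run failing mid-loop.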
Handling PII in your transformation layer
We need someone to write the bible on this! What strategies have companies used to handle this? Are there good arguments for obfuscating at the extract-load layer versus doing it in the transformation layer?
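One strategy such an article would likely cover: pseudonymize PII at the earliest layer by replacing raw values with a keyed hash, so downstream models can still join and count on the column without ever seeing the raw value. A minimal sketch in Python (in a dbt project this logic would more likely live in SQL via the warehouse’s hashing functions, and the key here is a placeholder that would really come from a secrets manager):

```python
import hashlib
import hmac

# Placeholder secret -- in practice, load this from a secrets manager,
# never from source control.
SECRET_KEY = b"replace-me-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Deterministic keyed hash: the same input always maps to the same
    token (so joins still work), but a plain unsalted hash is avoided
    because low-entropy fields like emails are easy to brute-force."""
    return hmac.new(SECRET_KEY, value.lower().encode(), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    a = pseudonymize("Jane.Doe@example.com")
    b = pseudonymize("jane.doe@example.com")
    print(a == b)  # case-normalized inputs map to the same token
```

The trade-offs (keyed hashing vs. tokenization vs. dropping the columns entirely, and who holds the key) are exactly where strong opinions would make the article valuable.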
Migrating an existing data pipeline to dbt
Assignee: some of my colleagues
How do you break up a migration project? Should you refactor as you go? Are there any good tips to making this process easier?
Integrating python processes with dbt
Assignee: me (but also happy to hand this to someone else if you have good ideas here!)
dbt doesn’t run python. But every so often, there’s a transformation that is better suited for python than SQL. How can you handle this in the most dbtonic way?