What are best practices for using dbt and Git with multiple internal organizations?

When companies that have many internal organizations use dbt, do they all work in the same repository so that they can keep all of the lineage data? Or is there a way to work in separate repositories but still keep this lineage?

I would use one repository per dbt project, each with its own dbt runs. You can make each repository private and assign groups and roles to it individually. That way you get data governance and can assign a product owner to each repository. I guess this is a kind of data mesh structure. You can also read about the Data Vault concept.

I’d like to hear more about this question. Let’s suppose we have different teams taking care of the following business areas: sales and payroll. To add to the scenario, we might want to share master_data between teams.

Now, let me lay out my thoughts:

  1. create 3 environments named prod_sales, prod_payroll, and prod_master_data. Each team will be able to run and test in its own environment. Additionally, each environment can be linked to a different branch in Git;
  2. on the database side, create 3 different schemas, with read/write permission granted to the owning team only. This will prevent someone on a different team from running or building something in someone else’s schema;
  3. on the Git side, we could have 3 different branches, but I don’t know how to effectively segregate team access without adding complexity in Git.

More on the Git side: having different branches won’t prevent someone from creating objects related to payroll and merging them into the sales branch, but having isolated repos does. As a side effect, isolated repos make it difficult to share master_data between them. I think that creating a rule in Git that only allows objects in a given subfolder on each branch might work. For example, only objects in the subfolder ./sales could be committed to the sales branch. That said, I’m not an expert on Git and I’m not sure whether this is possible.
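As a rough illustration of the subfolder rule described above, here is a minimal Python sketch of a check that a CI job or a server-side hook could run before accepting a merge. It is only a sketch under assumptions: the `ALLOWED_FOLDER` mapping and the `disallowed_paths` function are hypothetical names, and the changed paths are assumed to be relative to the repository root, as printed by `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: target branch -> the only top-level folder it may receive.
ALLOWED_FOLDER = {
    "sales": "sales",
    "payroll": "payroll",
    "master_data": "master_data",
}


def disallowed_paths(branch, changed_paths):
    """Return the changed paths that fall outside the folder allowed for `branch`."""
    allowed = ALLOWED_FOLDER[branch]
    bad = []
    for raw in changed_paths:
        # PurePosixPath drops a leading "./", so "./sales/x.sql" and
        # "sales/x.sql" are treated the same way.
        parts = PurePosixPath(raw).parts
        if not parts or parts[0] != allowed:
            bad.append(raw)
    return bad


if __name__ == "__main__":
    changed = ["sales/models/orders.sql", "payroll/models/salaries.sql"]
    # Reports the payroll file as not allowed on the sales branch.
    print(disallowed_paths("sales", changed))
```

The check itself does not restrict who can push. It would still need to be wired into whatever merge protection the Git hosting platform offers.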

Any thoughts?

Hello,
I think there are two options. With a single repository, all lineage data is kept together and easily accessible, providing a comprehensive view of data transformations and dependencies across the entire organization. Alternatively, teams can work in separate repositories and still maintain lineage data. This can be achieved by declaring other teams’ dbt projects as dependencies (for example, as dbt packages), so that each team’s repository depends on the models of the other teams.
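To make the separate-repositories option a bit more concrete, here is a small Python sketch of how lineage from several projects could be stitched together afterwards. It assumes each project has produced a dbt `manifest.json` artifact, which contains a `parent_map` from each node's unique ID to its parents, and that node IDs are consistent across projects (for example `model.master_data.customers`). The function names `merge_parent_maps` and `upstream` and the sample node IDs are hypothetical.

```python
def merge_parent_maps(manifests):
    """Combine the `parent_map` sections of several already-loaded manifests."""
    merged = {}
    for manifest in manifests:
        for node_id, parents in manifest.get("parent_map", {}).items():
            merged.setdefault(node_id, set()).update(parents)
    # Sort parents so the result is deterministic.
    return {node_id: sorted(parents) for node_id, parents in merged.items()}


def upstream(merged, node_id):
    """Return every ancestor of `node_id` in the merged lineage graph."""
    seen = set()
    stack = [node_id]
    while stack:
        for parent in merged.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Tiny stand-ins for json.load(open("target/manifest.json")) in each repo.
    sales = {"parent_map": {"model.sales.orders": ["model.master_data.customers"]}}
    master = {"parent_map": {"model.master_data.customers": ["source.master_data.crm.customers"]}}
    lineage = merge_parent_maps([sales, master])
    print(sorted(upstream(lineage, "model.sales.orders")))
```

When one project is installed into another as a package, the downstream project's own manifest should already include the upstream nodes, so a merge like this would mainly matter for an organization-wide view.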

Thanks, I am going to look into that.