Strategy for dbt modeling when using Looker

ki_peek · May 24, 2021, 11:52pm

My company started using Looker a couple years ago, and currently are using LookML PDTs for transformations. As our project has grown more complex, and we have had interest from finance team to directly query our data warehouse, it has become clear that we should have a dedicated modeling layer outside of Looker, and DBT seems to be great option. It seems like DBT and Looker are used together frequently, but have not found many detailed examples of strategies to best set up dbt models for use with looker. I found the post on how DBT structures their projects here, but looking for some more insight on how would work specifically with Looker.

For concrete example, lets say my company name is “flatcloud” and we have a “transactions” PDT in Looker that is based on an internal transactions table (extracted and loaded into bigquery through stitch), and a “saleforce_account” view (not a PDT) based on a table in bigquery. The “transactions” PDT does some transformation of the foreign keys, including adding a salesforce_account_id. Then in Looker we have a “flatcloud_transactions” explore that joins in the salesforce_account view. In this situation, when modeling in DBT, should it look somewhat like below?

├── dbt_project.yml
└── models
    ├── marts
    |    ├── salesforce_accounts.sql
    |    └── flatcloud_transactions.sql
    |    └── schema.yml
    └── staging
        └── salesforce
        |    └─ base
        |    |    ├── schema.yml
        |    |    └── base_salesforce_accounts.sql
        |    └── source.yml
        └─ flatcloud
             └─ base
             |    ├── schema.yml
             |    └── base_flatcloud_transactions.sql
             ├── source.yml
             ├── schema.yml
             └── stg_flatcloud_transactions.sql

Questions:

so each view in LookML view should be based on model in marts directory?
I see many people designating models as fact or dimension tables, i have high level understanding of fact/dimension modeling, but not currently designated as such in our LookML modeling, is this highly recommended and how would that work with LookML in this example?
because we are not doing any transformations on salesforce account (just field renaming), is it fine to not have stg_salesforce_account model?
Do we really need stg_flatcloud_transactions model or can we just do the transformation of foreign keys in the flatcloud_transactions model?

Any insight would be very helpful!

andrea.fabry · June 3, 2021, 7:44pm

I have not found a lot of great resources on the dbt to Looker handoff, but here are some of the resources I’ve found and some of the things I learned.

If your Looker project is set up in a way that your end users find intuitive, easy-to-use, and your explores are performant, then the Looker to dbt switch is fairly easy. The SQL part of your PDTs can go into the models/marts folder in your dbt project. Anything that was a view in LookML that fed your PDTs need to be come a source in dbt. That is a two-step process:

you add them to your source.yml
then you stage them individually as stg_{source_name}__{table_name}

For these steps, you can use the codegen package; dbt (data build tool) - Explore Packages

Then you can keep your explores and dashboards and looks intact, replace the PDTs’

derived_table: {
    sql: 
}

argument with just sql_table_name: + the dbt table you built, and you are good to go.

Returning to your concrete example above, if you have an explore joining flatcloud_transactions (a PDT) to salesforce_accounts (a view), you would have two folders in staging, one for flatcloud and one for salesforce. You would create a source.yml in each (src_flatcoud.yml and src_salesforce.yml). You would put every flatcloud table that you are using in your transactions PDT into a separate stg_flatcloud__{tablename} table, and salesforce accounts would go into the stg_salesforce__accounts. You would then make the PDT and the accounts in marts, as you did above. Typically, people preface these with fct_flatcloud__transactions and dim_salesforce__accounts following the star schema but you don’t have to. You also don’t have to use “staging” and “marts” (in my company, we use bronze, silver and gold for different phases of transformations).

If your LookML, however, is not set up well (my use case), then you have a lot of refactoring ahead. The two most useful resources I’ve found on how to design the boundary between dbt and Looker are these two blog posts by Tristan Handy (links below).

I believe that it is a good idea to follow some sort of design pattern if you are refactoring your data model. However, many of the design patterns (e.g. Kimball dimensional modeling or Inmon) are outdated and need to be updated for modern warehouses and use cases. 9A great example for this is the dim_dates dimension table from Kimball - once upon a time, that may have been necessary, but now you can just generate daily, weekly, and monthly aggregations in Looker from your fact table.) I still follow Kimball and the star schema and make fact and dimension tables, but with adjustments for Looker. Happy to share notes on this!

andrea.fabry · June 3, 2021, 7:44pm

Here are the blogposts:

Topic		Replies	Views
Looker seems to be preferred BI tool? Archive	2	3139	July 16, 2020
Faster data modeling with Looker and dbt (Webinar Q&A with Drizly) Archive	0	3404	October 5, 2020
dbt tries to materilize downstream models incorrectly as specifies in dbt_project.yml Help dbt-core	0	787	December 4, 2023
DBT schemas and storage costs Archive	3	3917	January 23, 2020
Looker user attributes and `if dev` to auto-switch between dev & prod schemas Show and Tell environments , looker	11	9611	November 8, 2020

Strategy for dbt modeling when using Looker

Related topics