Using Pre-Hooks to Define Table DDL

The problem I’m having

Hello,

I’m working with dbt Core on Databricks.

I’m trying to test the ability to define the DDL for a table using a pre-hook. I’ve created a macro called "create_my_table.sql" that contains the DDL statement "CREATE TABLE IF NOT EXISTS ( some ddl…)". I made sure to include the metadata columns that dbt needs to maintain the snapshot (dbt_scd_id STRING, dbt_updated_at, dbt_valid_from, dbt_valid_to), and they are the correct data types. I call the macro in the config block of the snapshot with pre_hook = ["{{ create_my_table() }}"]. This appears to work.

The big problem is that immediately after the pre-hook runs, the snapshot executes and, instead of doing a DESCRIBE HISTORY like it would on subsequent runs when the table already exists, it does a CREATE OR REPLACE TABLE, overwriting the table I just created with my pre-hook.

The weird thing is, if I take that same DDL and execute it outside of the pre-hook directly in Databricks (and then remove the pre-hook call to the macro), the snapshot runs as it normally would when the table exists, with a DESCRIBE HISTORY EXTENDED, and does not drop the existing table. Does anyone know why this behavior is occurring or what I am missing?
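For reference, here is roughly what the setup looks like. This is only a sketch: the real DDL is elided above, and the table, column, and source names are placeholders.

```sql
-- macros/create_my_table.sql (sketch; real column list elided)
{% macro create_my_table() %}
    create table if not exists my_catalog.snapshots.my_snapshot (
        id bigint,
        updated_at timestamp,
        some_column string,
        -- metadata columns dbt needs to maintain the snapshot
        dbt_scd_id string,
        dbt_updated_at timestamp,
        dbt_valid_from timestamp,
        dbt_valid_to timestamp
    )
{% endmacro %}

-- snapshots/my_snapshot.sql (sketch)
{% snapshot my_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
        pre_hook=["{{ create_my_table() }}"]
    )
}}

select * from {{ source('my_source', 'my_table') }}

{% endsnapshot %}
```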

Thanks!

Welcome!

What you’re trying to do is an anti-pattern in dbt. Normally we let dbt create all of the tables including snapshots.

Can I ask why you’re trying to do what you’re trying to do? I honestly can’t think of a reason. So maybe, instead of trying to answer the question that you asked, we can try to answer a different one?

Hi and thanks for the reply! I understand that it’s best practice and the intended design to write your select statements and let dbt create the tables, and that is what we are doing in almost all cases. The question is to understand if it’s possible (from my understanding it is) and how to do it properly. This is more of an experiment and an exception to the rule. Here are a few reasons for having the table exist and not letting dbt create it (I’m sure there are more).

  1. The table needs to exist first because the snapshot references itself in a subquery. Probably not the best approach, and this query was written by someone else on the team, but let’s disregard whether it’s right or wrong for the moment; the question is whether it’s possible. :smile:
  2. We want to use the built-in Databricks identity type as a surrogate key, which is not supported by dbt’s table creation (see the sketch after this list). We already use dbt_utils.generate_surrogate_key in many places and it works well, but this is a request for me to investigate.
  3. Any other case where we have some rigid or custom DDL that dbt doesn’t satisfy, or where we want to use an identity column.
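To make reason 2 concrete, the kind of DDL I have in mind looks something like this (table and column names are made up; the point is the identity column, which dbt’s table creation can’t produce for us):

```sql
-- Illustrative only: a Databricks/Delta identity column as the surrogate key
create table if not exists my_catalog.analytics.dim_customer (
    customer_sk bigint generated always as identity,
    customer_id string,
    customer_name string,
    updated_at timestamp
)
```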

I hope this helps to clarify my intentions.

Thanks!

Hi @MegaMane, that does help a lot.

Unfortunately, I’m going to avoid directly answering the question at hand because I’d need to experiment with it. I have encountered strange situations where variables inside the config block evaluate differently than I would expect them to. I think this has to do with the timing of when the config block is evaluated.

When I read your original post, it reminded me of that timing issue I encountered in the past: dbt has already evaluated “snapshot does not exist” at some point before the pre-hook runs. I could be wrong, but as I said, I would have to play with it.

I will offer you another idea instead.

You can set a variable called table_exists. Then, in the SQL, you can do this pseudo-code: if table_exists, do the self-referencing query; else, do a different CTE that doesn’t self-reference; end if.

You can set the table_exists variable by querying the information schema, or do what Claire did here (use adapter.get_relation).
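Very roughly, something like this (untested, and the column and source names are placeholders, but it shows the adapter.get_relation check):

```sql
-- Sketch only: goes inside the snapshot's (or model's) SQL body
{%- set existing_relation = adapter.get_relation(
        database=this.database,
        schema=this.schema,
        identifier=this.identifier
) -%}

with source_data as (
    -- placeholder source; swap in your real source/ref
    select id, updated_at, amount
    from {{ source('my_source', 'my_table') }}
),

{% if existing_relation is not none %}
previous as (
    -- the table already exists, so the self-reference is safe
    select id, amount as previous_amount
    from {{ this }}
)
{% else %}
previous as (
    -- first run: the table doesn't exist yet, so use an empty stand-in
    select id, amount as previous_amount
    from source_data
    where 1 = 0
)
{% endif %}

select
    s.id,
    s.updated_at,
    s.amount,
    p.previous_amount
from source_data s
left join previous p
    on p.id = s.id
```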

Hopefully this helps in some small way and sorry for not answering the original question!