Using Pre-Hooks to Define Table DDL

The problem I’m having

Hello,

I’m working with dbt Core on Databricks.

I’m trying to test the ability to define the DDL for a table using a pre-hook. I’ve created a macro called "create_my_table.sql" that contains the DDL statement "CREATE TABLE IF NOT EXISTS ( some ddl…)". I made sure to include the metadata columns that dbt needs to maintain the snapshot (dbt_scd_id STRING, dbt_updated_at, dbt_valid_from, dbt_valid_to), and they are the correct data types. I call the macro in the config block of the snapshot with pre_hook = ["{{ create_my_table() }}"]. This appears to work.

The big problem is that immediately after the pre-hook runs, the snapshot executes and, instead of doing a DESCRIBE HISTORY like it would on subsequent runs when the table already exists, it does a CREATE OR REPLACE TABLE, overwriting the table I just created with my pre-hook.

The weird thing is, if I take that same DDL and execute it outside of the pre-hook directly in Databricks (and then remove the pre-hook call to the macro), the snapshot runs as it normally would when the table exists, with a DESCRIBE HISTORY EXTENDED, and does not drop the existing table. Does anyone know why this behavior is occurring or what I am missing?
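For reference, here is roughly what the setup looks like. This is only a sketch: the real DDL is elided above, and the table, column, and source names are placeholders.

```sql
-- macros/create_my_table.sql (sketch; real column list elided)
{% macro create_my_table() %}
    create table if not exists my_catalog.snapshots.my_snapshot (
        id bigint,
        updated_at timestamp,
        some_column string,
        -- metadata columns dbt needs to maintain the snapshot
        dbt_scd_id string,
        dbt_updated_at timestamp,
        dbt_valid_from timestamp,
        dbt_valid_to timestamp
    )
{% endmacro %}

-- snapshots/my_snapshot.sql (sketch)
{% snapshot my_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
        pre_hook=["{{ create_my_table() }}"]
    )
}}

select * from {{ source('my_source', 'my_table') }}

{% endsnapshot %}
```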

Thanks!

Welcome!

What you’re trying to do is an anti-pattern in dbt. Normally we let dbt create all of the tables including snapshots.

Can I ask why you’re trying to do what you’re trying to do? I honestly can’t think of a reason. So maybe, instead of trying to answer the question that you asked, we can try to answer a different one?

Hi and thanks for the reply! I understand that it’s best practice and the intended design to write your select statements and let dbt create the tables, and that is what we are doing in almost all cases. The question is to understand if it’s possible (from my understanding it is) and how to do it properly. This is more of an experiment and an exception to the rule. Here are a few reasons for having the table exist and not letting dbt create it (I’m sure there are more).

  1. The table needs to exist first because the snapshot references itself in a subquery. Probably not the best approach, and this query was written by someone else on the team, but let’s disregard whether it’s right or wrong for the moment; the question is whether it’s possible. :smile:
  2. We want to use the built-in Databricks identity type as a surrogate key, which is not supported by dbt’s table creation (see the sketch after this list). We already use dbt_utils.generate_surrogate_key in many places and it works well, but this is a request for me to investigate.
  3. Any other case where we have some rigid or custom DDL that dbt doesn’t satisfy, or where we want to use an identity column.
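To make reason 2 concrete, the kind of DDL I have in mind looks something like this (table and column names are made up; the point is the identity column, which dbt’s table creation can’t produce for us):

```sql
-- Illustrative only: a Databricks/Delta identity column as the surrogate key
create table if not exists my_catalog.analytics.dim_customer (
    customer_sk bigint generated always as identity,
    customer_id string,
    customer_name string,
    updated_at timestamp
)
```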

I hope this helps to clarify my intentions.

Thanks!

Hi @MegaMane, that does help a lot.

Unfortunately, I’m going to avoid directly answering the question at hand because I’d need to experiment with it. I have encountered strange situations where variables inside the config block evaluate differently than I would expect them to. I think this has to do with the timing of when the config block is evaluated.

When I read your original post, it reminded me of that timing issue I encountered in the past: dbt has already evaluated “snapshot does not exist” at some point before the pre-hook runs. I could be wrong, but as I said, I would have to play with it.

I will offer you another idea instead.

You can set a variable called table_exists. Then, in the SQL, you can do this pseudo-code: if table_exists, do the self-referencing query; else, do a different CTE that doesn’t self-reference; end if.

You can set the table_exists variable by querying the information schema, or do what Claire did here (use adapter.get_relation).
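Very roughly, something like this (untested, and the column and source names are placeholders, but it shows the adapter.get_relation check):

```sql
-- Sketch only: goes inside the snapshot's (or model's) SQL body
{%- set existing_relation = adapter.get_relation(
        database=this.database,
        schema=this.schema,
        identifier=this.identifier
) -%}

with source_data as (
    -- placeholder source; swap in your real source/ref
    select id, updated_at, amount
    from {{ source('my_source', 'my_table') }}
),

{% if existing_relation is not none %}
previous as (
    -- the table already exists, so the self-reference is safe
    select id, amount as previous_amount
    from {{ this }}
)
{% else %}
previous as (
    -- first run: the table doesn't exist yet, so use an empty stand-in
    select id, amount as previous_amount
    from source_data
    where 1 = 0
)
{% endif %}

select
    s.id,
    s.updated_at,
    s.amount,
    p.previous_amount
from source_data s
left join previous p
    on p.id = s.id
```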

Hopefully this helps in some small way and sorry for not answering the original question!