I want to store my tables as managed tables on an external location such as ADLS or Blob Storage.
By default, Databricks stores managed tables in dbfs:/user/hive/warehouse. It is possible to change this default by creating the database/schema with an explicit location.
That way, all tables created in this database/schema are managed tables, and both the data and the metadata are deleted when a table is dropped. I can’t find a way to add the location to the CREATE SCHEMA statement in dbt.
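Outside of dbt this is straightforward in plain Spark SQL (the storage account, container, and schema names below are illustrative):

```sql
-- A schema with an explicit LOCATION: tables created in it are still managed,
-- but their data lands under this path instead of dbfs:/user/hive/warehouse.
CREATE SCHEMA IF NOT EXISTS my_schema
LOCATION 'abfss://my-container@mystorageaccount.dfs.core.windows.net/my_schema';
```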
The context of why I’m trying to do this
What I’ve already tried
Consulted the dbt documentation.
Question:
How can I add the location to the CREATE DATABASE / SCHEMA statement in dbt, such that tables within this database/schema become managed tables and their data is stored on an external storage account?
I need the same as you described: managed tables, but with a set schema location. If I configure “location_root”, dbt creates external tables instead (see the snippet after this comment).
I tried using pre-hooks, but of course they run after the schema has already been created.
I’m afraid I will have to use Terraform to manage the schema creation; that’s a complication I didn’t want to add to the project.
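For reference, this is the location_root config mentioned above. In dbt-databricks it sets an explicit storage path per table, which is why the resulting tables are external rather than managed (the project name and path are illustrative):

```yaml
# dbt_project.yml -- this pins each table's own LOCATION,
# so dbt creates EXTERNAL tables, not managed ones
models:
  my_project:
    +location_root: "abfss://my-container@mystorageaccount.dfs.core.windows.net/tables"
```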
Add the following variable to your dbt_project.yml file and set the target location for each target specified in your profiles.yml file. The variable should contain a valid JSON structure: each key is a target from profiles.yml (the name must match exactly), and each value is the container and storage account where that schema should store the tables created within it. This way the tables become managed tables, and you can still configure the external location where their data is stored.
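The original snippet isn’t included above, but based on that description it could look like the following. This is a sketch assuming dbt-databricks without Unity Catalog; the variable name schema_locations, the target names, and the paths are all illustrative:

```yaml
# dbt_project.yml -- keys must exactly match the target names in profiles.yml;
# each value is the container + storage account for that target's managed data
vars:
  schema_locations: '{"dev": "abfss://dev-container@mystorage.dfs.core.windows.net", "prod": "abfss://prod-container@mystorage.dfs.core.windows.net"}'
```

The variable only takes effect if something injects it into the CREATE SCHEMA statement. One way to do that is to override the adapter’s create_schema macro in your project’s macros folder:

```sql
-- macros/create_schema.sql (sketch): add a LOCATION clause so the schema,
-- and therefore the managed tables inside it, live on the external storage account
{% macro databricks__create_schema(relation) -%}
  {%- set location_root = fromjson(var('schema_locations'))[target.name] -%}
  {%- call statement('create_schema') -%}
    create schema if not exists {{ relation.without_identifier() }}
    location '{{ location_root }}/{{ relation.schema }}'
  {%- endcall -%}
{%- endmacro %}
```

With this in place, running against e.g. the prod target creates the schema under the prod path, and tables materialized in it stay managed: dropping one removes both the metadata and the data.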
Maybe there are more sophisticated ways to accomplish this, for example by leveraging Unity Catalog capabilities, but we don’t use Unity Catalog, hence we implemented this solution.