Add LOCATION to the CREATE DATABASE/SCHEMA statement in Databricks to enable creation of managed tables on external storage accounts

The problem I’m having

I want to store my tables as managed tables in an external location such as ADLS or Blob Storage.
By default, Databricks stores managed tables in dbfs:/user/hive/warehouse. It is possible to change this default location by defining a LOCATION when creating the database/schema.

Example SQL statement:
CREATE DATABASE test_db LOCATION 'abfss://testdata@storage-account.dfs.core.windows.net/test_db'
or:
CREATE SCHEMA test_db LOCATION 'abfss://testdata@storage-account.dfs.core.windows.net/test_db'

(DATABASE and SCHEMA are equivalent in Databricks.)

This way, all tables created in this database/schema will be managed tables, and both the data and the metadata will be deleted when a table is dropped. I can't find a way to add the location to the CREATE SCHEMA statement in dbt.
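For illustration (the table name here is made up), any table created in that schema without its own LOCATION is then managed:

CREATE TABLE test_db.events (id INT);
-- data files land under abfss://testdata@storage-account.dfs.core.windows.net/test_db/events
DROP TABLE test_db.events;
-- drops both the metadata and the underlying data files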

The context of why I’m trying to do this

What I’ve already tried

Consulted dbt documentation

Question:
How can I add the location to the CREATE DATABASE/SCHEMA statement in dbt, such that tables within this database/schema become managed tables whose data is stored on an external storage account?

You can add the 'location_root' config parameter described in the dbt documentation.

This will allow you to set the default storage location. You might need to permanently mount the Azure storage container under a /mnt folder first.
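For reference, a minimal sketch of setting it in dbt_project.yml (the project name and mount path are placeholders):

models:
  my_project:
    +location_root: /mnt/testdata/tables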

Hi @mwielen

Did you solve this issue?

I need to do the same as you described, but if I configure 'location_root' it creates external tables. What I need are managed tables, but with a set schema location.

I tried using pre-hooks, but of course they run after the schema has already been created.

I'm afraid I will have to use Terraform to manage the schema creation; it's a complication I didn't want to add to the project.

@lud.bertier Yes, I managed to solve this challenge by overriding the 'create_schema' macro. See the code in this file:

create_schema.sql (1.2 KB)
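The attachment itself isn't reproduced here; as a rough sketch (the actual file may differ), an override along these lines reads the target_schema_locations variable described below and appends a LOCATION clause, assuming the dbt-databricks adapter dispatches an adapter-prefixed create_schema macro defined in the root project:

{% macro databricks__create_schema(relation) -%}
  {#- Sketch only: look up the per-target location from the
      target_schema_locations project variable described below -#}
  {%- set schema_location = var('target_schema_locations', {}).get(target.name) -%}
  {%- call statement('create_schema') -%}
    create schema if not exists {{ relation }}
    {%- if schema_location %}
    location '{{ schema_location }}/{{ relation.schema }}'
    {%- endif %}
  {%- endcall -%}
{% endmacro %}

With no LOCATION of their own, tables created in such a schema are managed, yet their data lives on the configured external storage account.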

Add the following variable to your dbt_project.yml file and set the target location for each target specified in your profiles.yml file. The variable holds a mapping: each key must exactly match a target name in your profiles.yml file, and the value of each key is the container and storage account where schemas for that target should store the tables created within them. This way the tables become managed tables, and you can still configure on which external location their data is stored.

vars:
  target_schema_locations: {
    'dev': 'abfss://container_name@storage_account_name.dfs.core.windows.net',
    'acc': 'abfss://container_name@storage_account_name.dfs.core.windows.net',
    'prod': 'abfss://container_name@storage_account_name.dfs.core.windows.net'
  }

Maybe there are more sophisticated ways to accomplish this, for example by leveraging Unity Catalog capabilities, but we don't use Unity Catalog, hence we implemented this solution.
