How to create indexes on post-hook for MS SQL Server target DB

If your target DB is MS SQL Server, and you are new to dbt, you will notice that dbt is designed with more modern architectures in mind (e.g. Snowflake, Google’s BigQuery, Amazon’s Redshift, etc) that abstract away the need manage indexes. But the dbt designers have given us so many tools to extend it! And there are some really smart developers in the community working to make dbt work for MS SQL Server.

If you are having to use dbt on a MS SQL Server target DB, then you may also need to manage your indexes. One DRY way of doing that is using dbt’s macros and post-hooks to create indexes at the model level. Here are some steps to make that happen.

Since the SQL flavor of MS SQL Server is T-SQL, then let’s start with the jaffle_shop example adapted for MS SQL Server.

Setup:

  1. Install SQL Server adapter: dbt-mssql or dbt-sqlserver
  2. Clone/download code from jaffle_shop_mssql repo
  3. Edit your profiles.yml to include a profile for jaffle_shop; connect with the correct dbt adapter for MS SQL Server
  4. Create macros subfolder in main jaffle_shop_mssql directory
  5. Add drop_all_indexes_on_table.sql and create_nonclustered_index.sql to macros subfolder; see below

Let’s do this:

  1. Edit the top of jaffle_shop_mssql/models/marts/core/fct_orders.sql by adding this config macro
{{ 
    config({
	  "pre-hook": "{{ drop_all_indexes_on_table() }}",
	  "post-hook": [
		"{{ create_nonclustered_index(columns = ['some_column']) }}",
		"{{ create_nonclustered_index(columns = ['some_column', 'another_column']) }}",
		"{{ create_nonclustered_index(columns = ['this_column', 'that_coloumn'], includes = ['my_column', 'your_column']) }}"
	  ]
    }) 
}}
  1. Go to command line
  2. Run dbt debug ← make sure you are connected to your SQL Server
  3. Run dbt seed ← the source files for the jaffle_shop_mssql example
  4. Run dbt run ← experience the glory of creating indexes!
  5. Verify in MS SQL Server Management Studio (SSMS) that indexes created as expected.

Big Thank Yous

The dbt community is awesome! And I got help with this code on the slack #support channel here.

This code will be added as macros to dbt-mssql adapter in the near future.

drop_all_indexes_on_table.sql

{% macro drop_all_indexes_on_table() %}
{# Altered from https://stackoverflow.com/q/1344401/10415173 #}

declare @drop_xml_indexes_first nvarchar(max);
select @drop_xml_indexes_first = ( 
		select 'DROP INDEX IF EXISTS [' + sys.indexes.[name] + '] ON ' + '[' + SCHEMA_NAME(sys.tables.[schema_id]) + '].[' + OBJECT_NAME(sys.tables.[object_id]) + ']; '
		from sys.indexes 
		inner join sys.tables on sys.indexes.object_id = sys.tables.object_id
		where sys.indexes.[name] is not null 
		  and sys.indexes.type_desc = 'XML'
		  and sys.tables.[name] = '{{ this.table }}'
		for xml path('')
	);
exec sp_executesql @drop_xml_indexes_first;

declare @drop_spatial_indexes_second nvarchar(max);
select @drop_spatial_indexes_second = ( 
		select 'DROP INDEX IF EXISTS [' + sys.indexes.[name] + '] ON ' + '[' + SCHEMA_NAME(sys.tables.[schema_id]) + '].[' + OBJECT_NAME(sys.tables.[object_id]) + ']; '
		from sys.indexes 
		inner join sys.tables on sys.indexes.object_id = sys.tables.object_id
		where sys.indexes.[name] is not null 
		  and sys.indexes.type_desc = 'Spatial'
		  and sys.tables.[name] = '{{ this.table }}'
		for xml path('')
	);
exec sp_executesql @drop_spatial_indexes_second;

declare @drop_fk_constraints_next nvarchar(max);
select @drop_fk_constraints_next = ( 
		select 'ALTER TABLE [' + SCHEMA_NAME(sys.foreign_keys.[schema_id]) + '].[' + OBJECT_NAME(sys.foreign_keys.[parent_object_id]) + '] DROP CONSTRAINT IF EXISTS [' + sys.foreign_keys.[name]+ '];'
		from sys.foreign_keys 
		inner join sys.tables on sys.foreign_keys.[referenced_object_id] = sys.tables.[object_id]
		where sys.tables.[name] = '{{ this.table }}'
		for xml path('') 
	); 
exec sp_executesql @drop_fk_constraints_next;

declare @drop_pk_constraints_next nvarchar(max);
select @drop_pk_constraints_next = ( 
		select 'ALTER TABLE [' + SCHEMA_NAME(sys.tables.[schema_id]) + '].[' + sys.tables.[name] + '] DROP CONSTRAINT IF EXISTS [' + sys.indexes.[name]+ '];'
		from sys.indexes 
		inner join sys.tables on sys.indexes.[object_id] = sys.tables.[object_id]
		where sys.indexes.is_primary_key = 1
		  and sys.tables.[name] = '{{ this.table }}'
		for xml path('') 
	);
exec sp_executesql @drop_pk_constraints_next;

declare @drop_remaining_indexes_last nvarchar(max);
select @drop_remaining_indexes_last = 
	(
		select 'DROP INDEX IF EXISTS [' + sys.indexes.[name] + '] ON ' + '[' + SCHEMA_NAME(sys.tables.[schema_id]) + '].[' + OBJECT_NAME(sys.tables.[object_id]) + ']; '
		from sys.indexes 
		inner join sys.tables on sys.indexes.object_id = sys.tables.object_id
		where sys.indexes.[name] is not null 
		  and sys.tables.[name] = '{{ this.table }}'
		for xml path('')
	);
exec sp_executesql @drop_remaining_indexes_last;

{% endmacro %}

create_nonclustered_index.sql

{% macro create_nonclustered_index(columns, includes) %}

{% if include_names is undefined %}

create nonclustered index 
  {{ this.table }}__index_on_{{ columns|join("_") }}
  on {{ this }} ({{ '[' + columns|join("], [") + ']' }})
 
{% else %}

create nonclustered index 
  {{ this.table }}__index_on_{{ columns|join("_") }}
  on {{ this }} ({{ '[' + columns|join("], [") + ']' }})
  include ({{ '[' + includes|join("], [") + ']' }})

{% endif %}

{% endmacro %}
2 Likes

Thank you so much for sharing this, bryan.

For anyone else who stumbles upon this from google: These macros have been integrated into the dbt-sqlserver adapter and you don’t need to create them manually.

I have also posted a version of these macros that checks if the index exists before creating a new one here: https://github.com/mikaelene/dbt-sqlserver/issues/25

This allows you to use indexes on incremental refresh tables without recreating them on each refresh.

3 Likes