Saas multi-schema best practices

Hey @daniellang – welcome :wave:

A little late to the party here, but hopefully you’ll let me join.

My preferred approach looks like @warpraptor’s! We use stitch to replicate each database into a separate schema, then, in dbt, union them all together.

Here are some tips:
1. We define our raw data as sources.
A neat trick here is to use yaml anchors to reduce duplication of code:

version: 2

sources:
  - name: shopify_us
    tables: &shopify_tables
      - name: products
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: orders
        columns:
          - name: id
            tests:
              - unique
              - not_null

  - name: shopify_uk
    tables: *shopify_tables

  - name: shopify_au
    tables: *shopify_tables

2. Use jinja in dbt to union the tables together
For our first version, we’ll often just union together the tables manually (using a little Jinja to help us out)

{%- set countries = ['us', 'uk', 'au'] -%}
with products as (
{% for country in countries %}
  select * from {{ source('shopify_', ~ country, 'products') }}
  {% if not loop.last %} union all {% endif %}
{% endfor %)
),
...

Then, inevitably, there will be some table that has the columns in a different order. So we’ll use the union_relations macro from dbt utils here to help us out:

{%- set countries = ['us', 'uk', 'au'] -%}

{%- set shopify_product_relations=[] -%}

{% for country in countries %}
    {% do shopify_product_relations.append(source('shopify_' ~ country, 'products')) %}
{% endfor %}

with products as (
    {{ dbt_utils.union_relations(shopify_product_relations) }}
),
...

As the number of tables we need to union grows, we end up repeating the first 6 lines of this across multiple models. So we might then abstract it into a macro. Here’s an example:

{% macro union_shopify_sources(table_name) %}

    {%- set countries=['us', 'uk', 'au'] -%}

    {%- set shopify_relations=[] -%}

    {% for country in countries %}
        {% do shopify_relations.append(source('shopify_' ~ country, table_name)) %}
    {% endfor %}


    {{ dbt_utils.union_relations(shopify_relations) }}

{% endmacro %}

Then our downstream model looks like:

with products as (
    {{ union_shopify_sources('products') }}
),
...

Even though we’ve reduced the repetition in our code, there’s a tradeoff: we’ve obfuscated this code a little – it’s harder to understand exactly what’s happening without going to another code location.

3. Prefix unique IDs with country codes
My last tip for this kind of work is that I generally create a new unique ID for the unioned data by prefixing the old ID with a country code, e.g. US-187, AU-87546. I find that this is the most user-friendly version of the unique ID, and it allows me to write clean joins downstream (since I also do this for foreign keys). In one project, I even went so far as to macro-fy this logic in my union_shopify_sources, so that any field named id or ending with _id automatically had the country code prefixed.

2 Likes