@jaypeedevlin asked and answered a great question on the dbt Slack:
Has anyone hacked together anything to prevent seeds beyond a certain size? We’d like to put some sort of barrier in place to stop folks using seeds for ingestion instead of just for mapping-type files.
As he notes, the dbt docs strongly advise against using seeds for data loading, because it’s not performant and every change to the data has to be committed to your repository.
It turns out that because you can override the behaviour of basically any macro in dbt, you can also block this from happening inside your project. load_csv_rows() is dispatched to an adapter-specific implementation, so for Snowflake we override snowflake__load_csv_rows() and add the check below at the top of the macro, keeping the rest of the adapter's implementation underneath:
-- macros/snowflake__load_csv_rows.sql
{% macro snowflake__load_csv_rows(model, agate_table) %}

    {# Check that the seed doesn't have too many values (rows * columns) #}
    {% set MAX_SIZE = 60000 %}
    {% set seed_size = (agate_table.rows | length) * (agate_table.columns | length) %}
    {% if seed_size > MAX_SIZE %}
        {{ exceptions.raise_compiler_error("{} values in seed exceeds maximum of {}.".format(seed_size, MAX_SIZE)) }}
    {% endif %}

    {# then the rest of the adapter's implementation of load_csv_rows here... #}

{% endmacro %}
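If you want the same guard on BigQuery (or any other adapter with its own implementation), the check itself doesn’t change, so one option is to pull it out into a small helper macro and call it from each adapter-specific override. The sketch below assumes a helper name (assert_seed_within_limit) and file paths that are purely illustrative, and you’d still paste the rest of the adapter's implementation beneath the call:

-- macros/assert_seed_within_limit.sql
{% macro assert_seed_within_limit(agate_table, max_size=60000) %}
    {# Raise a compiler error if the seed has more than max_size values (rows * columns) #}
    {% set seed_size = (agate_table.rows | length) * (agate_table.columns | length) %}
    {% if seed_size > max_size %}
        {{ exceptions.raise_compiler_error("{} values in seed exceeds maximum of {}.".format(seed_size, max_size)) }}
    {% endif %}
{% endmacro %}

-- macros/bigquery__load_csv_rows.sql
{% macro bigquery__load_csv_rows(model, agate_table) %}
    {# run the check without adding anything to the macro's output #}
    {% do assert_seed_within_limit(agate_table) %}
    {# then the rest of the BigQuery adapter's implementation of load_csv_rows here... #}
{% endmacro %}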
Here are the implementations of load_csv_rows for Snowflake and BigQuery. Other adapters’ implementations live in similar places in their respective repos, or they use the default implementation, as Redshift and Postgres do.
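For adapters that fall back to the default implementation, like Redshift and Postgres, the same pattern should work against default__load_csv_rows. This is again just a sketch, reusing the illustrative helper from above:

-- macros/default__load_csv_rows.sql
{% macro default__load_csv_rows(model, agate_table) %}
    {% do assert_seed_within_limit(agate_table) %}
    {# then the rest of dbt's default implementation of load_csv_rows here... #}
{% endmacro %}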