The problem I’m having
The dbt tests are failing on a partitioned table. When I check the table through BQ console it returns 0 but the results of the dbt tests results say that the test fail, unable to meet the criteria !=0
10 of 12 ERROR not_null_with_partition_raw_table__meta_use_case___meta_inserted_at
Got 1 result, configured to fail if != 0
The context of why I’m trying to do this
To check whether the fields have any nulls
What I’ve already tried
I created a test script not_null_with_partition under the macros folder and referenced it in the schema.yml
Some example code or error messages
not_null_with_partition.sql (under macros) -
{% set airflow_info = fromjson(var('airflow_info', '{}')) %}
{% test not_null_with_partition(model, column_name, partition_column) %}
with invalid_records as (
select *
from {{ model }}
where {{ column_name }} is null
and {{ partition_column }} >= get_max_meta_inserted_at( {{ model }}, airflow_info)
)
select count(*)
from invalid_records
{% endtest %}
schema.yml (under models) -
version: 2
models:
- name: raw_table
description: something
config:
contract:
enforced: true
columns:
- name: id
data_type: STRING
description: The unique identifier for each record - generate_uuid()
- name: _meta_use_case
data_type: STRING
description: The use case of the data
data_tests:
- not_null_with_partition:
partition_column: _meta_inserted_at
- name: _meta_source_file
data_type: STRING
description: The source file of the data
data_tests:
- not_null_with_partition:
partition_column: _meta_inserted_at
- name: _meta_run_id
data_type: STRING
description: Airflow job run ID
data_tests:
- not_null_with_partition:
partition_column: _meta_inserted_at
- name: _meta_inserted_at
data_type: TIMESTAMP
description: CURRENT_TIMESTAMP() - when the record was inserted into the table
data_tests:
- not_null_with_partition:
partition_column: _meta_inserted_at
max_inserted_at (under macros) -
{% macro get_max_meta_inserted_at(table_ref, airflow_info) %}
{% set query %}
select max(_meta_inserted_at)
from {{ table_ref }}
where _meta_inserted_at >= safe_cast('{{ airflow_info["ingested_at"] }}' as timestamp)
{% endset %}
{% set max_meta_inserted_at = dbt_utils.get_single_value(query) %}
{{ return("'" ~ max_meta_inserted_at ~ "'") }}
{% endmacro %}
Checking _meta_use_case for nulls in BQ console -
with invalid_records as (
select *
from `project.dataset.raw_table`
where _meta_use_case is null
and _meta_inserted_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) -- to be replaced with output from get_max_meta_inserted_at
)
select count(*)
from invalid_records