What I’m trying to do
I’m trying to maintain a metadata table in BigQuery that records the number of rows inserted each time an incremental model runs.
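For concreteness, the tracking table looks roughly like this (a sketch; the column names match the post-hook further down, backticked because PARTITION is a reserved keyword in BigQuery):

CREATE TABLE IF NOT EXISTS raw_dataset.processed_partitions (
  `table` STRING,             -- model/table the rows were inserted into
  row_count INT64,            -- number of rows inserted by that run
  `partition` DATE,           -- raw-table partition the rows were read from
  query_timestamp TIMESTAMP   -- when the count was recorded
);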
The context of why I’m trying to do this
I’m ingesting data from Pub/Sub into a BigQuery table partitioned on the Pub/Sub publish_time metadata. The data for each message is stored in a JSON blob. I then want to process the messages into another table partitioned by the date field within the messages themselves. To make sure that every message in the raw table has been processed, I want to maintain a count of how many messages have been processed from each partition and compare it to the total number of messages currently in that partition.
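The reconciliation check itself would be something along these lines (a sketch: raw_dataset.raw_messages and the column names are placeholders, and it assumes each incremental run inserts a disjoint batch of messages, so the recorded counts can simply be summed per partition):

SELECT
  partition_date,
  raw.raw_count,
  COALESCE(proc.processed_count, 0) AS processed_count
FROM (
  -- current number of messages in each raw partition
  SELECT DATE(publish_time) AS partition_date, COUNT(*) AS raw_count
  FROM raw_dataset.raw_messages
  GROUP BY 1
) AS raw
LEFT JOIN (
  -- total number of messages recorded as processed from each partition
  SELECT `partition` AS partition_date, SUM(row_count) AS processed_count
  FROM raw_dataset.processed_partitions
  WHERE `table` = 'table_name'
  GROUP BY 1
) AS proc
USING (partition_date)
WHERE raw.raw_count != COALESCE(proc.processed_count, 0)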
What I’ve already tried
This illustrates what I’m trying to do:
post-hook: |
  INSERT INTO raw_dataset.processed_partitions (`table`, row_count, `partition`, query_timestamp)
  VALUES ('table_name', @@row_count, partition, CURRENT_TIMESTAMP())
Unfortunately, the @@row_count from the model run isn’t available in the post-hook, so it always comes back NULL. (Presumably this is because dbt issues the hook as a separate query rather than as part of the same BigQuery script as the model’s DML, and @@row_count only reflects the most recent DML statement within a script.)
In a BigQuery script I could just run the merge statement and then run another INSERT into the processed partitions table using the @@row_count variable.
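For example (a sketch with placeholder table, column, and partition values; the MERGE stands in for whatever DML the incremental model would generate):

-- In a multi-statement script, @@row_count holds the number of rows
-- modified by the most recently completed DML statement.
MERGE processed_dataset.table_name AS t
USING (
  SELECT id, payload, event_date
  FROM raw_dataset.raw_messages
  WHERE DATE(publish_time) = DATE '2024-01-01'  -- placeholder partition
) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
  INSERT (id, payload, event_date)
  VALUES (s.id, s.payload, s.event_date);

INSERT INTO raw_dataset.processed_partitions (`table`, row_count, `partition`, query_timestamp)
VALUES ('table_name', @@row_count, DATE '2024-01-01', CURRENT_TIMESTAMP());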
Is something like this possible to replicate in dbt?