The problem I’m having
Hi, our team handles a lot of data sources from external vendors, and we run into many data issues, such as an end date that is earlier than the start date. Waiting for the vendors to fix these issues can be slow, so we need to implement a clean-up strategy using the dbt framework. Our goal is to make this as scalable as possible: it should be robust, reusable across multiple models, and it should log what has been changed and by which rule. E.g. if start_date > end_date, set both fields to null. There will be other types of rules to cover as well.
Does anyone have experience implementing something similar with dbt?
What I’ve already tried
My initial idea is to have a rule config file that defines conditions and corrective actions, then implement macros that generate the update logic from those rules, and call the macros wherever they are needed in the models. Alongside that, I would add a centralized log table to record what was changed and by which rule. But I am new to dbt, and I wonder whether there is already an established best practice that I can learn from.
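To make the idea more concrete, here is a rough sketch of the rule-to-SQL templating I have in mind. It is plain Python rather than dbt code, only to illustrate the shape of the config and the generated SQL. Everything in it is a placeholder I made up for illustration: the `Rule` class, the `cleaned_column_sql` and `log_select_sql` helpers, the `record_key` and `rule_name` log columns, and the `vendor_events` table. In dbt I assume the same logic would live in Jinja macros, with the rules kept in a config file or project variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # identifier that would be written to the log table
    condition: str       # SQL predicate that flags a bad row
    null_columns: tuple  # columns to set to NULL when the predicate holds


# Hypothetical rule set; the only rule here is the example from my question.
RULES = [
    Rule(
        name="start_after_end",
        condition="start_date > end_date",
        null_columns=("start_date", "end_date"),
    ),
]


def cleaned_column_sql(column, rules):
    """Wrap a column in a CASE expression covering every rule that nulls it."""
    conditions = [f"({r.condition})" for r in rules if column in r.null_columns]
    if not conditions:
        # No rule touches this column, so select it unchanged.
        return column
    return (
        f"case when {' or '.join(conditions)} "
        f"then null else {column} end as {column}"
    )


def log_select_sql(rule, relation, key):
    """Build a SELECT returning one log row per record the rule would change."""
    return (
        f"select {key} as record_key, '{rule.name}' as rule_name "
        f"from {relation} where {rule.condition}"
    )


if __name__ == "__main__":
    # "vendor_events" and "id" are placeholder names, not real objects.
    for col in ("id", "start_date", "end_date"):
        print(cleaned_column_sql(col, RULES))
    print(log_select_sql(RULES[0], "vendor_events", "id"))
```

The cleaned model would select the generated column expressions, and the log table would be filled from the generated log queries, so each rule is defined once and reused in both places. I am not sure whether this split is the right approach in dbt, which is part of what I am asking.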
Any suggestions are welcome. Thank you!