Best Practices for Deploying BI/Reporting Tools

Does anyone have guidelines/best practices/resources around “deployment” of reports via Mode/Looker/Periscope/etc.?

Mostly curious how teams are validating and testing changes, especially to reports that might be client-facing (white-labeled, for instance). There is a sentiment that we shouldn’t expect analysts and other parties who might be editing the reports to follow a traditional dev lifecycle (which I get to some degree), especially considering the fast iteration of reports that these tools make possible.

It seems like this is a problem across various vendors, and editing a report on the fly has a wide range of potential downstream effects (bad queries, bad visualizations, etc.).

How are teams currently working to address this issue?


This is something we’ve thought a lot about, and unfortunately no vendor that I know of currently supports the solution we want. (“We” here being the GitLab data team.)

We’re heavily biased towards a git-based workflow. My ideal world would have the entire BI tool built around git branches, with a production branch being the main one that people consume. Ideally, any edits would happen on a user-specific (or named feature) branch, and once an editor is happy with their changes they could submit the branch to be merged back into production. This would trigger a review process from our team, where we would check for all the things you’d want to check when reviewing a dashboard.

We’re currently on Periscope, and I believe something like this is on their roadmap. Mode has suggested they’d be looking at this sort of workflow as well. We used Looker previously, and its git integration does allow branching and merge requests, though not for any of the actual dashboards. We still felt this was pretty good: as long as the explores were set up well, we could be confident that folks wouldn’t get bad results.

Our current process requires some education and training: people submit issues to have their dashboards reviewed. (We have an issue template here.) It works well for us, but that’s partially because we already have a strong culture of code review at GitLab.

Hope this helps!


The separation between dev and prod environments was actually one of the main reasons we switched from Mode to Looker. Allowing each analyst to work on their own branch and confirming things don’t break prior to merging has helped us mitigate a lot of issues around broken reports. The current workflow looks something like:

  1. Build and test dbt models locally
  2. Change references to tables in Looker to point to your local schema
  3. Run the Content Validator in Looker to ensure that everything still works. This is great for catching other reports that depend on the table you just changed.
  4. Change references back to the prod schema, and deploy to master

This workflow is still a little clunky, but it’s worked pretty well for us so far. There are ways to programmatically manage the schema reference switching between dev and prod, outlined in another Discourse post here, but that didn’t work that well for us.
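For what it’s worth, one crude way to do that switching programmatically is a small script that rewrites the `sql_table_name` references in your LookML view files. This is just a sketch, not the approach from that post: the schema names and directory layout below are assumptions you’d need to adapt.

```python
import re
from pathlib import Path

# Hypothetical schema names -- adjust to your own dev/prod conventions.
PROD_SCHEMA = "analytics"
DEV_SCHEMA = "dbt_myname"


def point_views_at(schema: str, lookml_dir: str = "views") -> None:
    """Rewrite sql_table_name references in LookML view files to use `schema`."""
    other = DEV_SCHEMA if schema == PROD_SCHEMA else PROD_SCHEMA
    for path in Path(lookml_dir).rglob("*.view.lkml"):
        text = path.read_text()
        # e.g. `sql_table_name: analytics.fct_orders ;;`
        #   -> `sql_table_name: dbt_myname.fct_orders ;;`
        updated = re.sub(
            rf"(sql_table_name:\s*){re.escape(other)}\.",
            rf"\g<1>{schema}.",
            text,
        )
        if updated != text:
            path.write_text(updated)


# Point your working branch at the dev schema while testing (step 2) ...
point_views_at(DEV_SCHEMA)
# ... and back at prod before deploying to master (step 4):
# point_views_at(PROD_SCHEMA)
```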

We also recently added a script that regularly validates all our user-facing content via the Looker API and posts any errors in our #data-qa Slack channel so we can fix them quickly. Getting that in place has helped catch issues that slip through the cracks when deploying.
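For anyone curious what such a script can look like, here is a minimal, hypothetical sketch (not our actual code), assuming the Looker Python SDK’s content validation endpoint and a Slack incoming webhook; the environment variable name is made up.

```python
import os

import looker_sdk
import requests  # used for the Slack incoming webhook

# Assumes Looker API credentials are configured the usual way (looker.ini or
# LOOKERSDK_* environment variables) and that SLACK_WEBHOOK_URL (a made-up
# variable name) points at an incoming webhook for your #data-qa channel.
sdk = looker_sdk.init40()
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def report_broken_content() -> None:
    """Run Looker's content validator and post any errors it finds to Slack."""
    result = sdk.content_validation()
    broken = result.content_with_errors or []
    if not broken:
        return

    lines = []
    for item in broken:
        # Each item pairs a piece of content (a dashboard, look, etc.) with
        # the validation errors Looker found for it.
        content = item.dashboard or item.look
        title = getattr(content, "title", None) or "unknown content"
        for err in item.errors or []:
            lines.append(f"• {title}: {err.message}")

    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": "Looker content validation errors:\n" + "\n".join(lines)},
        timeout=30,
    )


if __name__ == "__main__":
    report_broken_content()
```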

Hope that helps! We are definitely figuring out the best process for this ourselves, so I’m very curious to see how others have approached this problem.


@arvindr – I would love to see that script! You should totally consider open sourcing it :slight_smile:

@JoeNaso – re: your original question:

First of all, I have been in this situation more times than I care to admit. The first thing I recommend is building a culture that helps you debug “broken” or “wrong” dashboards quickly, so you can understand exactly what has changed! We have a guide on reporting bugs as well as an issue template that we share with any BI users who are reporting issues.

^That’s obviously a reactive strategy, and long-term it’s better to be proactive, so here’s my $0.02 AUD on this:

I think there are two types of changes that can impact a dashboard:

  1. The data model that the dashboard selects from has changed
  2. The dashboard structure has changed

These are pretty distinct problems, so I’ll address them separately.

1. The data model that the dashboard selects from has changed
From a process perspective, here are some suggestions:

  • Implement a PR template that prompts dbt users to indicate whether any existing models are changing – you can see our standard template here. This small change can have a big impact.
  • Check the “before and after” of any major dashboards – no matter which BI tool we are using, we find a way to parameterize the schema we select from, so that we can easily do this.
  • Write some audit macros to validate changes! I have some macros that compare columns across two relations, and I often use a version of the equality macro to do record-by-record validation (a rough sketch of this kind of audit query follows this list). Pro-tip: write the audit query before you make your changes; you’ll thank yourself at the end :relieved:. Also check out dbt-helper’s compare command – I’m pretty sure you can use it to detect model name changes, and it could probably be extended to also compare column changes (I’m sure @michael-kaminsky would accept a PR!).
  • If your data model changes affect an end user, they should be involved in the PR process as a reviewer!
  • If your BI tool has a git integration, every dbt PR should have a complementary PR on the repo containing your BI code (or you could even have them both in the same repo! I haven’t done this, but it’s not a terrible idea!)
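To make the audit-macro suggestion above a bit more concrete, here is a minimal sketch of a record-by-record comparison between the old and new versions of a model. The relation names, the connection string, and the use of psycopg2 are all assumptions for illustration; in practice you’d typically wrap the SQL in a dbt macro, and dbt_utils’ equality test works along similar lines.

```python
import os

import psycopg2  # assumes a Postgres/Redshift-style warehouse; swap in your own driver

# Hypothetical relation names: the "before" (prod) and "after" (dev)
# versions of the model you are changing.
OLD_RELATION = "analytics.fct_orders"
NEW_RELATION = "dbt_myname.fct_orders"

# Rows present in one relation but not the other, in either direction.
# Both relations must have the same columns for EXCEPT to apply.
AUDIT_SQL = f"""
    select 'only_in_old' as side, * from (
        select * from {OLD_RELATION}
        except
        select * from {NEW_RELATION}
    ) a
    union all
    select 'only_in_new' as side, * from (
        select * from {NEW_RELATION}
        except
        select * from {OLD_RELATION}
    ) b
"""

# WAREHOUSE_DSN is a made-up environment variable holding your connection string.
with psycopg2.connect(os.environ["WAREHOUSE_DSN"]) as conn:
    with conn.cursor() as cur:
        cur.execute(AUDIT_SQL)
        diffs = cur.fetchmany(20)  # a handful of differing rows is enough to investigate
        print(f"Found {len(diffs)} differing rows (capped at 20):")
        for row in diffs:
            print(row)
```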

2. The dashboard structure has changed
i.e. the fields the dashboard selects, the visualizations it uses, or the filters it applies.

The answer to this one ultimately depends on the BI tool you’re using, and I think the above responses do a really good job at covering this, so I won’t repeat what they’ve said.

One point I’d like to make here is that sometimes dashboards change, or mistakes creep in, because the data model an analyst is querying isn’t intuitive or well documented. Strong data model design and good documentation go a long way on this front, and of course dbt helps a lot with both.

A few extra points if you do use Looker:

  • Looker offers LookML dashboards that are written in code, so they can also be version controlled. I’ve seen some orgs keep “Source of Truth” dashboards in LookML and treat all other dashboards as user-defined dashboards.
  • Alternatively, you can use Looker’s permissions to ensure that only authorized users are allowed to edit a user-defined dashboard.
  • I find that Looker can abstract away the git workflow too much – as an analyst whose first introduction to git was via Looker, I definitely learned bad git habits. I think it’s worth spending the time to go through basic git workflows with your analysts (especially if you want to include them in the PR process, as outlined above), and (as you might expect) we’ve open sourced our git guide here.