SQL Readability: A Consideration

The best way to encourage readability is through standardization.
When people know the structure of what they’re reading, they can spend less time figuring out what’s going on a more time focused on the content. That’s where SQL Style Guides come in.

We’ve talked about this before when @mhmazur shared his SQL Style Guide in this discourse post: Sharing my SQL Style Guide

@claire recently explained some of the reasoning in Fishtown’s style guide around CTEs: Why the Fishtown SQL style guide uses so many CTEs

Some of the other’s frequently mentioned in dbt slack:

For a project’s readability, it’s less about any particular query and more about consistency across queries.

It’s hard to be definitive across organizations because every organization has a slightly different context. For example, your example above takes up a lot of space on a 13 inch Macbook. From the discourse view, I cannot see the whole thing on a single screen:

Even when I move it to vim, this pretty straight-forward query takes up most of the screen:

A company that standard issues external monitors to team members can make more spacing demands without considering the negative effect to productivity. Your recommendations should be considerate of who is working on the project and what their environment might look like.

My personal reaction (not GitLab’s) is that your outlined suggestion is wasting a lot of valuable screen space. I would rewrite it as follows:

WITH myspace_events AS (

    SELECT *
    FROM myspace_events

), myspace_users AS (

    SELECT min(created_date) as earliest_musician_date
    FROM myspace_users
    WHERE profile = 'Musician'

), final AS (

    SELECT event_id, -- I would have renamed this ID in a base model
        created_date,
        myspace_events.login_id, 
        geo_state,
        first_name,
        last_name,
        SUM(login) AS total_login -- you reference a login.login but don't alias a table called login
    FROM myspace_events
    LEFT OUTER JOIN myspace_users
    ON myspace_users.user_id = myspace_events.user_id -- I would have renamed user_id in the base model
    WHERE login_date >= (CURRENT_DATE - earliest_musician_date)
    GROUP BY 1, 2, 3, 4, 5, 6

)

SELECT * 
FROM final

This reduces the SQL from 38 to 29 lines.
Notable differences include:

  • removing comments, as there are better places to put them; the comments I include here are for reference and I would not include them in an actual query
  • reducing blank lines
  • helping visualize order of operations using parentheses where there are multiple actions in a single line
  • avoiding ambiguous column names, like id by renaming them in base models
  • not aliasing model names when adding 8 characters means you don’t have to
  • using CTEs to call source data

You can see how much less screenspace this occupies in a text editor:

But opinions on SQL style guides is a bit like opinions on everything else… everyone has one.

4 Likes