Sharing my SQL Style Guide

Hey all - I put together a style guide outlining the preferences I’ve adopted over the years for how to format SQL queries. I shared it in dbt Slack and Claire recommended I share here as well.

You can check it out on GitHub here: Mazur’s SQL Style Guide.

Feedback welcome on any of it, especially parts where you’ve adopted different conventions. Cheers!

4 Likes

Very nice style guide. Since you asked for feedback, I’ll share mine.

I prefer “<>” over “!=” for not equals since the former is ANSI SQL-92 and latter is not. It is also very easy to type.

I “group by numbers” all the time. I find it very convenient and I have not heard a good reason (yet) as to why I should not do this. :slight_smile: I always follow “Grouping columns should go first” which I think makes this approach ok.

What is the reasoning for " Commas should be at the the end of lines"? I think there are pros/cons for either approach on this one; I used to put commas at the beginning, but lately have been putting them at the end.

For anything noted as “bad”, it may be good to explain why. Some were obvious to me, some were not.

Thanks for sharing! Nice work.

Hey @davemasino, thanks so much for this thoughtful feedback.

Regarding <> vs !=, I didn’t realize the former was a ANSI SQL-92 standard. TIL. I probably won’t update change the guide it simply because I don’t think it really matters one way or the other, and I just prefer <> is all.

I hear you on grouping by numbers too. I go back and forth on it, as the commit history of the guide would show for anyone who ever checks it out :). Might wind up going back to “numbers are fine” in time, we shall see.

Regarding commas at the beginning, it just looks bad to me. Obviously that’s just a personal preference though. The main argument for having them at the beginning is that it makes diffs cleaner, but that’s doesn’t outweigh the ugliness of it IMO.

Copy on elaborating on the “bad” conventions when I have more than just a personal preference for it.

If you think of anything else that would be worth adding or changing I’d love to chat about it more.

Cheers!

Matt

1 Like

You are welcome, @mhmazur ! Will do for sure.

Dave

In defense of leading commas:

in general, the most important columns come first. They’re the ones least likely to change. The last column is generally much more likely to be modified than the first!

Do we care?

Well, if code is write-once read-many, then no. But if code is write-many read-many, then at a small aesthetic cost, it becomes possible to simplify git diffs. The vast majority of changes to a set of expressions will not affect the first column.

So your diff when adding or removing a column at the end of a sql file will be exactly one line - it will reflect the action that actually occurred (a line was added, a line was removed) as opposed to two or more lines, which imply that a line was changed, when in reality a comma was added and then a line was added – the real change was the new column, or the removed column.

It’s a small thing, but for me, keeping diffs as clean as possible trumps the aesthetic value of trailing commas.

Great response. However, I would beg to differ and say the aesthetic value of trialing commas only applies to past-scenarios where Text Editors didn’t offer a way to quickly resolve trailing commas. (e.g. VS Code and another editor offer multiline edits or some other macro/shortcut to make it easier for trailing commas to be edited).

But in regards to your aesthetic portion, I would say that trailing commas are great because they visually end the feature/column call, and logically proceed into the next call or line.

However, I know it’s a touchy topic–I once had this conversation once with someone–it got interesting haha

1 Like

I love your style guide (maybe because it’s very close to how I write SQL haha)! Do you have thoughts on whether it is ever appropriate to split out function calls with several arguments over multiple lines?

I find myself doing this every now and then, especially with date/timestamp manipulation functions, or if I want to aggregate the result of a case...when. Perhaps your “avoid function-wrapped arguments” requirement could be interpreted as arguing that I should separate “creating” my arguments (using a case...when in this case) from the function call?

No worries, I hear you there. I have a hate/love/hate affair with feature rich IDEs, and so my choice to use only basic autocomplete (and maybe some linting in python…maybe) strongly colors my opinion. For my development style I am definitely less likely to make errors if I use leading commas, but I understand that the vast majority of people these days prefer feature rich tools.

super-clean git diffs are still my main defense of leading commas, though. no tool will get you around that =D

Here’s a fun article around the leading/trailing comma debate that uses a ton of Github data to try and determine some measure of “success” of projects that adhere to leading/trailing/both. It’s a little dated - I wonder if we re-ran the queries in there if those trends still hold?

Awesome guide - mostly because it is close to what many data analyst unconsciously end up doing without following formal rules. A nice balance between readability, speed, consistency.