Attribute - Convention Nomenclature

Hi community!

We are changing our data stack and considering some other improvements, one of which concerns our naming conventions.
I’d like to know how you handle naming conventions for attributes. Could you share your input, please?

Here is what we are considering:

Thanks!!

Hey @jmilhomem - it’s great that you’re thinking hard about these conventions! In software, we refer to this type of naming scheme as Hungarian notation :slight_smile:

I don’t use regimented conventions like this when building out my data models - I instead like to use more contextual prefixes/suffixes, though it’s certainly not super well-defined on paper anywhere. I do something like:

- is_<boolean>
- <timestamp>_at
- first_<timestamp>_at
- last_<timestamp>_at
- count_<number>
- <entity>_id

So, not too dissimilar from your approach I think!
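
To make the list above concrete, here is a minimal Python sketch that checks a column name against those informal patterns. The `PATTERNS` table and the `classify_column` helper are hypothetical names made up for illustration, and the regexes are only one possible reading of the list, not an official rule set.

```python
import re

# Hypothetical patterns mirroring the informal conventions listed above.
# Order matters: the first matching pattern wins.
PATTERNS = {
    "boolean": re.compile(r"^is_[a-z0-9_]+$"),
    "timestamp": re.compile(r"^[a-z0-9_]+_at$"),  # also covers first_*_at / last_*_at
    "count": re.compile(r"^count_[a-z0-9_]+$"),
    "id": re.compile(r"^[a-z0-9_]+_id$"),
}


def classify_column(name):
    """Return the first convention the column name matches, or None."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(name):
            return kind
    return None


for column in ["is_active", "first_order_at", "count_orders", "customer_id", "email"]:
    print(column, "->", classify_column(column))
```

A check like this could run over a model's column list to flag names that match none of the conventions, such as `email` here.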

The thing that jumps out at me about your proposed conventions is that it mixes types and “kinds”. Whereas nm_ and dt_ denote a database type, sk_ and ds_ denote a kind of string. I think that’s totally fine if it works for you, but I just wanted to point it out :slight_smile:


I like is_* and has_* for booleans, and *_at for timestamps. Beyond that we are pretty inconsistent, mostly because my thoughts keep evolving. At the moment I’m inclined to agree with <entity>_id, though we have an awful lot of primary keys just named id and our analysts are used to that. What I don’t like about a bare id is that I’ve had several cases of junior analysts joining all tables by id instead of understanding the foreign keys, though that is mostly a matter of training and unfamiliarity with SQL. Still, explicit is better than implicit.
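
Here is a small sketch of the join-by-id trap, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are made up for illustration. The point is that the mistaken join runs without any error and returns plausible-looking rows.

```python
import sqlite3

# Hypothetical tables where both primary keys are just named "id".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (id integer primary key, name text);
    create table orders (id integer primary key, customer_id integer, amount_usd real);
    insert into customers values (1, 'Ada'), (2, 'Grace');
    insert into orders values (1, 2, 10.0), (2, 2, 25.0), (3, 1, 5.0);
""")

# Mistaken join: order ids are matched against customer ids.
wrong = conn.execute("""
    select customers.name, orders.amount_usd
    from orders join customers on orders.id = customers.id
    order by orders.id
""").fetchall()

# Intended join: follow the foreign key.
right = conn.execute("""
    select customers.name, orders.amount_usd
    from orders join customers on orders.customer_id = customers.id
    order by orders.id
""").fetchall()

print(wrong)  # [('Ada', 10.0), ('Grace', 25.0)]
print(right)  # [('Grace', 10.0), ('Grace', 25.0), ('Ada', 5.0)]
```

If the primary key in `customers` were named `customer_id` instead, a join on `id` would fail outright rather than silently returning the wrong rows.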

For counts I like *_count, which reads more naturally in English to me. For money I use an explicit currency_code column if one exists, and suffix with *_usd if it doesn’t.

When there is a compound key I create a surrogate by concatenation and then almost always just name it row_key to indicate it shouldn’t join to anything.
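
As a rough sketch of the concatenation approach, assuming a hypothetical helper called `make_row_key` and a `-` separator (neither comes from an actual project):

```python
def make_row_key(*parts, sep="-"):
    """Concatenate the parts of a compound key into one string surrogate."""
    # None becomes an empty string so a missing part does not raise an error.
    return sep.join("" if part is None else str(part) for part in parts)


# Hypothetical compound key: (customer_id, order_date)
print(make_row_key(42, "2020-01-01"))  # 42-2020-01-01
```

The separator matters: without one, the parts `("1", "23")` and `("12", "3")` would both produce `123`. A separator that can also appear inside the parts themselves can still collide, so pick one that fits your data.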

My overriding concerns are understandability of the data model, and analyst ergonomics, so I shy away from making final tables feel “programmer-y” with prefix and suffix notation like in your example.


Quick update: I said

> I instead like to use more contextual prefixes/suffixes, though it’s certainly not super well-defined on paper anywhere.

It turns out that @claire defined this on paper somewhere! These are just our internal conventions at Fishtown Analytics, but feel free to borrow from them as you see fit :slight_smile:
