I would like to propose a set of naming conventions for dbt macros inspired by the syntax and functionality of R's dplyr package, which is known for its intuitive data manipulation verbs. These conventions are intended as guidelines for structuring macro names in a way that enhances readability and maintainability:
- `is_`: For macros that return boolean outputs.
- `filter_`: Reflecting dplyr’s `filter()` function, for macros that apply conditions to narrow down data sets.
- `mutate_`: Borrowed from dplyr’s `mutate()`, for macros that modify existing columns or create new ones.
- `calculate_`: Suggesting a focus on computation, akin to dplyr’s various mathematical and summary functions.
- `summarize_`: Mirroring dplyr’s `summarize()`, for macros that aggregate data, often in conjunction with grouping.
- `compat_`: An addition to facilitate cross-database compatibility, ensuring that base SQL functions compile correctly across SQL dialects. These macros make it easier to migrate dbt repositories to new underlying data platforms.
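To make the first five prefixes concrete, here is a minimal sketch of what macros following them might look like. The macro names, arguments, and column names are hypothetical illustrations, not part of dbt or any package, and the SQL sticks to widely supported functions but may need adjusting for a particular warehouse. (dbt's built-in `is_incremental()` macro already follows the `is_` pattern, although it returns a Jinja boolean rather than a SQL expression.)

```sql
{# macros/naming_examples.sql: hypothetical macros illustrating each prefix #}

{# is_: renders a SQL boolean expression #}
{% macro is_null_or_blank(column_name) %}
    ({{ column_name }} is null or trim({{ column_name }}) = '')
{% endmacro %}

{# filter_: narrows a relation down to the rows that meet a condition #}
{% macro filter_not_deleted(relation, deleted_column='is_deleted') %}
    select *
    from {{ relation }}
    where not coalesce({{ deleted_column }}, false)
{% endmacro %}

{# mutate_: modifies a column (here: normalizes text) and keeps its name #}
{% macro mutate_normalize_text(column_name) %}
    lower(trim({{ column_name }})) as {{ column_name }}
{% endmacro %}

{# calculate_: a pure computation; nullif avoids division by zero #}
{% macro calculate_percentage(numerator, denominator) %}
    100.0 * {{ numerator }} / nullif({{ denominator }}, 0)
{% endmacro %}

{# summarize_: aggregates a relation, grouped by the given columns #}
{% macro summarize_totals(relation, group_by_columns, value_column) %}
    select
        {{ group_by_columns | join(', ') }},
        count(*) as row_count,
        sum({{ value_column }}) as total_{{ value_column }}
    from {{ relation }}
    group by {{ group_by_columns | join(', ') }}
{% endmacro %}
```

A model could then compose these, with the verb-first prefix telling the reader what each call does (model and column names are again hypothetical):

```sql
-- models/order_summary.sql (hypothetical model)
with orders as (
    {{ filter_not_deleted(ref('stg_orders')) }}
)
{{ summarize_totals('orders', ['customer_id'], 'amount') }}
```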
These suggested naming practices draw from the successful design principles of dplyr, aiming to bring a similar level of clarity and efficiency to dbt macro organization.
While these suggestions are inspired by dplyr, they are not meant to be prescriptive. The dbt community encompasses a wide range of programming backgrounds and preferences, and as such, alternative naming conventions or existing practices may be equally valid or more suitable for specific projects or teams. These guidelines are intended to serve as a foundation or source of inspiration, which teams can adapt based on their unique requirements and experiences.
The choice to draw inspiration from dplyr is rooted in its widespread reputation for making data manipulation tasks more intuitive and readable, qualities that are highly beneficial in dbt projects as well. The proposed `compat_` convention specifically addresses the additional complexity introduced by working across different SQL dialects, offering a structured approach to ensuring macro portability and adaptability.
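One way to implement a `compat_` macro is dbt's `adapter.dispatch`, which looks for an adapter-specific implementation (for example `bigquery__<macro_name>`) and falls back to `default__<macro_name>` otherwise. The sketch below is an illustration of that pattern under stated assumptions: `compat_safe_divide` is a hypothetical macro, and `my_project` is a placeholder for the name of the root project.

```sql
{# macros/compat_safe_divide.sql: hypothetical compat_ macro.
   Models call one name; dbt picks the implementation for the current adapter. #}
{% macro compat_safe_divide(numerator, denominator) %}
    {# 'my_project' is a placeholder for your project's name #}
    {{ return(adapter.dispatch('compat_safe_divide', 'my_project')(numerator, denominator)) }}
{% endmacro %}

{# Fallback for any adapter without its own version:
   yields null instead of failing when the denominator is zero. #}
{% macro default__compat_safe_divide(numerator, denominator) %}
    ({{ numerator }}) / nullif({{ denominator }}, 0)
{% endmacro %}

{# BigQuery has a native SAFE_DIVIDE, which also returns null on division by zero. #}
{% macro bigquery__compat_safe_divide(numerator, denominator) %}
    safe_divide({{ numerator }}, {{ denominator }})
{% endmacro %}
```

A model would call `{{ compat_safe_divide('revenue', 'order_count') }}` identically on every warehouse, so moving to a new platform means adding an implementation for that adapter (or relying on the default) rather than editing every model.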
Adopting these naming conventions can benefit dbt developers and teams by providing a coherent framework that fosters code readability and collaborative development. In particular, those familiar with dplyr and R may find these conventions a natural extension of their existing data manipulation practices, easing their transition into dbt projects and their work within them.
I am interested in contributing these guidelines to the dbt community. Having applied these principles in my own work with positive results, I am keen to share these insights and collaborate with others to refine and integrate them into broader dbt practices. Feedback and contributions from the community would be invaluable in ensuring these conventions resonate with and are useful to a wide range of dbt users.