Introducing the nfl-dbt repo

Last week I wrote a post about modeling NFL Fourth Down Attempts in Python. It was mostly for learning/teaching purposes, less so for hard-hitting sports analysis. I really liked the dataset I found for the post, available in the nflscrapR-data repo on GitHub. However, the data there is spread across multiple files by season, so I thought it would make the most sense to load them into a database and do some light modeling in dbt. The resulting project, nfl-dbt, is now on GitHub.

It makes for a nice teaching and testing dataset:

  • it’s fairly clean, but has a couple of data quality issues that call for tests, especially around dupes
  • at about 600K plays over 11 seasons, it’s sizable but manageable on a local Postgres instance
  • it could lend itself to being modeled as a Star Schema (I haven’t done that yet)
  • unlike other “toy” datasets it can be updated weekly (with caveats, see the README)
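
To make the point about dupes a bit more concrete, here is a minimal Python sketch of the kind of check a uniqueness test performs. It is not taken from the repo: the function name `find_duplicate_keys`, the key columns `game_id` and `play_id`, and the sample rows are all assumptions made up for illustration, so adjust them to whatever actually identifies a play in your copy of the data.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns=("game_id", "play_id")):
    """Return {key tuple: count} for every key that appears more than once.

    `key_columns` is an assumption about what uniquely identifies a play.
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows standing in for plays combined from several per-season files.
sample_plays = [
    {"game_id": "2018090600", "play_id": "37", "down": "1"},
    {"game_id": "2018090600", "play_id": "52", "down": "2"},
    {"game_id": "2018090600", "play_id": "52", "down": "2"},  # duplicated play
    {"game_id": "2018090900", "play_id": "37", "down": "1"},
]

duplicates = find_duplicate_keys(sample_plays)
print(duplicates)
```

With real data you would build `rows` by reading each season file (for example with `csv.DictReader`) and chaining the results together. An empty result means the assumed key is unique across all seasons.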

This is awesome!! Thanks so much for open sourcing this - it’s a really cool dataset.

